
<!-- README.md is generated from README.Rmd. Please edit that file -->

# odyssey

<!-- badges: start -->

[![status-badge](https://ci.codeberg.org/api/badges/15960/status.svg)](https://ci.codeberg.org/repos/15960)

<a href="https://nfrerebeau.r-universe.dev" class="pkgdown-devel"><img
src="https://nfrerebeau.r-universe.dev/badges/odyssey"
alt="r-universe" /></a>

[![Project Status: Active – The project has reached a stable, usable
state and is being actively
developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
<!-- badges: end -->

An interface to the search API of [HAL](https://hal.science/), the
French open archive for scholarly documents from all academic fields.
This package provides programmatic access to the
[API](https://api.archives-ouvertes.fr/docs) and lets you search for
records and download documents.

------------------------------------------------------------------------

To cite odyssey in publications use:

Frerebeau N (2025). *odyssey: Interface to the HAL Open Archive API*.
Université Bordeaux Montaigne, Pessac, France. R package version 1.0.0,
<https://nfrerebeau.codeberg.page/odyssey/>.

## Installation

You can install the released version of **odyssey** from
[CRAN](https://CRAN.R-project.org) with:

``` r
install.packages("odyssey")
```

The development version can be installed from
[Codeberg](https://codeberg.org/) with:

``` r
# install.packages("remotes")
remotes::install_git("https://codeberg.org/nfrerebeau/odyssey")
```

## Usage

Using **odyssey** involves three steps. First, create a default query
with `hal_query()`. Then, customize this query with a set of dedicated
functions (see below). Finally, use `hal_search()` to collect data and
`hal_download()` to download documents. `as.data.frame()` can be used to
(try to) coerce the results of `hal_search()` to a `data.frame`.

The following functions allow you to customize a query. They must be
applied to the object returned by `hal_query()` and can be called in any
order. See the HAL search API documentation for a [list of available
fields](https://api.archives-ouvertes.fr/docs/search/?schema=fields#fields).

- `hal_query()` sets the fields to query and defines the query terms
  using boolean logic (`q` parameter).
- `hal_select()` is used to select the fields to be returned in the
  results (`fl` parameter).
- `hal_filter()` is used to [retain all results that satisfy a
  condition](https://api.archives-ouvertes.fr/docs/search/?#fq) (`fq`
  parameter). `hal_filter()` can be used several times to add multiple
  search filters.
- `hal_sort()` [orders the
  results](https://api.archives-ouvertes.fr/docs/search/?#sort) by the
  value of the selected field (`sort` parameter). According to the HAL
  API documentation, you should avoid text fields and multi-valued
  fields, which produce unpredictable results.
- `hal_group()` is used to [group search
  results](https://api.archives-ouvertes.fr/docs/search/?#group)
  (`group.*` parameters).
- `hal_facet()` is used to [facet search
  results](https://api.archives-ouvertes.fr/docs/search/?#facet)
  (`facet.*` parameters).
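To make the parameter names above more concrete, here is a minimal base R sketch of how `q`, `fl`, `fq` and `sort` might appear in a raw request. The helper `build_url()` is hypothetical and written for illustration only; the endpoint URL, the `rows` parameter and the `field:value` filter syntax are assumptions based on the Solr-style conventions of the HAL API, and this is not necessarily the request that **odyssey** sends.

``` r
## Hypothetical helper (not part of odyssey): assemble a raw search URL
## from a named list of Solr-style parameters.
build_url <- function(base, params) {
  ## Percent-encode each value, including reserved characters
  values <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste0(names(params), "=", values, collapse = "&"))
}

## Assumed endpoint and illustrative parameter values
params <- list(
  q    = "Celtes",                     # query terms
  fl   = "doiId_s,producedDate_tdate", # fields to return
  fq   = "docType_s:ART",              # filter on a field value
  sort = "producedDate_tdate desc",    # sort field and direction
  rows = "10"                          # maximum number of results
)

url <- build_url("https://api.archives-ouvertes.fr/search/", params)
url
```

In practice the functions listed above build and send the query for you, so there is no need to write such URLs by hand.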

For a simple search, grouping terms in a `list` combines them with AND,
while grouping terms in a `vector` combines them with OR. If needed, the
infix functions `%AND%`, `%OR%`, `%NOT%`, `%IN%` and `%TO%` can be used
to build more complex queries (remember that infix operators are
evaluated left to right).
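The left-to-right rule comes from R itself: all `%op%` operators share the same precedence and are left-associative. A minimal sketch in base R, using two hypothetical stand-in operators (`%and%` and `%or%`, defined here for illustration only; they are not the **odyssey** operators and their output format is an assumption):

``` r
## Hypothetical stand-ins, not the odyssey operators
`%and%` <- function(x, y) paste0("(", x, " AND ", y, ")")
`%or%`  <- function(x, y) paste0("(", x, " OR ", y, ")")

## Evaluated left to right: %and% is applied first, then %or%
left <- "a" %and% "b" %or% "c"
left
#> [1] "((a AND b) OR c)"

## Parentheses force a different grouping
right <- "a" %and% ("b" %or% "c")
right
#> [1] "(a AND (b OR c))"
```

When a query mixes several operators, explicit parentheses are the safest way to get the intended grouping.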

``` r
## Load packages
library(odyssey)
```

### Simple search

Get the 10 most recent articles about the archaeology of the Celts in
France:

``` r
## Topic selection
## Will be combined with AND
topic <- list("archéologie", "Celtes", "France")

## Search publications with DOI
resp <- hal_query(topic) |>
  hal_select("doiId_s", "producedDate_tdate") |>
  hal_filter("" %TO% "" %IN% "doiId_s") |>
  hal_sort("producedDate_tdate", decreasing = TRUE) |>
  hal_search(limit = 10)

as.data.frame(resp)
#>                        doiId_s   producedDate_tdate
#> 1     10.4000/books.pcjb.8230. 2021-08-01T00:00:00Z
#> 2      10.4000/books.pcjb.8397 2021-01-01T00:00:00Z
#> 3        10.4000/anabases.9669 2019-10-21T00:00:00Z
#> 4          10.26406/STETR81-08 2019-01-01T00:00:00Z
#> 5   10.4000/books.artehis.3178 2017-10-01T00:00:00Z
#> 6   10.4000/books.artehis.3265 2017-10-01T00:00:00Z
#> 7  10.4000/archeosciences.4457 2015-01-01T00:00:00Z
#> 8      10.3406/galia.2007.3311 2007-01-01T00:00:00Z
#> 9      10.3406/galia.2004.3184 2004-01-01T00:00:00Z
#> 10     10.3406/galia.2003.3143 2003-01-01T00:00:00Z
```
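The `producedDate_tdate` values shown above are ISO 8601 strings. If you need real date-time objects, one possible approach in base R is sketched below, using two values copied from the output above (the format string is an assumption based on how these values are printed):

``` r
## Two date strings copied from the output above
dates <- c("2021-08-01T00:00:00Z", "2019-10-21T00:00:00Z")

## Parse as UTC date-times
parsed <- as.POSIXct(dates, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

## Extract the publication year
format(parsed, "%Y")
#> [1] "2021" "2019"
```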

### Faceting

Get the number of documents in archaeology for the 10 journals with the
most documents:

``` r
resp <- hal_query("shs.archeo", field = "domainAllCode_s") |>
  hal_facet(field = "journalTitle_s", limit = 10) |>
  hal_search()

as.data.frame(resp)
#> $journalTitle_s
#>                                                                    .value
#> 1        Bulletin de l'Association française pour l'étude de l'âge du Fer
#> 2  Gallia - Fouilles et monuments archéologiques en France métropolitaine
#> 3                          Bulletin de la Société préhistorique française
#> 4                                   Cahier des thèmes transversaux ArScAn
#> 5                                                   Archéologie médiévale
#> 6                                                Quaternary International
#> 7                                          Les Nouvelles de l'archéologie
#> 8                                                             Archéologia
#> 9                              Journal of Archaeological Science: Reports
#> 10                                                 Dossiers d'Archéologie
#>    .counts
#> 1      721
#> 2      670
#> 3      661
#> 4      535
#> 5      519
#> 6      431
#> 7      411
#> 8      410
#> 9      410
#> 10     409
```

### Grouping

Get the first 20 documents in the ARCHEOSCIENCES-BORDEAUX collection,
sorted by publication date in descending order, grouped by document
type:

``` r
resp <- hal_query() |>
  hal_filter("ARCHEOSCIENCES-BORDEAUX" %IN% "collCode_s") |>
  hal_select("producedDate_tdate", "docid") |>
  hal_group(field = "docType_s", limit = 20) |>
  hal_sort("producedDate_tdate", decreasing = TRUE) |>
  hal_search()

head(as.data.frame(resp))
#>   .group   docid   producedDate_tdate
#> 1    ART 4196797 2023-10-01T00:00:00Z
#> 2    ART 4960723 2024-05-01T00:00:00Z
#> 3    ART 4109831 2022-03-30T00:00:00Z
#> 4    ART 3712097 2022-08-01T00:00:00Z
#> 5    ART 5355931 2025-11-08T00:00:00Z
#> 6    ART 4656381 2022-05-01T00:00:00Z
```

## Code of Conduct

Please note that the **odyssey** project is released with a [Contributor
Code of
Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html).
By contributing to this project, you agree to abide by its terms.
