% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/external.bold.fetch.R
\name{bold.fetch}
\alias{bold.fetch}
\title{Retrieve data from the BOLD database}
\usage{
bold.fetch(
  get_by,
  identifiers,
  cols = NULL,
  export = NULL,
  na.rm = FALSE,
  filt_taxonomy = NULL,
  filt_geography = NULL,
  filt_latitude = NULL,
  filt_longitude = NULL,
  filt_shapefile = NULL,
  filt_institutes = NULL,
  filt_identified.by = NULL,
  filt_seq_source = NULL,
  filt_marker = NULL,
  filt_collection_period = NULL,
  filt_basecount = NULL,
  filt_altitude = NULL,
  filt_depth = NULL
)
}
\arguments{
\item{get_by}{A character string specifying the parameter used to fetch data: one of \code{"processid"}, \code{"sampleid"}, \code{"bin_uris"}, \code{"dataset_codes"} or \code{"project_codes"}.}

\item{identifiers}{A character vector (or a data frame column) of identifiers of the type specified in \code{get_by}.}

\item{cols}{A character vector of one or more column names to include in the final data frame. Default value is NULL, which returns all columns.}

\item{export}{A character string giving the local path, including the file name and extension (csv or tsv), to which the data should be exported. Default value is NULL.}

\item{na.rm}{A logical value specifying whether NA values should be removed from the BCDM data frame. Default value is FALSE.}

\item{filt_taxonomy}{A character vector of one or more taxonomic names at any hierarchical level. Default value is NULL.}

\item{filt_geography}{A character vector of one or more geographic names or codes at any level (country, province/state, region, sector or site). Default value is NULL.}

\item{filt_latitude}{A single number or a vector of two numbers (e.g. \code{c(50, 60)}) specifying the latitude range in decimal degrees. Default value is NULL.}

\item{filt_longitude}{A single number or a vector of two numbers (e.g. \code{c(-100, -90)}) specifying the longitude range in decimal degrees. Default value is NULL.}

\item{filt_shapefile}{A file path pointing to a shapefile. Default value is NULL.}

\item{filt_institutes}{A character vector of one or more institute names. Default value is NULL.}

\item{filt_identified.by}{A character vector of one or more names of the people who identified the organisms. Default value is NULL.}

\item{filt_seq_source}{A character vector of one or more data portals from which the (sequence) data were mined. Default value is NULL.}

\item{filt_marker}{A character vector of one or more marker (gene) names, e.g. \code{"COI-5P"}. Default value is NULL.}

\item{filt_collection_period}{A single date or a vector of two dates, \code{c(start, end)}, specifying the collection period. Default value is NULL.}

\item{filt_basecount}{A single number or a vector of two numbers (e.g. \code{c(500, 600)}) specifying the range of sequence length in base pairs. Default value is NULL.}

\item{filt_altitude}{A single number or a vector of two numbers, \code{c(min, max)}, specifying the altitude range in meters. Default value is NULL.}

\item{filt_depth}{A single number or a vector of two numbers, \code{c(min, max)}, specifying the depth range. Default value is NULL.}
}
\value{
A data frame in BCDM format containing the records for the requested identifiers, restricted by any filters applied.
}
\description{
Retrieves public and private user data from BOLD using one of several identifier types (process IDs, sample IDs, BIN URIs, dataset codes or project codes).
}
\details{
\code{bold.fetch} retrieves both public data and private user data, where private data refers to data that the user has permission to access. The data are downloaded in the Barcode Core Data Model (BCDM) format.

Data can be downloaded in bulk using one of the search parameters \code{"processid"}, \code{"sampleid"}, \code{"bin_uris"}, \code{"dataset_codes"} or \code{"project_codes"}, specified through the \code{get_by} argument. Only one parameter can be used per call; multi-parameter searches that combine fields (e.g. \code{"processid"} + \code{"sampleid"} + \code{"bin_uris"}) are not supported. The identifiers themselves are passed through the \code{identifiers} argument as a character vector of one or more values of the chosen type. A data frame column can be supplied directly using the \code{$} operator (e.g. \code{df$column_name}). The values in \code{identifiers} must be of the type named in \code{get_by} (e.g. process IDs with \code{get_by = "processid"}); a mismatch leads to errors.

The \code{filt_} (filter) arguments narrow the results further, so that only a user-defined subset of the data is returned. Filter arguments must always be named explicitly (e.g. \code{filt_institutes = "CBG"} rather than just \code{"CBG"}), because unnamed values are matched by position to earlier arguments such as \code{cols}. The \code{cols} argument selects specific columns for the final data frame; if it is left as NULL, all columns are downloaded. Providing a path to the \code{export} argument saves the data locally; the path must include the output file name and its extension (csv or tsv), e.g. \code{'C:/Users/xyz/Desktop/fetch_data_output.csv'} on Windows.

There is a hard limit of 1 million records per download. Very large requests made with \code{bin_uris}, \code{dataset_codes} or \code{project_codes} are throttled, so fetching them takes longer. Download speed also depends on the user's internet connection and computer specifications. Where available, the downloaded data include the columns listed in the \code{field} column of \code{bold.fields.info()}, which also provides metadata on these BCDM columns.

\emph{Important Note}: \code{bold.apikey()} should be run before \code{bold.fetch} to set up the API key that \code{bold.fetch} requires.
}
\examples{
\dontrun{
# Test data with process IDs
data(test.data)

# Fetch the data using the ids.
# 1. An API key must be obtained from BOLD support before using bold.fetch().
# 2. Use bold.apikey() to set the API key in the global environment.

bold.apikey('apikey')

# With processids
res <- bold.fetch(get_by = "processid",
                  identifiers = test.data$processid)
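
# Selecting columns (a sketch): the column names below are assumed
# BCDM field names; check bold.fields.info() for the exact names.
res <- bold.fetch(get_by = "processid",
                  identifiers = test.data$processid,
                  cols = c("processid", "sampleid", "bin_uri"))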


# With sampleids
res <- bold.fetch(get_by = "sampleid",
                  identifiers = test.data$sampleid)

# With dataset codes (a publicly available dataset is used here)
res <- bold.fetch(get_by = "dataset_codes",
                  identifiers = "DS-IBOLR24")
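
# Exporting the results (a sketch): the file is written to a temporary
# directory here; any local path ending in .csv or .tsv can be used.
res <- bold.fetch(get_by = "dataset_codes",
                  identifiers = "DS-IBOLR24",
                  export = file.path(tempdir(), "DS-IBOLR24.csv"))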

## Using filters

# Geography
res <- bold.fetch(get_by = "processid",
                  identifiers = test.data$processid,
                  filt_geography = "Churchill")
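
# Latitude and longitude ranges in decimal degrees (illustrative values;
# each range is assumed to be given as c(min, max))
res <- bold.fetch(get_by = "processid",
                  identifiers = test.data$processid,
                  filt_latitude = c(55, 60),
                  filt_longitude = c(-100, -90))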

# Sequence length
res <- bold.fetch(get_by = "processid",
                  identifiers = test.data$processid,
                  filt_basecount = c(500,600))
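
# Removing NA values from the BCDM data frame
res <- bold.fetch(get_by = "processid",
                  identifiers = test.data$processid,
                  na.rm = TRUE)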

# Gene marker & sequence length
res <- bold.fetch(get_by = "processid",
                  identifiers = test.data$processid,
                  filt_marker = "COI-5P",
                  filt_basecount = c(500, 600))
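
# Institute & collection period (illustrative values). Filter arguments
# must be named explicitly; the "YYYY-MM-DD" date format is an assumption.
res <- bold.fetch(get_by = "processid",
                  identifiers = test.data$processid,
                  filt_institutes = "CBG",
                  filt_collection_period = c("2010-01-01", "2015-12-31"))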
}

}
