| Title: | Handy and Minimalistic Common Data Model Characterization |
| Version: | 0.0.1 |
| Description: | Extracts covariates from Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) domains using an R-only pipeline. Supports configurable temporal windows, domain-specific covariates for drug exposure, drug era (including Anatomical Therapeutic Chemical (ATC) groupings), condition occurrence, condition era, concept sets and cohorts. Methods are based on the Observational Health Data Sciences and Informatics (OHDSI) framework described in Hripcsak et al. (2015) <doi:10.1038/sdata.2015.35> and "The Book of OHDSI" OHDSI (2019, ISBN:978-1-7923-0589-8). |
| License: | Apache License (≥ 2) |
| Depends: | R (≥ 4.1.0) |
| Imports: | checkmate, DatabaseConnector, jsonlite, SqlRender, stats |
| Suggests: | Andromeda, Eunomia, FeatureExtraction, knitr, rmarkdown, testthat (≥ 3.0.0), withr |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2026-04-22 07:31:59 UTC; AlexanderAleksiayuk |
| Author: | Alexander Alexeyuk [aut, cre] |
| Maintainer: | Alexander Alexeyuk <alexanderAlexeyuk@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-04-22 13:20:29 UTC |
OdysseusCharacterizationModule: Handy and Minimalistic Common Data Model Characterization
Description
Extracts covariates from Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) domains using an R-only pipeline. Supports configurable temporal windows, domain-specific covariates for drug exposure, drug era (including Anatomical Therapeutic Chemical (ATC) groupings), condition occurrence, condition era, concept sets and cohorts. Methods are based on the Observational Health Data Sciences and Informatics (OHDSI) framework described in Hripcsak et al. (2015) doi:10.1038/sdata.2015.35 and "The Book of OHDSI" OHDSI (2019, ISBN:978-1-7923-0589-8).
Author(s)
Maintainer: Alexander Alexeyuk alexanderAlexeyuk@gmail.com
Convert Input to a Mapping Function
Description
Converts a formula, existing function, or atomic vector into a
function suitable for use in the map, map2, and
pmap families.
Usage
as_mapper(.f)
Arguments
.f |
A function, one-sided formula, or atomic vector. |
Details
This helper mirrors the behaviour of
purrr::as_mapper for the formula and function
cases, providing a dependency-free alternative for internal use.
Value
A function.
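When .f is already a function or an atomic vector, it is
returned as-is.
When .f is a formula, the returned function accepts
... and evaluates the formula's right-hand side with
positional bindings following the purrr convention: . and .x refer to
the first argument, .y to the second, and ..1, ..2, ... to arguments by
position.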
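A minimal base-R sketch of the formula case follows. It assumes the purrr convention of binding .x and . to the first argument and .y to the second. as_mapper_sketch is a hypothetical name and is not the package's actual implementation:

```r
# Hypothetical re-implementation for illustration only.
as_mapper_sketch <- function(.f) {
  # Functions and atomic vectors pass through unchanged, as documented above.
  if (is.function(.f) || is.atomic(.f)) {
    return(.f)
  }
  if (inherits(.f, "formula")) {
    # Build function(..., .x = ..1, .y = ..2, . = ..1) whose body is the
    # formula's right-hand side. Defaults are evaluated lazily, so .y is
    # only required when the body actually uses it.
    fn <- function(..., .x = ..1, .y = ..2, . = ..1) NULL
    body(fn) <- .f[[length(.f)]]
    environment(fn) <- environment(.f)
    return(fn)
  }
  stop("`.f` must be a function, a one-sided formula, or an atomic vector.",
       call. = FALSE)
}

square <- as_mapper_sketch(~ .x^2)
square(4)   # 16
add <- as_mapper_sketch(~ .x + .y)
add(2, 3)   # 5
```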
Build SQL for Multiple Concept-Set Expressions
Description
A convenience wrapper around buildConceptSetQuery that
resolves every element of a named list of concept-set expressions to
SQL in a single call.
Usage
buildConceptSetQueries(
conceptSetList,
conceptSetNames = names(conceptSetList),
vocabularyDatabaseSchema = "@vocabulary_database_schema"
)
Arguments
conceptSetList |
A named list of concept-set expressions.
Each element must conform to the format accepted by buildConceptSetQuery. |
conceptSetNames |
Character vector of concept-set labels, one
per element of conceptSetList. Defaults to names(conceptSetList). |
vocabularyDatabaseSchema |
Character string.
The fully qualified schema containing the OMOP vocabulary tables.
Passed through to buildConceptSetQuery. |
Details
The function iterates over conceptSetList and
conceptSetNames in parallel using map2,
calling buildConceptSetQuery for each pair.
Value
A named list of character strings, each containing the SQL
query for the corresponding concept set.
Names match those of conceptSetList.
Elements whose concept sets resolve to no included concepts are
returned as "" (empty strings).
See Also
buildConceptSetQuery for the single-expression
resolver and full documentation of the expression format.
Examples
csList <- list(
diabetes = list(items = list(
list(concept = list(CONCEPT_ID = 201820),
includeDescendants = TRUE)
)),
hypertension = list(items = list(
list(concept = list(CONCEPT_ID = 316866),
includeDescendants = TRUE)
))
)
queries <- buildConceptSetQueries(
csList,
vocabularyDatabaseSchema = "cdm_v5"
)
# Each element is a SQL string
cat(queries$diabetes)
cat(queries$hypertension)
Build SQL to Resolve a CIRCE Concept-Set Expression
Description
Translates a single CIRCE-format concept-set expression into a
stand-alone SQL query that resolves to the set of
concept_id values implied by the expression.
The function handles all four combinations of the per-item flags
(includeDescendants, includeMapped) as well as
concept exclusion (isExcluded). It does not require
Java or the CirceR package.
Usage
buildConceptSetQuery(
conceptSetExpression,
conceptSetName = "plug",
vocabularyDatabaseSchema = "@vocabulary_database_schema"
)
Arguments
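conceptSetExpression |
A concept-set expression in one of two forms: an R list with an items
element, or a JSON string with the same structure.
See Details for the full item specification. |
conceptSetName |
Character string.
A label embedded in the output SQL as the cs_name column. Defaults to "plug". |
vocabularyDatabaseSchema |
Character string.
The fully qualified schema containing the OMOP vocabulary tables.
Defaults to the placeholder "@vocabulary_database_schema". |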
Details
Item specification
Each element of conceptSetExpression$items must be a list with
the following structure:
concept: Required list. Must contain CONCEPT_ID, a single, non-NA, whole-number numeric value.
includeDescendants: Optional logical scalar. When TRUE, all descendant concepts (via CONCEPT_ANCESTOR) are included. Defaults to FALSE when absent.
includeMapped: Optional logical scalar. When TRUE, concepts reached through "Maps to" relationships (via CONCEPT_RELATIONSHIP) are included. Defaults to FALSE when absent.
isExcluded: Optional logical scalar. When TRUE, the resolved concepts for this item are removed from the final set via a LEFT JOIN … WHERE … IS NULL anti-join pattern. Defaults to FALSE when absent.
Resolution logic
Items are first partitioned into included and excluded
groups based on isExcluded.
Within each group, items are further classified into four categories
by their flag combinations:
- Plain: neither descendants nor mapped.
- Descendants only: includeDescendants = TRUE.
- Mapped only: includeMapped = TRUE.
- Descendants and mapped: both flags TRUE.
A UNION of the appropriate SQL blocks is built for each group.
If excluded items exist, the excluded set is anti-joined against the
included set.
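The partitioning and classification steps above can be sketched in base R. classify_items is a hypothetical helper that only labels items and does not generate any SQL. Absent or NA flags count as FALSE, in line with the documented defaults:

```r
# Hypothetical helper: label each concept-set item with its group
# (included/excluded) and one of the four flag categories.
classify_items <- function(items) {
  # Absent (NULL) or NA flags are treated as FALSE.
  flag <- function(item, name) isTRUE(item[[name]])

  category <- vapply(items, function(item) {
    desc <- flag(item, "includeDescendants")
    mapped <- flag(item, "includeMapped")
    if (desc && mapped) {
      "descendants_and_mapped"
    } else if (desc) {
      "descendants_only"
    } else if (mapped) {
      "mapped_only"
    } else {
      "plain"
    }
  }, character(1))

  data.frame(
    concept_id = vapply(items, function(item) item$concept$CONCEPT_ID, numeric(1)),
    group = ifelse(vapply(items, flag, logical(1), name = "isExcluded"),
                   "excluded", "included"),
    category = category,
    stringsAsFactors = FALSE
  )
}

items <- list(
  list(concept = list(CONCEPT_ID = 201820), includeDescendants = TRUE),
  list(concept = list(CONCEPT_ID = 433962)),
  list(concept = list(CONCEPT_ID = 201254), isExcluded = TRUE)
)
classified <- classify_items(items)
split(classified, classified$group)
```

In the real function, each group would then become a UNION of SQL blocks, with the excluded group anti-joined against the included one.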
SQL rendering
The conceptSetName value is injected into the SQL via
render using the @cs_name token.
The vocabularyDatabaseSchema is interpolated directly via
sprintf.
Value
A single character string containing a SQL SELECT
statement that produces two columns: cs_name (the concept-set
label) and concept_id.
Returns "" (an empty string) when items is an empty
list or when all items are excluded and no included items remain.
Input validation
The function performs extensive validation of all inputs and stops with an informative error message if:
- vocabularyDatabaseSchema is not a single non-empty character string.
- conceptSetExpression is neither a list nor a valid JSON string.
- The items element is missing or is not a list.
- Any item lacks a concept element or a valid CONCEPT_ID.
- Any optional logical flag is present but is not a single logical value.
A warning is issued when a logical flag is NA (it is then treated as
FALSE).
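The validation rules above for CONCEPT_ID and the optional flags can be expressed as two small checks. Both function names are hypothetical, and the exact error messages are assumptions:

```r
# Hypothetical validators illustrating the documented rules.
check_concept_id <- function(x) {
  # is.finite() rejects NA as well as Inf.
  ok <- is.numeric(x) && length(x) == 1L && is.finite(x) && x == round(x)
  if (!ok) {
    stop("CONCEPT_ID must be a single, non-NA, whole-number numeric value.",
         call. = FALSE)
  }
  invisible(x)
}

check_flag <- function(value, name) {
  if (is.null(value)) {
    return(FALSE)  # an absent flag defaults to FALSE
  }
  if (!is.logical(value) || length(value) != 1L) {
    stop(sprintf("`%s` must be a single logical value.", name), call. = FALSE)
  }
  if (is.na(value)) {
    warning(sprintf("`%s` is NA; treating it as FALSE.", name), call. = FALSE)
    return(FALSE)
  }
  value
}

check_flag(TRUE, "includeDescendants")  # TRUE
check_flag(NULL, "includeMapped")       # FALSE
```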
See Also
buildConceptSetQueries for batch resolution of multiple
concept sets,
render for SQL parameterisation,
fromJSON for JSON parsing.
Examples
# A concept set with descendants
expr <- list(items = list(
list(concept = list(CONCEPT_ID = 201820),
includeDescendants = TRUE),
list(concept = list(CONCEPT_ID = 433962))
))
sql <- buildConceptSetQuery(
conceptSetExpression = expr,
conceptSetName = "diabetes",
vocabularyDatabaseSchema = "cdm_v5"
)
cat(sql)
# From a JSON string
json <- '{"items":[{"concept":{"CONCEPT_ID":316866},"includeDescendants":true}]}'
sql2 <- buildConceptSetQuery(json, conceptSetName = "hypertension")
cat(sql2)
# Exclusion example
expr_excl <- list(items = list(
list(concept = list(CONCEPT_ID = 201820),
includeDescendants = TRUE),
list(concept = list(CONCEPT_ID = 201254),
isExcluded = TRUE)
))
sql3 <- buildConceptSetQuery(expr_excl, conceptSetName = "diabetes_refined")
cat(sql3)
Materialise Concept-Set Queries into a Temporary Table
Description
Takes the output of buildConceptSetQueries — a named
list of SQL SELECT statements — unions them together, and
writes the result into a single temporary database table with columns
cs_name and concept_id.
Usage
createConceptSetTempTable(
connection,
csQueries,
vocabularyDatabaseSchema,
tempTableName = "#concept_sets_c",
tempEmulationSchema = NULL
)
Arguments
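connection |
A DatabaseConnector connection object. |
csQueries |
A named list of SQL query strings as returned by
buildConceptSetQueries. |
vocabularyDatabaseSchema |
Character string.
The fully qualified schema containing the OMOP vocabulary tables.
This value is substituted for the @vocabulary_database_schema parameter in the queries. |
tempTableName |
Character string.
Name of the temporary table to create.
Defaults to "#concept_sets_c". |
tempEmulationSchema |
Character string or NULL. Schema used to emulate temporary tables on DBMSs that do not support them natively. |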
Details
The function performs the following steps:
1. Filters out any empty query strings from csQueries.
2. Unions all remaining queries into a single SELECT statement.
3. Wraps the union in a SELECT * INTO <tempTableName> statement.
4. Renders the @vocabulary_database_schema parameter and translates the SQL for the target DBMS.
5. Executes the SQL via renderTranslateExecuteSql.
6. Creates a non-unique index on (cs_name, concept_id) for efficient downstream joins.
If all queries are empty after filtering, the function stops with an informative error.
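Steps 1 to 3 can be sketched as plain string assembly. build_temp_table_sql is hypothetical, and it skips the rendering, translation, execution and indexing steps. Using UNION ALL and stripping trailing semicolons are both assumptions; the package may use a plain UNION:

```r
# Hypothetical sketch of steps 1 to 3 only.
build_temp_table_sql <- function(csQueries, tempTableName = "#concept_sets_c") {
  queries <- unlist(csQueries, use.names = FALSE)
  # Step 1: drop empty strings, then strip any trailing semicolon so the
  # statements can be nested inside a single subquery.
  queries <- queries[nzchar(trimws(queries))]
  queries <- sub(";\\s*$", "", queries)
  if (length(queries) == 0L) {
    stop("All concept-set queries are empty; nothing to materialise.",
         call. = FALSE)
  }
  # Step 2: combine into one SELECT.
  unioned <- paste(queries, collapse = "\nUNION ALL\n")
  # Step 3: wrap the union in SELECT ... INTO.
  sprintf("SELECT * INTO %s\nFROM (\n%s\n) cs;", tempTableName, unioned)
}

sql <- build_temp_table_sql(list(
  a = "SELECT 'a' AS cs_name, 1 AS concept_id",
  b = "",
  c = "SELECT 'c' AS cs_name, 2 AS concept_id;"
))
cat(sql)
```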
Value
Invisibly returns the name of the created temporary table
(tempTableName).
Called primarily for its side effect of creating the table in the
database.
Table schema
The created temporary table contains two columns:
cs_name: Character. The concept-set label (derived from the names of csQueries).
concept_id: Integer. A resolved OMOP standard concept identifier.
See Also
buildConceptSetQueries for generating the input,
buildConceptSetQuery for the single-expression builder,
renderTranslateExecuteSql.
Examples
## Not run:
library(DatabaseConnector)
conn <- connect(dbms = "postgresql",
server = "localhost/ohdsi",
user = "user",
password = "pass")
csList <- list(
diabetes = list(items = list(
list(concept = list(CONCEPT_ID = 201820),
includeDescendants = TRUE),
list(concept = list(CONCEPT_ID = 433962))
)),
hypertension = list(items = list(
list(concept = list(CONCEPT_ID = 316866),
includeDescendants = TRUE)
))
)
csQueries <- buildConceptSetQueries(csList,
vocabularyDatabaseSchema = "cdm_v5")
createConceptSetTempTable(
connection = conn,
csQueries = csQueries,
vocabularyDatabaseSchema = "cdm_v5",
tempTableName = "#concept_sets"
)
# Query the resulting table; the #temp-table name must be translated for the target DBMS
result <- renderTranslateQuerySql(conn, "SELECT * FROM #concept_sets;")
head(result)
disconnect(conn)
## End(Not run)
Create Custom Covariate Settings for FeatureExtraction
Description
Creates a covariateSettings object that can be passed directly to
FeatureExtraction::getDbCovariateData() as a custom covariate
builder. The settings specify which OdysseusCharacterizationModule analyses
to run, including time windows, base features, cohort features, and
concept-set features.
Usage
createOcmCovariateSettings(
analysisWindows = defineAnalysisWindows(startDays = c(-365, -180, -30, 0, 1, 30, 180,
365), endDays = c(-1, -1, -1, 0, 30, 180, 365, 700)),
useBaseFeatures = list(drug_exposure = list(include = FALSE, atc = FALSE, atcLevels =
c(1L, 2L, 3L, 4L, 5L)), condition_occurrence = list(include = FALSE, type = "start"),
condition_era = list(include = FALSE, type = "start"), drug_era = list(include =
FALSE, type = "start", atc = FALSE, atcLevels = c(5L)), procedure_occurrence =
list(include = FALSE), observation = list(include = FALSE), device_exposure =
list(include = FALSE), visit_occurrence = list(include = TRUE, type = "start"),
measurement = list(include = FALSE)),
useCohortFeatures = list(include = FALSE, type = "start", cohortIds = NULL, cohortNames
= NULL, cohortTable = NULL, covariateSchema = NULL),
useConceptSetFeatures = list(conceptSets = NULL, include = FALSE, type = "binary")
)
Arguments
analysisWindows |
A list of analysisWindow objects, as returned by defineAnalysisWindows. |
useBaseFeatures |
Named list of domain configurations, same structure
as in planAnalysis. |
useCohortFeatures |
List specifying cohort-based feature extraction,
same structure as in planAnalysis. |
useConceptSetFeatures |
List specifying concept-set-based feature
extraction, same structure as in planAnalysis. |
Value
An S3 object of class covariateSettings with an attribute
fun set to "getDbOcmCovariateData". This object can be
passed as the covariateSettings argument to
FeatureExtraction::getDbCovariateData(), or used standalone by
calling getDbOcmCovariateData directly.
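The object shape described above follows the FeatureExtraction custom-builder convention: a plain list, classed "covariateSettings", whose "fun" attribute names the builder function. The constructor below is a hypothetical sketch of that shape. Its window entries are plain lists rather than real analysisWindow objects:

```r
# Hypothetical constructor illustrating the covariateSettings convention.
make_ocm_settings_sketch <- function(analysisWindows, useBaseFeatures) {
  settings <- list(
    analysisWindows = analysisWindows,
    useBaseFeatures = useBaseFeatures
  )
  # FeatureExtraction looks up this attribute to find the builder to call.
  attr(settings, "fun") <- "getDbOcmCovariateData"
  class(settings) <- "covariateSettings"
  settings
}

s <- make_ocm_settings_sketch(
  analysisWindows = list(list(startDay = -365, endDay = -1)),
  useBaseFeatures = list(drug_exposure = list(include = TRUE))
)
class(s)        # "covariateSettings"
attr(s, "fun")  # "getDbOcmCovariateData"
```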
See Also
getDbOcmCovariateData for the corresponding builder function,
planAnalysis for parameter documentation.
Examples
settings <- createOcmCovariateSettings(
analysisWindows = defineAnalysisWindows(
startDays = c(-365, -30),
endDays = c(-1, -1)
),
useBaseFeatures = list(
drug_exposure = list(include = TRUE, atc = FALSE),
condition_occurrence = list(include = TRUE, type = "start"),
condition_era = list(include = FALSE),
drug_era = list(include = FALSE),
procedure_occurrence = list(include = FALSE),
observation = list(include = FALSE),
device_exposure = list(include = FALSE),
visit_occurrence = list(include = FALSE),
measurement = list(include = FALSE)
)
)
Define analysis windows
Description
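Creates the time windows, each defined by a start day and an end day relative to the cohort start date, within which covariates are extracted.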
Usage
defineAnalysisWindows(
startDays = c(-365, -180, -30, 0, 1, 30, 180, 365),
endDays = c(-1, -1, -1, 0, 30, 180, 365, 700),
windowNames = NULL
)
Arguments
startDays |
Integer vector of start days relative to the cohort start date. |
endDays |
Integer vector of end days relative to the cohort start date. |
windowNames |
Optional character vector of names, one per window. |
Value
A list of analysisWindow objects.
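The pairing of startDays and endDays into windows can be sketched as follows. The function name, the validation rules (equal lengths, start not after end) and the default naming scheme are all assumptions for illustration:

```r
# Hypothetical sketch: pair start and end days into window records.
define_windows_sketch <- function(startDays, endDays, windowNames = NULL) {
  stopifnot(length(startDays) == length(endDays), all(startDays <= endDays))
  if (is.null(windowNames)) {
    # Assumed naming scheme, e.g. "-365 to -1".
    windowNames <- sprintf("%d to %d", as.integer(startDays), as.integer(endDays))
  }
  Map(function(s, e, n) structure(list(startDay = s, endDay = e, name = n),
                                  class = "analysisWindow"),
      startDays, endDays, windowNames)
}

w <- define_windows_sketch(c(-365, -30, 0), c(-1, -1, 0))
vapply(w, `[[`, character(1), "name")
```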
Execute a Single Analysis Specification
Description
Renders the SQL for a singleNodeSpec object, translates it to
the target DBMS dialect, executes it against an open
DatabaseConnector connection, and returns the
result set as a data.frame.
Usage
executeSpec(connection, spec, targetDialect = NULL, tempEmulationSchema = NULL)
Arguments
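connection |
A DatabaseConnector connection object. |
spec |
A singleNodeSpec object. |
targetDialect |
Character string or NULL. The target DBMS dialect for SQL
translation. |
tempEmulationSchema |
Character string or NULL. Schema used to emulate temporary tables on DBMSs that do not support them natively. |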
Details
The function performs the following steps:
1. Renders all @placeholder tokens via renderSpecSql.
2. Splits the rendered SQL into individual statements.
3. For aggregated specs, executes the INSERT statement(s) via executeSql, then runs the final SELECT (aggregation) via querySql.
4. For non-aggregated specs, executes all statements via executeSql, then queries the temp table for rows matching the current analysis_id.
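Steps 2 to 4 amount to deciding which statements go to executeSql and which query supplies the result. The naive sketch below (plan_statements is hypothetical) splits on semicolons and ignores semicolons inside string literals. SqlRender::splitSql is the robust way to split SQL. The exact follow-up query for non-aggregated specs is an assumption:

```r
# Hypothetical sketch of the execute/query split for one spec.
plan_statements <- function(renderedSql, aggregated, analysisId) {
  stmts <- trimws(strsplit(renderedSql, ";", fixed = TRUE)[[1]])
  stmts <- stmts[nzchar(stmts)]
  if (aggregated) {
    # All but the last statement are executed; the final SELECT is queried.
    list(execute = head(stmts, -1L), query = tail(stmts, 1L))
  } else {
    # Everything is executed, then the raw rows for this analysis are read back.
    list(execute = stmts,
         query = sprintf("SELECT * FROM #domain_raw_results WHERE analysis_id = %d",
                         as.integer(analysisId)))
  }
}

sql <- "INSERT INTO #domain_raw_results SELECT 1; SELECT COUNT(*) FROM #domain_raw_results;"
p <- plan_statements(sql, aggregated = TRUE, analysisId = 101)
q <- plan_statements(sql, aggregated = FALSE, analysisId = 101)
p$query
q$query
```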
Value
For aggregated specs: a data.frame with columns
cohort_definition_id, covariate_id,
covariate_name, concept_id, analysis_id,
sum_value, mean_value.
For non-aggregated specs: a data.frame with the raw
patient-level rows inserted into #domain_raw_results.
See Also
executeSpecs for batch execution of multiple specs,
renderSpecSql for SQL rendering,
singleNodeSetting for creating specs.
Examples
## Not run:
conn <- DatabaseConnector::connect(dbms = "postgresql",
server = "localhost/ohdsi",
user = "user", password = "pass")
# `specs` is assumed to be a singleNodeSettingList built beforehand (see singleNodeSetting)
result <- executeSpec(conn, specs[[1]])
head(result)
DatabaseConnector::disconnect(conn)
## End(Not run)
Execute Multiple Analysis Specifications
Description
Iterates over a singleNodeSettingList and executes each spec
sequentially against the database. All specs share the same
#domain_raw_results temp table across executions.
Usage
executeSpecs(
connection,
specs,
targetDialect = NULL,
tempEmulationSchema = NULL,
cleanTempTables = FALSE,
stopOnError = TRUE
)
Arguments
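connection |
A DatabaseConnector connection object. |
specs |
A singleNodeSettingList of analysis specifications. |
targetDialect |
Character string or NULL. The target DBMS dialect for SQL translation. |
tempEmulationSchema |
Character string or NULL. Schema used to emulate temporary tables on DBMSs that do not support them natively. |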
cleanTempTables |
Logical. If TRUE, temporary tables created during execution are dropped afterwards. Defaults to FALSE. |
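stopOnError |
Logical. If TRUE (the default), execution stops at the first failing spec. If FALSE, a failed spec yields a zero-row data.frame with an "error" attribute and execution continues. |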
Details
The function:
1. Logs a summary header.
2. Calls executeSpec for each spec in order.
3. Drops the shared #domain_raw_results temp table after all specs have been executed (cleanup).
4. Returns all results as a named list.
Value
A named list of data.frame objects, one per spec.
Names are the analysis IDs (as character strings).
When stopOnError = FALSE, failed specs produce a
data.frame with zero rows and an "error" attribute
containing the error message.
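The stopOnError behaviour described above can be sketched with tryCatch. run_all_sketch is hypothetical, and run_one is a hypothetical stand-in for executeSpec. Specs are represented here as plain lists with an analysisId field:

```r
# Hypothetical sketch of the documented error handling.
run_all_sketch <- function(specs, run_one, stopOnError = TRUE) {
  results <- lapply(specs, function(spec) {
    tryCatch(run_one(spec), error = function(e) {
      if (stopOnError) stop(e)          # re-raise and abort the batch
      out <- data.frame()               # zero-row placeholder
      attr(out, "error") <- conditionMessage(e)
      out
    })
  })
  # Results are named by analysis ID, as character strings.
  names(results) <- vapply(specs, function(s) as.character(s$analysisId), character(1))
  results
}

specs <- list(list(analysisId = 101), list(analysisId = 102))
run_one <- function(spec) {
  if (spec$analysisId == 102) stop("table not found")
  data.frame(analysis_id = spec$analysisId, sum_value = 10)
}
res <- run_all_sketch(specs, run_one, stopOnError = FALSE)
names(res)                   # "101" "102"
attr(res[["102"]], "error")  # "table not found"
```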
See Also
executeSpec for single-spec execution,
singleNodeSetting for creating specs.
Examples
## Not run:
conn <- DatabaseConnector::connect(dbms = "postgresql",
server = "localhost/ohdsi",
user = "user", password = "pass")
# `specs` is assumed to be a singleNodeSettingList built beforehand (see singleNodeSetting)
results <- executeSpecs(conn, specs)
lapply(results, head)
DatabaseConnector::disconnect(conn)
## End(Not run)
Get Custom Covariate Data from the Database
Description
Builder function that implements the FeatureExtraction custom covariate
builder interface. It executes the OdysseusCharacterizationModule pipeline
and returns a CovariateData object (an Andromeda object with
covariates, covariateRef, and analysisRef tables).
Usage
getDbOcmCovariateData(
connection,
tempEmulationSchema = NULL,
cdmDatabaseSchema,
cdmVersion = "5",
cohortTable = "#cohort_person",
cohortIds = c(-1),
rowIdField = "subject_id",
covariateSettings,
aggregated = FALSE,
minCharacterizationMean = 0,
...
)
Arguments
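connection |
A DatabaseConnector connection object. |
tempEmulationSchema |
Character or NULL. Schema used to emulate temporary tables on DBMSs that do not support them natively. |
cdmDatabaseSchema |
Character. Schema containing the OMOP CDM tables. |
cdmVersion |
Character. OMOP CDM version. Defaults to "5". |
cohortTable |
Character. Fully qualified name of the cohort table. Defaults to "#cohort_person". |
cohortIds |
Integer vector. Cohort definition IDs to extract
covariates for. Defaults to c(-1). |
rowIdField |
Character. Column name in the cohort table used as the
row identifier. Typically "subject_id" (the default). |
covariateSettings |
An object created by createOcmCovariateSettings. |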
aggregated |
Logical. Currently only FALSE is supported. |
minCharacterizationMean |
Numeric. Minimum mean value for filtering (currently unused; present for interface compatibility). |
... |
Additional arguments passed by FeatureExtraction::getDbCovariateData(). |
Details
This function is normally not called directly. Instead, create a settings
object with createOcmCovariateSettings and pass it to
FeatureExtraction::getDbCovariateData().
Value
A CovariateData object (Andromeda) with:
covariates: Sparse table with rowId, covariateId, covariateValue.
covariateRef: Reference table with covariateId, covariateName, analysisId, conceptId.
analysisRef: Reference table with analysisId, analysisName, domainId, startDay, endDay, isBinary, missingMeansZero.
See Also
createOcmCovariateSettings for creating the settings object.
Apply a Function to Each Element with Its Index or Name
Description
A family of indexed mapping functions modelled after
purrr::imap.
.f receives two arguments: the element value and its name
(if .x is named) or its integer position (if .x is
unnamed).
Usage
imap(.x, .f, ...)
imap_chr(.x, .f, ...)
imap_dbl(.x, .f, ...)
imap_dfr(.x, .f, ...)
iwalk(.x, .f, ...)
Arguments
.x |
A list or atomic vector. |
.f |
A function or one-sided formula taking two arguments:
the element value (.x) and its name or index (.y). |
... |
Additional arguments passed to .f. |
Details
When .x has names, the second argument to .f is the
element name (a character string).
When .x is unnamed, the second argument is the positional
index (an integer).
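The name-or-index rule above can be sketched with mapply. imap_sketch is hypothetical, and it accepts functions only; formula conversion is omitted:

```r
# Hypothetical sketch of imap(): pass names when present,
# otherwise integer positions, as the second argument.
imap_sketch <- function(.x, .f, ...) {
  idx <- if (is.null(names(.x))) seq_along(.x) else names(.x)
  out <- mapply(.f, .x, idx, MoreArgs = list(...),
                SIMPLIFY = FALSE, USE.NAMES = FALSE)
  names(out) <- names(.x)
  out
}

r1 <- imap_sketch(c(a = 1, b = 2), function(value, name) paste(name, value))
r2 <- imap_sketch(c(10, 20), function(value, i) value * i)
r1
r2
```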
Value
imap: A list the same length as .x.
imap_chr: A character vector.
imap_dbl: A double (numeric) vector.
imap_dfr: A data.frame formed by rbind-ing per-element results.
iwalk: Invisibly returns .x (called for side effects).
Apply a Function to Each Element of a List or Vector
Description
A family of lightweight mapping functions modelled after
purrr::map.
map() always returns a list; the typed variants
(map_chr, map_dbl, map_int, map_lgl)
return atomic vectors of the indicated type; and the data-frame
variants (map_dfr, map_dfc) row-bind or column-bind
the results into a single data.frame.
Usage
map(.x, .f, ...)
map_chr(.x, .f, ...)
map_dbl(.x, .f, ...)
map_int(.x, .f, ...)
map_lgl(.x, .f, ...)
map_dfr(.x, .f, ...)
map_dfc(.x, .f, ...)
Arguments
.x |
A list or atomic vector. |
.f |
A function, one-sided formula, or atomic vector.
Formulas are converted via as_mapper. |
... |
Additional arguments passed to .f. |
Details
These functions provide a dependency-free subset of the
purrr mapping interface.
map() is a thin wrapper around lapply;
the typed variants use vapply with an appropriate
FUN.VALUE template.
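The vapply pattern above is what makes the typed variants strict: a FUN.VALUE template fixes both the type and the length of every element. The two functions below are hypothetical sketches, not the package's own code:

```r
# Hypothetical typed variants built on vapply() templates.
map_dbl_sketch <- function(.x, .f, ...) vapply(.x, .f, FUN.VALUE = numeric(1), ...)
map_chr_sketch <- function(.x, .f, ...) vapply(.x, .f, FUN.VALUE = character(1), ...)

means <- map_dbl_sketch(list(a = 1:3, b = 1:5), mean)
labels <- map_chr_sketch(1:3, function(i) paste0("window_", i))
means
labels
# A result of the wrong type is an error rather than a silent coercion:
try(map_dbl_sketch(list("a"), identity))
```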
Value
map: A list the same length as .x.
map_chr: A character vector the same length as .x.
map_dbl: A double (numeric) vector the same length as .x.
map_int: An integer vector the same length as .x.
map_lgl: A logical vector the same length as .x.
map_dfr: A data.frame formed by rbind-ing the per-element results.
map_dfc: A data.frame formed by cbind-ing the per-element results.
See Also
as_mapper, map2,
pmap, imap, walk
Apply a Function to Pairs of Elements from Two Lists
Description
A family of mapping functions that iterate over two inputs in
parallel, modelled after purrr::map2.
map2() returns a list; the typed variants return atomic
vectors; and the data-frame variants row-bind or column-bind the
results.
Usage
map2(.x, .y, .f, ...)
map2_chr(.x, .y, .f, ...)
map2_dbl(.x, .y, .f, ...)
map2_int(.x, .y, .f, ...)
map2_lgl(.x, .y, .f, ...)
map2_dfr(.x, .y, .f, ...)
map2_dfc(.x, .y, .f, ...)
Arguments
.x, .y |
Two vectors or lists of the same length (or length one, which is recycled). |
.f |
A function or one-sided formula taking (at least) two
arguments.
Formulas are converted via as_mapper. |
... |
Additional arguments passed to .f. |
Details
All variants are thin wrappers around mapply with
SIMPLIFY = FALSE.
The typed variants coerce the unlisted result to the target type.
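A minimal sketch of the mapply pattern just described follows. Both function names are hypothetical, and only functions (not formulas) are accepted. It also shows length-one recycling of .y:

```r
# Hypothetical sketch: mapply(SIMPLIFY = FALSE), then coerce the unlisted result.
map2_sketch <- function(.x, .y, .f, ...) {
  mapply(.f, .x, .y, MoreArgs = list(...), SIMPLIFY = FALSE)
}
map2_chr_sketch <- function(.x, .y, .f, ...) {
  as.character(unlist(map2_sketch(.x, .y, .f, ...)))
}

windows <- map2_chr_sketch(c(-365, -30), c(-1, -1),
                           function(s, e) sprintf("[%d, %d]", s, e))
windows
map2_sketch(1:3, 10, `+`)  # .y of length one is recycled
```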
Value
map2: A list the same length as .x.
map2_chr: A character vector.
map2_dbl: A double (numeric) vector.
map2_int: An integer vector.
map2_lgl: A logical vector.
map2_dfr: A data.frame formed by rbind-ing per-pair results.
map2_dfc: A data.frame formed by cbind-ing per-pair results.
Plan Characterization Analyses
Description
Creates a comprehensive characterization analysis plan for patient-level feature extraction from an OMOP Common Data Model (CDM) database. The plan defines time windows, base clinical features, cohort-based features, and concept-set-based features to be extracted relative to a target cohort's index date.
Usage
planAnalysis(
analysisWindows = defineAnalysisWindows(startDays = c(-365, -180, -30, 0, 1, 30, 180,
365), endDays = c(-1, -1, -1, 0, 30, 180, 365, 700)),
useBaseFeatures = list(drug_exposure = list(include = FALSE, atc = FALSE, atcLevels =
c(1L, 2L, 3L, 4L, 5L)), condition_occurrence = list(include = FALSE, type = "start"),
condition_era = list(include = FALSE, type = "start"), drug_era = list(include =
FALSE, type = "start", atc = FALSE, atcLevels = c(5L)), procedure_occurrence =
list(include = FALSE), observation = list(include = FALSE), device_exposure =
list(include = FALSE), visit_occurrence = list(include = FALSE, type = "start"),
measurement = list(include = FALSE)),
useCohortFeatures = list(include = FALSE, type = "start", cohortIds = NULL, cohortNames
= NULL, cohortTable = NULL, covariateSchema = NULL),
useConceptSetFeatures = list(conceptSets = NULL, include = FALSE, type = "binary")
)
Arguments
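analysisWindows |
An object returned by defineAnalysisWindows. |
useBaseFeatures |
A named list of domain configurations. Each element
name must correspond to a supported OMOP CDM table (e.g.,
drug_exposure, condition_occurrence). See Details for the options each domain accepts. |
useCohortFeatures |
A list specifying cohort-based feature extraction with the components include, type, cohortIds, cohortNames, cohortTable and covariateSchema (see Usage for defaults). |
useConceptSetFeatures |
A list specifying concept-set-based feature extraction with the components conceptSets, include and type (see Usage for defaults). |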
Details
This function assembles a characterizationSettings object that serves as a
blueprint for downstream feature extraction. It supports three complementary
feature extraction strategies:
Base Features
Standard OMOP CDM domain tables are used to construct binary or count-based covariates. Supported domains include:
- drug_exposure / drug_era: Drug concepts, optionally rolled up to ATC hierarchy levels 1–5.
- condition_occurrence / condition_era: Condition concepts with configurable temporal logic.
- procedure_occurrence: Procedure concepts.
- observation: Observation concepts.
- device_exposure: Device concepts.
- visit_occurrence: Visit concepts.
- measurement: Measurement concepts.
Each domain accepts:
include: Logical. Whether to extract features from this domain.
type: Character. Temporal logic: "start" uses the record start date; "overlap" uses era-style overlap with the time window. Applicable to era tables and visit_occurrence.
atc: Logical. Whether to roll up drug concepts to ATC hierarchy levels. Applicable to drug_exposure and drug_era only.
atcLevels: Integer vector. ATC hierarchy levels to include (1–5). Applicable when atc = TRUE.
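The difference between the two temporal rules for type can be made concrete with a small predicate. in_window is hypothetical, and treating the window bounds as inclusive is an assumption:

```r
# Hypothetical sketch of "start" vs "overlap" logic, in days relative to index.
in_window <- function(recordStart, recordEnd, indexDate, startDay, endDay,
                      type = c("start", "overlap")) {
  type <- match.arg(type)
  winStart <- indexDate + startDay
  winEnd <- indexDate + endDay
  if (type == "start") {
    # The record counts only if it starts inside the window.
    recordStart >= winStart & recordStart <= winEnd
  } else {
    # The record counts if any part of it overlaps the window.
    recordStart <= winEnd & recordEnd >= winStart
  }
}

index <- as.Date("2020-06-01")
# An era running from 400 to 300 days before index, against the [-365, -1] window:
in_window(index - 400, index - 300, index, -365, -1, "start")    # FALSE
in_window(index - 400, index - 300, index, -365, -1, "overlap")  # TRUE
```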
Cohort Features
Pre-defined cohorts (stored in a cohort table) are used as binary covariates, indicating whether a patient belongs to each specified cohort within each time window.
Concept Set Features
User-defined concept sets (analogous to ATLAS concept sets) are used to create
targeted covariates. Each concept set specifies one or more concepts
(optionally including descendants) and the CDM tables to search. Output type
can be "binary" (presence/absence) or "counts" (frequency).
Value
An S3 object of class characterizationSettings containing:
analysisWindows: The validated analysis windows.
useBaseFeatures: The validated base feature configuration.
useCohortFeatures: The validated cohort feature configuration.
useConceptSetFeatures: The validated concept set feature configuration.
See Also
defineAnalysisWindows for creating time window definitions.
Examples
# Minimal plan with default settings
plan <- planAnalysis()
# Custom plan: conditions and drugs only, two time windows
plan <- planAnalysis(
analysisWindows = defineAnalysisWindows(
startDays = c(-365, 1),
endDays = c(-1, 365)
),
useBaseFeatures = list(
drug_exposure = list(
include = TRUE,
atc = TRUE,
atcLevels = c(3L, 5L)
),
condition_occurrence = list(
include = TRUE,
type = "start"
),
condition_era = list(include = FALSE),
drug_era = list(include = FALSE),
procedure_occurrence = list(include = FALSE),
observation = list(include = FALSE),
device_exposure = list(include = FALSE),
visit_occurrence = list(include = FALSE),
measurement = list(include = FALSE)
),
useCohortFeatures = list(include = FALSE),
useConceptSetFeatures = list(include = FALSE)
)
# Plan with cohort features
plan <- planAnalysis(
useCohortFeatures = list(
include = TRUE,
type = "start",
cohortIds = c(101L, 102L, 103L),
cohortNames = c("T2DM", "Hypertension", "CKD"),
cohortTable = "my_cohort_table",
covariateSchema = "results_schema"
)
)
# Plan with custom concept sets
plan <- planAnalysis(
useConceptSetFeatures = list(
conceptSets = list(
diabetes = list(
items = list(
list(concept = list(CONCEPT_ID = 201820L),
includeDescendants = TRUE)
),
tables = c("condition_occurrence")
)
),
include = TRUE,
type = "counts"
)
)
Extract an Element from a Nested Structure
Description
Safely navigates into a nested list or vector using a sequence of accessors (names, integer positions, or functions), returning a default value when any accessor fails.
Usage
pluck(.x, ..., .default = NULL)
Arguments
.x |
A list, vector, or other sub-settable object. |
... |
A sequence of accessors applied left-to-right to
progressively drill into
|
.default |
The value to return if any accessor fails (index out
of bounds, name not found, intermediate or final value is
|
Details
This function provides a dependency-free equivalent of
purrr::pluck.
It is intentionally strict: accessor types other than integer,
character, or function cause an error.
When no accessors are supplied (... is empty), .x
itself is returned.
Value
The value found at the end of the accessor chain, or
.default if any step along the way fails.
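The accessor semantics described above can be illustrated with a small base-R sketch. pluck_sketch() is a hypothetical stand-in, not the package's source. It assumes that whole-number doubles count as integer accessors and that a NULL at any step triggers .default; the real pluck() may differ in such details, for example in how it handles a function accessor that throws an error.

```r
# Hypothetical re-implementation for illustration; not the package's code.
pluck_sketch <- function(.x, ..., .default = NULL) {
  accessors <- list(...)
  # With no accessors the loop body never runs and .x is returned unchanged.
  for (acc in accessors) {
    if (is.function(acc)) {
      .x <- acc(.x)                      # function accessor: apply to current value
    } else if (is.character(acc) && length(acc) == 1L) {
      nms <- names(.x)
      if (is.null(nms) || !(acc %in% nms)) return(.default)  # name not found
      .x <- .x[[acc]]
    } else if (is.numeric(acc) && length(acc) == 1L && acc == as.integer(acc)) {
      if (acc < 1L || acc > length(.x)) return(.default)     # out of bounds
      .x <- .x[[acc]]
    } else {
      # Strict: any other accessor type is an error, not a silent default.
      stop("Unsupported accessor type: ", class(acc)[1L])
    }
    if (is.null(.x)) return(.default)    # intermediate or final NULL
  }
  .x
}

x <- list(a = list(b = c(10, 20, 30)))
pluck_sketch(x, "a", "b", 2)                   # 20
pluck_sketch(x, "a", "z", .default = NA)       # NA
```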
Apply a Function to Multiple Lists in Parallel
Description
A family of mapping functions that iterate over an arbitrary number
of inputs in parallel, modelled after
purrr::pmap.
pmap() returns a list; the typed variants return atomic
vectors; and the data-frame variants row-bind or column-bind the
results.
Usage
pmap(.l, .f, ...)
pmap_chr(.l, .f, ...)
pmap_dbl(.l, .f, ...)
pmap_int(.l, .f, ...)
pmap_lgl(.l, .f, ...)
pmap_dfr(.l, .f, ...)
pmap_dfc(.l, .f, ...)
Arguments
.l |
A list of vectors or lists, all of the same length.
Each element of |
.f |
A function or one-sided formula.
The function should accept as many arguments as there are elements
in |
... |
Additional arguments passed to |
Details
All variants delegate to mapply via
do.call, passing the elements of .l as
parallel arguments.
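The mapply/do.call delegation can be sketched in base R as follows. The *_sketch functions are hypothetical and handle only functions, not the one-sided formulas that the real .f also accepts. Extra arguments in ... are assumed to be forwarded to .f through mapply's MoreArgs.

```r
# Hypothetical sketch of the delegation described above; not the package's source.
pmap_sketch <- function(.l, .f, ...) {
  # Each element of .l becomes one parallel argument to mapply();
  # names in .l (if any) are passed to .f as argument names.
  do.call(mapply, c(list(FUN = .f), .l,
                    list(MoreArgs = list(...), SIMPLIFY = FALSE, USE.NAMES = FALSE)))
}

# Typed variant: vapply() errors unless every result is a length-1 character.
pmap_chr_sketch <- function(.l, .f, ...) {
  vapply(pmap_sketch(.l, .f, ...), identity, character(1))
}

# Data-frame variant: row-bind the per-index data frames.
pmap_dfr_sketch <- function(.l, .f, ...) {
  do.call(rbind, pmap_sketch(.l, .f, ...))
}

l <- list(x = 1:3, y = c(10, 20, 30))
pmap_chr_sketch(l, function(x, y, sep) paste(x, y, sep = sep), sep = "-")
```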
Value
pmap: A list whose length equals the common length of the elements of .l.
pmap_chr: A character vector.
pmap_dbl: A double (numeric) vector.
pmap_int: An integer vector.
pmap_lgl: A logical vector.
pmap_dfr: A data.frame formed by rbind-ing per-index results.
pmap_dfc: A data.frame formed by cbind-ing per-index results.
Print Characterization Settings
Description
Prints a human-readable summary of a characterizationSettings object.
Usage
## S3 method for class 'characterizationSettings'
print(x, ...)
Arguments
x |
A |
... |
Additional arguments (ignored). |
Value
Invisibly returns x.
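Returning x invisibly is the usual S3 print-method convention: output is written for its side effect, and the object passes through unchanged without being printed twice at the console. The sketch below uses a hypothetical class demoSettings with a made-up windows field; it does not reproduce what print.characterizationSettings actually displays.

```r
# Hypothetical class and fields, for illustration only.
print.demoSettings <- function(x, ...) {
  cat("<demoSettings>\n")
  cat("  windows:", length(x$windows), "\n")
  invisible(x)  # return the input without auto-printing it again
}

settings <- structure(list(windows = list(c(-365, -1), c(1, 365))),
                      class = "demoSettings")
print(settings)
```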
Print Single Node Setting List
Description
Prints a singleNodeSettingList object to the console.
Usage
## S3 method for class 'singleNodeSettingList'
print(x, ...)
Arguments
x |
A |
... |
Additional arguments (ignored). |
Value
Invisibly returns x.
Print Single Node Spec
Description
Prints a singleNodeSpec object to the console.
Usage
## S3 method for class 'singleNodeSpec'
print(x, ...)
Arguments
x |
A |
... |
Additional arguments (ignored). |
Value
Invisibly returns x.
Render SQL for All Specs in a singleNodeSettingList
Description
Convenience wrapper that calls renderSpecSql on every
element of a singleNodeSettingList.
Usage
renderAllSpecSql(specs, targetDialect = NULL, tempEmulationSchema = NULL)
Arguments
specs |
A |
targetDialect |
Character or |
tempEmulationSchema |
Character or |
Value
A named character vector of rendered SQL statements, one per spec. Names are the analysis IDs (as character).
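The shape of the return value, one rendered string per spec named by its analysis ID, can be sketched in base R. The specs below are hypothetical plain lists carrying only the analysisId and sql fields listed in singleNodeSetting's Value section. render_all_sketch() and its render_one argument are illustrative names; render_one stands in for renderSpecSql.

```r
# Hypothetical stand-ins for singleNodeSpec objects.
specs <- list(
  list(analysisId = 1001L, sql = "SELECT 1;"),
  list(analysisId = 2001L, sql = "SELECT 2;")
)

render_all_sketch <- function(specs, render_one) {
  # One rendered statement per spec ...
  rendered <- vapply(specs, render_one, character(1))
  # ... named by the spec's analysis ID, converted to character.
  names(rendered) <- vapply(specs, function(s) as.character(s$analysisId),
                            character(1))
  rendered
}

out <- render_all_sketch(specs, function(s) s$sql)
out[["2001"]]
```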
Render a Single-Node SQL Specification
Description
Takes a singleNodeSpec object whose sql field contains
a parameterised SQL template and resolves every @placeholder
using the spec's own fields.
Usage
renderSpecSql(spec, targetDialect = NULL, tempEmulationSchema = NULL)
Arguments
spec |
A |
targetDialect |
Character string (optional).
When supplied, the rendered SQL is additionally translated to the
target DBMS dialect via |
tempEmulationSchema |
Character string or |
Value
A single character string of executable SQL.
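The package presumably relies on SqlRender (listed in Imports) to resolve @placeholders and, when targetDialect is given, to translate the result. The base-R sketch below only illustrates the substitution idea. render_placeholders() is hypothetical and does not support SqlRender features such as conditional blocks. Longer parameter names are substituted first so that, for example, @cohort_id does not clobber part of @cohort_id_list.

```r
# Hypothetical illustration of @placeholder substitution; not SqlRender.
render_placeholders <- function(sql, params) {
  # Longest names first, so a short name never matches inside a longer one.
  for (nm in names(params)[order(-nchar(names(params)))]) {
    sql <- gsub(paste0("@", nm), as.character(params[[nm]]), sql, fixed = TRUE)
  }
  sql
}

template <- paste(
  "SELECT * FROM @cdm_schema.condition_occurrence",
  "WHERE cohort_definition_id = @cohort_id AND x IN (@cohort_id_list)"
)
render_placeholders(template, list(cdm_schema = "cdm", cohort_id = 1L,
                                   cohort_id_list = "1,2"))
```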
Create a Single Node Analysis Specification
Description
Translates a characterizationSettings object (as returned by
planAnalysis) into a list of executable SQL-based analysis
specifications. Each specification ("node") pairs a feature domain, a time
window, and the appropriate SQL template with all placeholders resolved.
Usage
singleNodeSetting(
plan,
cohortId,
cohortDatabaseSchema,
cohortTable,
cdmDatabaseSchema,
vocabularyDatabaseSchema = cdmDatabaseSchema,
aggregated = TRUE,
rowIdField = "subject_id"
)
Arguments
plan |
A |
cohortId |
Integer scalar. The target cohort definition ID. |
cohortDatabaseSchema |
Character scalar. Schema containing the target cohort table. |
cohortTable |
Character scalar. Name of the target cohort table. |
cdmDatabaseSchema |
Character scalar. Schema containing the OMOP CDM tables. |
vocabularyDatabaseSchema |
Character scalar. Schema containing the OMOP
vocabulary tables. Used for concept name lookups in aggregated output.
Defaults to |
aggregated |
Logical scalar. If |
rowIdField |
Character scalar. Name of the column in the cohort table
to use as the row identifier in the output. Defaults to
|
Details
This function iterates over every enabled domain in useBaseFeatures,
useCohortFeatures, and useConceptSetFeatures, crosses each with
every analysis window, and produces a fully parameterised run specification.
The returned list can be passed directly to an execution engine that renders
and translates the SQL via SqlRender.
Analysis ID Assignment
Each specification receives a unique analysisId constructed as:
domainIndex * 1000 + windowIndex, ensuring stable, reproducible
identifiers across runs.
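The cross of domains with windows and the resulting IDs can be worked through in base R. The three domains and two windows below are illustrative inputs. The sketch assumes domainIndex and windowIndex are 1-based positions in the list of enabled domains and in the list of windows.

```r
# Illustrative inputs (hypothetical): three enabled domains, two windows.
domains <- c("condition_occurrence", "drug_exposure", "drug_era")
windows <- data.frame(startDay = c(-365L, 1L), endDay = c(-1L, 365L))

# Cross every domain with every window; expand.grid varies windowIndex fastest.
grid <- expand.grid(
  windowIndex = seq_len(nrow(windows)),
  domainIndex = seq_along(domains)
)
grid$domain   <- domains[grid$domainIndex]
grid$startDay <- windows$startDay[grid$windowIndex]
grid$endDay   <- windows$endDay[grid$windowIndex]
# analysisId = domainIndex * 1000 + windowIndex
grid$analysisId <- grid$domainIndex * 1000L + grid$windowIndex
grid[, c("analysisId", "domain", "startDay", "endDay")]
```

This yields IDs 1001, 1002, 2001, 2002, 3001 and 3002. The scheme stays collision-free only while a plan has at most 1,000 windows: with 1,001 windows, domain 1 window 1001 and domain 2 window 1 would both map to 2001.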
Value
A list of S3 objects of class singleNodeSpec. Each element
contains:
analysisId: Integer. Unique analysis identifier.
analysisName: Character. Human-readable analysis label.
domainTable: Character. CDM table name.
conceptIdCol: Character. Concept ID column in the domain table.
dateCol: Character. Start date column.
dateColEnd: Character or NULL. End date column (for overlap logic).
startDay: Integer. Window start day relative to index.
endDay: Integer. Window end day relative to index.
type: Character. Temporal logic ("start" or "overlap").
overlap: Logical. Whether overlap logic is used.
atc: Logical. Whether ATC roll-up is applied.
atcLevels: Integer vector or NULL. ATC levels.
conceptSet: Logical. Whether a concept set filter is applied.
conceptSetItems: List or NULL. Concept set items.
aggregated: Logical. Aggregation flag.
cohortId: Integer. Target cohort ID.
cohortDatabaseSchema: Character. Cohort schema.
cohortTable: Character. Cohort table name.
cdmDatabaseSchema: Character. CDM schema.
sql: Character. Parameterised SQL template.
source: Character. Origin: "base", "cohort", or "conceptSet".
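The two temporal logics are not spelled out above. The base-R sketch below assumes the conventional OHDSI semantics (as in FeatureExtraction) and is not taken from the package's SQL templates. Under "start", the event's start date, expressed in days relative to the index date, must fall within [startDay, endDay]. Under "overlap", the interval from the event's start to its end date (dateCol to dateColEnd) must intersect the window. in_window() is a hypothetical helper.

```r
# Hypothetical helper; assumes conventional OHDSI window semantics.
in_window <- function(index_date, event_start, event_end = NULL,
                      start_day, end_day, type = c("start", "overlap")) {
  type <- match.arg(type)
  start_offset <- as.integer(event_start - index_date)  # days relative to index
  if (type == "start") {
    return(start_offset >= start_day & start_offset <= end_day)
  }
  if (is.null(event_end)) event_end <- event_start
  end_offset <- as.integer(event_end - index_date)
  # Overlap: event starts before the window ends and ends after it starts.
  start_offset <= end_day & end_offset >= start_day
}

idx    <- as.Date("2020-06-01")
starts <- as.Date(c("2020-01-15", "2019-05-01", "2020-06-10"))
ends   <- as.Date(c("2020-02-01", "2020-07-01", "2020-06-20"))
in_window(idx, starts, ends, start_day = -365, end_day = -1, type = "overlap")
```

The second event starts more than a year before the index date, so it fails the "start" test for the window [-365, -1], but it still overlaps that window because it extends past the index date.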
See Also
planAnalysis for creating the analysis plan.
Examples
plan <- planAnalysis(
useBaseFeatures = list(
condition_occurrence = list(include = TRUE, type = "start"),
drug_exposure = list(include = FALSE),
condition_era = list(include = FALSE),
drug_era = list(include = FALSE),
procedure_occurrence = list(include = FALSE),
observation = list(include = FALSE),
device_exposure = list(include = FALSE),
visit_occurrence = list(include = FALSE),
measurement = list(include = FALSE)
),
useCohortFeatures = list(include = FALSE),
useConceptSetFeatures = list(include = FALSE)
)
specs <- singleNodeSetting(
plan = plan,
cohortId = 1L,
cohortDatabaseSchema = "results",
cohortTable = "cohort",
cdmDatabaseSchema = "cdm"
)
Apply a Function for Side Effects
Description
Executes .f on each element (or on each pair or tuple of elements) for
its side effects and returns the input invisibly.
walk() iterates over a single input, walk2() over two
inputs in parallel, and pwalk() over an arbitrary number of
inputs stored in a list.
Usage
walk(.x, .f, ...)
walk2(.x, .y, .f, ...)
pwalk(.l, .f, ...)
Arguments
.x |
A list or atomic vector. |
.f |
A function, one-sided formula, or atomic vector.
Formulas are converted via |
... |
Additional arguments passed to |
.y |
A vector or list the same length as |
.l |
A list of vectors or lists of equal length
(used by |
Details
These functions are the side-effect counterparts of map,
map2, and pmap, respectively.
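The side-effect pattern can be sketched in base R. The *_sketch functions are hypothetical; they accept only functions (not formulas or atomic-vector shortcuts) and discard the results of lapply()/mapply(), returning the input invisibly.

```r
# Hypothetical sketches; not the package's implementation.
walk_sketch <- function(.x, .f, ...) {
  lapply(.x, .f, ...)      # result discarded; only side effects matter
  invisible(.x)
}

walk2_sketch <- function(.x, .y, .f, ...) {
  mapply(.f, .x, .y, MoreArgs = list(...), SIMPLIFY = FALSE)
  invisible(.x)
}

pwalk_sketch <- function(.l, .f, ...) {
  do.call(mapply, c(list(FUN = .f), .l,
                    list(MoreArgs = list(...), SIMPLIFY = FALSE)))
  invisible(.l)
}

# Side effect: accumulate into an environment.
acc <- new.env()
acc$seen <- character(0)
res <- walk_sketch(c("a", "b"), function(v) acc$seen <- c(acc$seen, v))
```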
Value
walk: Invisibly returns .x.
walk2: Invisibly returns .x.
pwalk: Invisibly returns .l.