Methodology

Why CHMS harmonization is non-trivial

The Canadian Health Measures Survey (CHMS) is a repeated cross-sectional survey conducted by Statistics Canada since 2007. Each cycle collects questionnaire, physical measurement, and laboratory data from a nationally representative sample of Canadians. Combining data across cycles enables trend analyses and increases statistical power, but the raw data are not directly comparable across cycles, for the three reasons described below.

Variable naming changes

Variable names sometimes change between cycles. For example, the accelerometer variable for moderate-to-vigorous physical activity on day 1 is named amsdmva1 in cycle 1, but ammdmva1 in cycles 2–6. A researcher pooling cycles must know about every such rename, or risk silent misalignment.

chmsflow handles these renames through metadata in variable-details.csv. The variableStart column uses a mixed format to specify cycle-specific exceptions:

cycle1::amsdmva1, [ammdmva1]

This means: use amsdmva1 for cycle 1, and ammdmva1 for all other cycles. The recodeflow package reads this format and applies the correct mapping automatically. For the full list of variable naming patterns, see inst/metadata/schemas/chms/chms_database_config.yaml in the package source.
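As a rough illustration of how such an entry can be read, the sketch below resolves a mixed-format variableStart string to the source column for one cycle. The helper resolve_variable_start() is hypothetical and written only for this example; it is not part of recodeflow, whose own parsing may differ, and it only handles the cycle-exception-plus-default pattern shown above.

```r
# Hypothetical helper (not part of recodeflow): resolve a variableStart
# entry such as "cycle1::amsdmva1, [ammdmva1]" to the source column name
# for a single cycle.
resolve_variable_start <- function(variable_start, cycle) {
  parts <- trimws(strsplit(variable_start, ",", fixed = TRUE)[[1]])

  # Cycle-specific exceptions look like "cycle1::amsdmva1".
  prefix <- paste0(cycle, "::")
  exception <- parts[startsWith(parts, prefix)]
  if (length(exception) > 0) {
    return(substring(exception[1], nchar(prefix) + 1))
  }

  # Otherwise fall back to the bracketed default, e.g. "[ammdmva1]".
  default <- parts[grepl("^\\[.*\\]$", parts)]
  if (length(default) == 0) {
    return(NA_character_)
  }
  gsub("^\\[|\\]$", "", default[1])
}

start <- "cycle1::amsdmva1, [ammdmva1]"
name_cycle1 <- resolve_variable_start(start, "cycle1")  # "amsdmva1"
name_cycle3 <- resolve_variable_start(start, "cycle3")  # "ammdmva1"
```

The cycle label ("cycle1", "cycle3") is assumed to match the prefix used in the metadata exactly.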

Coding scheme differences

Even when variable names are stable, the coding categories may differ. CHMS response codes for missing data, valid skips, and refusals vary across variables and cycles. chmsflow’s recoding rules in variable-details.csv define explicit mappings for each response code, ensuring consistent treatment across cycles.

Medication data format changes

Cycles 1–2 store medication data in a wide format: up to 80 columns of ATC code and time-last-taken pairs per respondent. Cycles 3–6 use a long format: one row per medication per respondent with two columns (meucatc and npi_25b). chmsflow provides separate functions for each format (recode_meds_cycles1to2() and recode_meds_cycles3to6()) with identical call signatures, so the downstream workflow is the same regardless of cycle. See Recoding medications for details.
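To make the structural difference concrete, the following base R sketch stacks a toy wide-format table into the long layout. It is not how chmsflow processes medications. The wide column names (atc_1, time_1, ...), the id column, and the values are made up for illustration; only meucatc and npi_25b come from the description above.

```r
# Toy wide-format data: one row per respondent, one (ATC code,
# time-last-taken) pair of columns per medication slot.
# Column names and values here are hypothetical.
meds_wide <- data.frame(
  id     = c(1, 2),
  atc_1  = c("C09AA05", "A10BA02"),
  time_1 = c(1, 2),
  atc_2  = c("C10AA05", NA),
  time_2 = c(1, NA),
  stringsAsFactors = FALSE
)

# Stack each pair of columns into rows of a long table.
n_slots <- 2
slots <- lapply(seq_len(n_slots), function(i) {
  data.frame(
    id      = meds_wide$id,
    meucatc = meds_wide[[paste0("atc_", i)]],
    npi_25b = meds_wide[[paste0("time_", i)]],
    stringsAsFactors = FALSE
  )
})
meds_long <- do.call(rbind, slots)

# Drop empty slots and order by respondent: one row per medication.
meds_long <- meds_long[!is.na(meds_long$meucatc), ]
meds_long <- meds_long[order(meds_long$id), ]
rownames(meds_long) <- NULL
```

In this toy case respondent 1 contributes two rows and respondent 2 contributes one, which is the shape the long-format cycles already provide.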

How chmsflow works

Rules as data, derivation as code

chmsflow builds on the recodeflow package, which separates recoding rules from recoding logic. The rules live in two CSV metadata files:

The recoding logic lives in recodeflow::rec_with_table(), which reads the metadata and applies the mappings. This separation means that adding or correcting a recoding rule is a CSV edit, not a code change.
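The idea can be sketched in a few lines of base R. This is a simplified stand-in, not recodeflow's implementation: the rules table below uses a reduced, assumed layout, and apply_rules() is a hypothetical function written for this example.

```r
# A toy rules table. The recStart/recEnd column names follow this document;
# the overall layout is a simplified assumption, not the real schema.
rules <- data.frame(
  variable = c("sex", "sex"),
  recStart = c("1", "2"),
  recEnd   = c("male", "female"),
  stringsAsFactors = FALSE
)

# Generic logic: look up each source value in the rules for one variable.
# Values without a matching rule become NA.
apply_rules <- function(values, variable, rules) {
  r <- rules[rules$variable == variable, ]
  r$recEnd[match(as.character(values), r$recStart)]
}

raw <- c(1, 2, 2, 9)
before <- apply_rules(raw, "sex", rules)

# Correcting the recoding means adding a row of data;
# apply_rules() itself is unchanged.
rules <- rbind(
  rules,
  data.frame(variable = "sex", recStart = "9", recEnd = "unknown",
             stringsAsFactors = FALSE)
)
after <- apply_rules(raw, "sex", rules)
```

Here the value 9 is unmapped in the first pass and mapped in the second, with no change to the function.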

For detailed schema documentation, see Variable schema reference.

The variableStart column

The variableStart column in variable-details.csv tells rec_with_table() where to find the source data. It supports several formats:

| Format | Meaning | Example |
|---|---|---|
| [variable_name] | Same name across all cycles | [clc_age] |
| cycle1::name1, [default_name] | Cycle-specific exception with a default | cycle1::amsdmva1, [ammdmva1] |
| DerivedVar::[var1, var2, ...] | Computed by a function from listed inputs | DerivedVar::[lab_bcre, pgdcgt, clc_sex, clc_age] |
| Func::function_name | The R function that computes the derived variable (in recTo) | Func::calculate_gfr |

The recStart and recEnd columns

These columns define the mapping from source values to harmonized values:

Missing data semantics

CHMS uses numeric codes for missing data (e.g., 996 for valid skip, 997–999 for don’t know / refusal / not stated). chmsflow converts these to haven::tagged_na() values that preserve the reason for missingness:

This distinction matters for analysis. For example, a respondent who was never asked about smoking (valid skip) should be treated differently from one who refused to answer (missing). See Missing data (tagged_na) for a full explanation.
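A minimal sketch of the idea using the haven package is shown below. The tag letters are an assumption made for this example ("a" for valid skip, "b" for don't know, refusal, or not stated); the tags chmsflow actually assigns are defined in its metadata and may differ.

```r
library(haven)

# Raw values with CHMS-style missing codes.
raw <- c(12, 996, 30, 998, 999)

# Assumed tags: "a" = valid skip, "b" = don't know / refusal / not stated.
recoded <- raw
recoded[raw == 996] <- tagged_na("a")
recoded[raw %in% 997:999] <- tagged_na("b")

# Both kinds still behave as ordinary NA in calculations...
valid_mean <- mean(recoded, na.rm = TRUE)

# ...but the reason for missingness can be recovered when needed.
tags <- na_tag(recoded)
n_valid_skip <- sum(is_tagged_na(recoded, "a"))
```

Because tagged values are still NA, existing code that uses is.na() or na.rm = TRUE keeps working, while analyses that care about the reason can filter on the tag.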

Derived variables

Some harmonized variables cannot be created by simple value mapping. These are computed by R functions referenced in variable-details.csv with the Func:: prefix. Examples include:

The DerivedVar:: prefix in variableStart lists the input variables that must be present in the data before the function can run. See Derived variables for details.
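One practical use of that list is checking a dataset before a derived variable is computed. The sketch below parses a DerivedVar:: entry and reports which inputs are absent. The helper derived_inputs() and the toy data are hypothetical and exist only for this illustration.

```r
# Hypothetical helper: extract the input variable names from a
# DerivedVar:: entry in variableStart.
derived_inputs <- function(variable_start) {
  inner <- sub("^DerivedVar::\\[(.*)\\]$", "\\1", trimws(variable_start))
  trimws(strsplit(inner, ",", fixed = TRUE)[[1]])
}

inputs <- derived_inputs("DerivedVar::[lab_bcre, pgdcgt, clc_sex, clc_age]")

# Toy data that lacks one of the required inputs.
dat <- data.frame(lab_bcre = 80, clc_sex = 1, clc_age = 45)

# Inputs that must be added or recoded before the function can run.
missing_inputs <- setdiff(inputs, names(dat))
```

In this toy case missing_inputs contains pgdcgt, so that variable would need to be present before the derivation is attempted.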

Known limits

Next steps

References