Initial CRAN release. artoo is a CDISC-native reader and writer
for clinical-trial datasets, built around one
canonical metadata model (artoo_meta) so that conversion
between any two supported formats is lossless by construction. It is pure R
and lightweight, with no external SAS or Java runtime.
Reads and writes SAS XPORT (v5 and v8), CDISC Dataset-JSON v1.1,
NDJSON, Apache Parquet, and RDS. Every codec carries the full
artoo_meta — labels, CDISC data types, lengths, SAS display
formats, controlled-terminology references, and sort keys — so
any-to-any conversion preserves the complete metadata. For Parquet the
metadata rides as a metadata_json sidecar; a file written
by another tool with no sidecar degrades gracefully to a bare frame
rather than an error.
Generic read_dataset() /
write_dataset() dispatch on the file extension, with
read_xpt() / write_xpt() and the matching
read_json(), read_ndjson(),
read_parquet(), and read_rds() pairs as direct
entry points. The cross-cutting encoding, checks,
and created arguments are passed through
the dots (...).
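
Dispatching on the file extension can be pictured with a small base R sketch. Everything named here (the readers registry, read_by_extension(), and the two stand-in readers) is hypothetical and only illustrates the idea; it is not artoo's codec registry or its API.

```r
# Hypothetical registry mapping a file extension to a reader function.
# The readers are base R stand-ins, not artoo codecs.
readers <- list(
  rds = function(path, ...) readRDS(path),
  csv = function(path, ...) utils::read.csv(path, ...)
)

read_by_extension <- function(path, ...) {
  # Pick the reader from the (lower-cased) extension of the path.
  ext <- tolower(tools::file_ext(path))
  reader <- readers[[ext]]
  if (is.null(reader)) {
    stop("no reader registered for extension '", ext, "'", call. = FALSE)
  }
  # Extra arguments reach the chosen reader through the dots.
  reader(path, ...)
}

# Write a tiny frame to a temporary .rds file and read it back generically.
path <- tempfile(fileext = ".rds")
saveRDS(data.frame(USUBJID = c("01", "02"), AGE = c(34L, 51L)), path)
dm <- read_by_extension(path)
```

Keeping the format-specific readers as ordinary functions means the direct entry points and the generic dispatcher can share one implementation per format.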
Partial reads (col_select, n_max) on
every reader; gzip-transparent JSON and NDJSON; multi-member SAS XPORT
libraries via xpt_members() plus
read_xpt(member = ).
Numeric fidelity is exact end to end: a decimal
value is exchanged as a string at IEEE round-trip precision,
integer values beyond R’s 32-bit range stay numeric rather
than overflowing, and NaN / infinite values are rejected as
invalid CDISC numerics. Rows are sorted in C-locale (byte) order, so a
written file is deterministic across locales and matches SAS
PROC SORT for ASCII keys.
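
The reasoning behind these guarantees can be checked in base R. The sketch below assumes that "IEEE round-trip precision" means 17 significant digits for a double, which is one common way to achieve it; artoo may well use a different (for example, shortest round-trip) representation. The variable names are illustrative only.

```r
# Two doubles that differ in the last bit.
a <- 0.1 + 0.2
b <- 0.3

# At 15 significant digits the two values print identically, so a string
# exchange at that precision would merge them.
s15 <- sprintf("%.15g", c(a, b))

# At 17 significant digits the strings differ, so the distinction survives.
s17 <- sprintf("%.17g", c(a, b))

# A whole number beyond the 32-bit integer range is still exact as a double.
big <- 2^31
beyond_integer <- big > .Machine$integer.max

# Byte-order sort: method = "radix" sorts character vectors in the C locale,
# regardless of the session's collation setting.
keys <- c("b", "A", "a", "B")
sorted <- sort(keys, method = "radix")
# sorted is "A" "B" "a" "b": uppercase ASCII sorts before lowercase
```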
Encodings follow the IANA and SAS standards: the readers and
writers accept a charset name in either the SAS or R spelling (see
artoo_encodings()), character columns are transcoded to
UTF-8 and NFC-normalized on read, and the
on_invalid = c("error", "replace", "ignore") policy governs
invalid bytes uniformly across every writer.
artoo_spec() builds the canonical metadata model
from a Pinnacle 21 Excel workbook, a Define-XML 2.0 / 2.1 file, or a
native artoo JSON spec. read_spec() /
write_spec() dispatch on the file extension:
.xlsx writes a Pinnacle 21 workbook (Define-XML to P21 is
one composition), and the native JSON form is the lossless interchange
that round-trips a spec identically.
The spec is single-standard by construction:
@standard is resolved once from the explicit argument or
the source, and study-level fields are canonicalized to the CDISC ODM
vocabulary. Accessors include spec_standard(),
spec_variables(), spec_codelists(),
spec_methods(), and spec_comments().
set_type() returns a spec with one or more variables
retyped through the CDISC vocabulary; repair_spec() retypes
every variable a check_spec() run flags as fractional or
out-of-range under an integer data type, so a frame the
original spec would refuse coerces after one call.
apply_spec(x, spec, dataset, conformance = , na_position = )
coerces each column to its CDISC data type, orders the columns and sorts
the rows by the spec’s keys, and stamps the artoo_meta.
extra = c("keep", "drop") controls whether undeclared
columns survive; on_coercion_loss = c("error", "keep")
governs a coercion that would lose data. The pipeline never silently
fabricates or drops a column: an undeclared column is reported and kept,
a declared-but-absent column is reported and left absent.
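
A coercion-loss policy of this shape can be sketched in a few lines of base R. The helper coerce_integer() is hypothetical, and so is the reading of "keep" as "leave the original values uncoerced"; the sketch shows the general technique, not how apply_spec() is implemented.

```r
# Hypothetical helper, not artoo's implementation: coerce a numeric column
# to integer under an on_coercion_loss-style policy.
coerce_integer <- function(x, on_coercion_loss = c("error", "keep")) {
  on_coercion_loss <- match.arg(on_coercion_loss)

  # A value is lossy if it has a fractional part or falls outside the
  # 32-bit integer range. Missing values are never lossy.
  lossy <- !is.na(x) & (x != trunc(x) | abs(x) > .Machine$integer.max)

  if (any(lossy)) {
    if (on_coercion_loss == "error") {
      stop(sum(lossy), " value(s) would lose data as integer", call. = FALSE)
    }
    # "keep" (assumed meaning): return the column unchanged.
    return(x)
  }
  as.integer(x)
}

clean <- coerce_integer(c(1, 2, NA))          # coerces without loss
kept  <- coerce_integer(c(1.5, 2), "keep")    # fractional value is kept as is
```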
check_spec() validates a data frame against its spec
across conformance dimensions toggled by artoo_checks();
check_study() runs it over a whole study and returns one
stacked findings frame; conformance() reads the findings
back off a stamped frame. validate_spec() checks a spec for
internal consistency against a bundled rule catalog, with no external
dependency.
decode_column() translates coded values to or from
their codelist decodes; sync_meta() reconciles a stamped
frame’s metadata after manual edits.
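
Translating between codes and decodes amounts to a two-way lookup. The sketch below uses a named character vector as a stand-in codelist; sex_codelist, decode(), and encode() are hypothetical names chosen for illustration and do not reflect decode_column()'s actual signature.

```r
# Hypothetical codelist: names are the coded values, elements the decodes.
sex_codelist <- c(M = "Male", F = "Female", U = "Unknown")

# Code -> decode: index the codelist by name; unknown codes become NA.
decode <- function(x, codelist) unname(codelist[x])

# Decode -> code: find each decode's position and return the matching name.
encode <- function(x, codelist) names(codelist)[match(x, codelist)]

decoded <- decode(c("F", "M", "X"), sex_codelist)
encoded <- encode(c("Male", "Female"), sex_codelist)
```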
members() is the format-neutral inventory of the
dataset(s) a path holds, one row per dataset, dispatched through the
codec registry. columns() is the SAS PROC CONTENTS /
Universal Viewer variable pane over a stamped frame, a plain data frame,
or a file path. get_meta() / set_meta() read
and attach the artoo_meta.
Conditions carry a class hierarchy (artoo_<severity>_<kind>,
artoo_<severity>, artoo_condition), so a
handler can catch a specific kind, a whole severity, or every artoo
condition. The data-protection conditions attach their evidence as data
(cnd$variables, cnd$findings) for programmatic
inspection.
Bundled specs: adam_spec (ADaMIG 1.1) and
sdtm_spec (SDTMIG 3.1.2), built reproducibly from the
official CDISC Define-XML 2.1 release examples and also shipped as
Pinnacle 21 workbooks under inst/extdata/. Demo datasets
come from the PHUSE Test Data Factory; the constructor tables
cdisc_adam_datasets / cdisc_adam_variables,
cdisc_sdtm_datasets / cdisc_sdtm_variables,
and the shared cdisc_codelists let you build a spec by hand. Every
bundled dataset conforms to its bundled spec, gated at build and test
time.
Documentation: vignette("artoo") plus task-oriented
web articles (specifications; conform and validate; formats and lossless
conversion; recipes), and a pkgdown reference site.
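
The condition classes described earlier in this entry (artoo_<severity>_<kind>, artoo_<severity>, artoo_condition) map directly onto base R's condition handling. The following sketch builds a condition by hand to show how a handler can target each level; the constructor artoo_like_error(), the severity "error", the kind "coercion", and the field value are all assumptions made for illustration, not names confirmed by the package.

```r
# Hypothetical constructor: a condition whose class vector runs from the
# most specific kind to the most general class.
artoo_like_error <- function(message, kind, ...) {
  structure(
    list(message = message, call = NULL, ...),
    class = c(paste0("artoo_error_", kind), "artoo_error",
              "artoo_condition", "error", "condition")
  )
}

# Catch by severity: any class in the vector can be used as a handler name.
caught <- tryCatch(
  stop(artoo_like_error("lossy coercion", "coercion", variables = "AGE")),
  artoo_error = function(cnd) cnd
)

# Evidence attached to the condition is available as ordinary list fields.
flagged <- caught$variables
```

Handlers named artoo_error_coercion or artoo_condition would catch the same condition, at a narrower or broader scope respectively.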