Getting Started with integrity

Package overview
How to use the package
Step 1: Data loading
Step 2: Data preparation
Step 3: Running integrity checks
Step 4: Reviewing results by integrity domain
Computing Environment

Package overview

Increasing concerns about the trustworthiness of research have prompted calls to scrutinise studies’ individual participant data (IPD), that is, de-identified raw line-by-line data for each participant in a study.

integrity was developed to support application of the IPD Integrity Tool (Hunter et al., 2024A). It enables structured and transparent assessment of the integrity and trustworthiness of randomised controlled trials (RCTs) using IPD, and informs decisions about whether RCTs should be included in evidence synthesis or considered suitable for publication. Further information about the development of the tool is available in Hunter et al. (2024B).

If you use our package please cite:

Hunter KE, Aberoumand M, Libesman S, Sotiropoulos JX, Williams JG, Aagerup J, Wang R, Mol BW, Li W, Barba A, Shrestha N, Webster AC, Seidler AL. The Individual Participant Data Integrity Tool for assessing the integrity of randomised trials. Research Synthesis Methods. 2024 Nov;15(6):917-39.

How to use the package

Each step of the workflow is illustrated using a case study on umbilical cord management at preterm birth, based on a de-identified and altered data set from the iCOMP study. The main goal of iCOMP was to determine the optimal umbilical cord management strategy at preterm birth, such as cord milking or delayed cord clamping.

Step 1: Data loading

Load the integrity package into R.

if(requireNamespace("pkgload", quietly = TRUE) && file.exists("../DESCRIPTION")) {
  pkgload::load_all("..")
} else {
  library(integrity)
}

## ℹ Loading integrity

## Warning: package 'testthat' was built under R version 4.5.2

Next, import the data set you wish to examine into R. There are a variety of functions in R or CRAN packages to do this:

read.csv and read.delim (both wrappers around read.table) to import comma-separated and tab-separated text files.
read_sas for SAS, read_sav for SPSS and read_dta for Stata in the CRAN package haven.
read_excel for Microsoft Excel in the CRAN package readxl.
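
As a minimal sketch of the base R route, the block below writes a small CSV to a temporary file and reads it back with read.csv. The object name toy_data and the temporary file are assumptions made for illustration; the column names and values are borrowed from the first two rows of the case study.

```r
# Write a tiny illustrative CSV to a temporary file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "infant_ID,rand_date,mat_age",
  "1,2019-03-21,36",
  "2,2020-07-17,18"
), csv_path)

# Read it back; stringsAsFactors = FALSE keeps text columns as character.
toy_data <- read.csv(csv_path, stringsAsFactors = FALSE)

# Dates arrive as text from a CSV, so convert them explicitly.
toy_data$rand_date <- as.Date(toy_data$rand_date)
str(toy_data)
```

For SAS, SPSS or Stata files, the equivalent call would be haven::read_sas(), haven::read_sav() or haven::read_dta(), each taking the file path as its first argument.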

Case study: The altered iCOMP case study is loaded with the integrity package. The data are in a Microsoft Excel file.

library(readxl)
examplePath <- system.file("extdata", "dataset.xlsx", package = "integrity")
dataset <- read_excel(examplePath, sheet=1)
dataset[1:5, ]

## # A tibble: 5 × 18
##   infant_ID rand_date           mat_age blood_loss treatment_cat GA_weeks
##       <dbl> <dttm>                <dbl>      <dbl>         <dbl>    <dbl>
## 1         1 2019-03-21 00:00:00      36        200             2       30
## 2         2 2020-07-17 00:00:00      18        200             1       28
## 3         3 2019-06-14 00:00:00      20        300             1       32
## 4         4 2019-10-08 00:00:00      30        500             2       29
## 5         5 2019-03-02 00:00:00      34        400             1       32
## # ℹ 12 more variables: birthweight <dbl>, sex <dbl>, hospital_days <dbl>,
## #   temp <dbl>, inf_transfusion_any <dbl>, Hct <dbl>, CLD <dbl>, IVH <dbl>,
## #   NEC <dbl>, inf_death <dbl>, enrol_start <dttm>, enrol_end <dttm>

In the tibble above, the participant identifiers (infant_ID), the date of randomisation (rand_date) and the first few clinical variables can be seen.

Step 2: Data preparation

The following elements are required to be paired with the corresponding column names in your data set:

participantID: The name of the column which corresponds to the unique participant identifier (this variable is mandatory).
enrollment: a list naming three columns: start (date of first participant enrolment), randomisation (date of participant randomisation) and end (date of last participant enrolment).
baseline: a list with entries named dichotomous, polytomous and numeric, each giving the name(s) of the column(s) holding baseline measurements of that type.
intervention: the name of the column specifying the intervention or group allocation for each individual (this variable is mandatory).
outcome: lists named common and rare, with sublists named by dichotomous, numeric or polytomous, containing the names of columns of those data types for outcomes assessed.
correlated: a named list in which each entry is a pair of column names that are expected to be correlated.
unexpected: a named list, keyed by column name, of values that are not expected to be seen. days is a special sublist that applies to date columns, which are converted into days of the week before comparison. It must have two elements: names, the unexpected day names, and locale, the locale in which those day names are written.

Only participantID is strictly required. enrollment, baseline, intervention, outcome, correlated, and unexpected should be supplied when available; if a section is omitted, the checks that depend on it will be skipped.

The variable types and expectations need to be defined before running the checks. The package accepts a single metadata structure, which can be created in either of two formats depending on your preference: a list written directly in an R script or R Markdown document, or an Excel template workbook.

Coding the list directly in R, as below, is often the simplest option for users already working in an R script or R Markdown document, because the metadata sit next to the analysis code. The R code below may be used as a template and adapted to the relevant variables in a new data set.

dataset_info <- list(
  participantID = "infant_ID",
  enrollment = list(
    start = "enrol_start",
    randomisation = "rand_date",
    end = "enrol_end"
  ),
  baseline = list(
    dichotomous = c("sex"), # add more variables if needed e.g., c("sex", "respiratory_support")
    #polytomous = c("ordinal_or_nominal_variable"), # no polytomous baseline variables in this data set so it's commented out 
    numeric = c("mat_age", "GA_weeks", "birthweight")
    # can add polytomous variables if needed
  ),
  intervention = "treatment_cat",
  outcome = list(
    common = list(
      dichotomous = c("IVH", "NEC"),
      polytomous = c("CLD"), # if certain variable types don't exist, just delete the relevant line.
      numeric = c("hospital_days")
    ),
    rare = list(
      dichotomous = c("inf_death") # add more variables if needed e.g., c("inf_death", "severe_IVH")
    )
  ),
  correlated = list(
    timeAndSize = c("GA_weeks", "birthweight")
  ),
  unexpected = list(
    days = list(
      names = c("Saturday", "Sunday"),
      locale = "C"
    ),
    mat_age = c("less than 10", "greater than 50"),
    GA_weeks = c("less than 22", "greater than 37")
  )
)
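
Before running the checks, it can be useful to confirm by hand that every column named in the metadata exists in the data. The sketch below does this with base R on a cut-down example. metadata_columns() is a hypothetical helper written for this illustration and is not part of integrity; toy_info and toy_data are stand-ins for dataset_info and dataset.

```r
# Hypothetical helper (not part of integrity): collect every column name
# referenced in a metadata list shaped like dataset_info.
metadata_columns <- function(info) {
  # In `unexpected`, the entry names (other than the special `days` entry)
  # are the column names; the values are rules, not columns.
  unexpected_cols <- setdiff(names(info$unexpected), "days")
  other <- info[setdiff(names(info), "unexpected")]
  unique(c(unlist(other, use.names = FALSE), unexpected_cols))
}

toy_info <- list(
  participantID = "infant_ID",
  baseline = list(numeric = c("mat_age", "GA_weeks")),
  intervention = "treatment_cat",
  unexpected = list(
    days = list(names = c("Saturday", "Sunday"), locale = "C"),
    mat_age = c("less than 10", "greater than 50")
  )
)

# Stand-in data with one metadata column (GA_weeks) deliberately absent.
toy_data <- data.frame(
  infant_ID = 1:3,
  mat_age = c(36, 18, 20),
  treatment_cat = c(2, 1, 1)
)

# Columns named in the metadata but missing from the data.
missing_cols <- setdiff(metadata_columns(toy_info), names(toy_data))
missing_cols
```

An empty result means every referenced column was found; any names returned point to a typo in the metadata or a column that still needs to be imported.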

In the unexpected$days section, locale controls the language used when R converts dates into weekday names. locale = "C" is the most likely option to use because it returns standard English weekday names such as Saturday and Sunday, which usually match the values entered in names. Other locale values are possible if your system or dataset uses a different language or naming convention, but "C" will usually be the safest default.
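
The effect of the locale can be seen with base R alone. The sketch below sets the time locale to "C" and converts the case study's study start date (2019-03-02) into a weekday name. It illustrates base R behaviour only and makes no claim about how integrity applies locale internally.

```r
# Remember the current time locale so it can be restored afterwards.
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

# With the "C" locale, weekday names are returned in English.
day_name <- weekdays(as.Date("2019-03-02"))
day_name

# Compare against the unexpected day names from the metadata.
is_unexpected_day <- day_name %in% c("Saturday", "Sunday")
is_unexpected_day

# Restore the previous locale.
invisible(Sys.setlocale("LC_TIME", old_locale))
```

If names were written in another language, the same comparison would only succeed when the locale produces weekday names in that language.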

An Excel template is also available if users prefer to enter the metadata in a spreadsheet. The workbook has one row per entry and four columns named level_1, level_2, level_3, and value. Repeated values, such as several numeric baseline variables, are entered as multiple rows.

This workbook can be edited in Microsoft Excel, then imported into R with read_metadata_excel().

example_excel_path <- system.file("extdata", "variables_template.xlsx", package = "integrity")
dataset_info <- read_metadata_excel(example_excel_path)

Step 3: Running integrity checks

Simply provide the data frame and the metadata list to run_checks. The function first performs automated data checking and cleaning to ensure that all variables defined in dataset_info are present in the data set. It also converts columns nominated as factors into factors where required, and removes any columns containing only missing values.

result <- run_checks(dataset, dataset_info)
names(result)

## [1] "check_table"   "detail_tables" "images"        "summary_table"

This creates a list of result objects: overall check tables, detailed per-variable tables for selected checks, plots, and summary tables. The output for each item below should be reviewed in order and rated using the decision guide and rating sheet in Hunter et al. (2024A).
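
To focus a review on flagged items, the check table can be filtered by its Status column. The sketch below uses a small stand-in data frame, toy_checks, with the same column names as the check_table output shown in this vignette, so that it runs without the package. With real results, result[["check_table"]] would take the place of toy_checks.

```r
# Stand-in for result[["check_table"]]; rows copied from this vignette's output.
toy_checks <- data.frame(
  ItemNumber = c("1.2", "1.3", "3.1"),
  `Item description` = c(
    "Repeated Baselines",
    "Repeated Baselines in Rare Outcomes",
    "Unexpectedly Uncorrelated"
  ),
  Status = c("Potential integrity issue", "Pass", "Potential integrity issue"),
  Details = c(
    "sex:1, mat_age:30, GA_weeks:33, birthweight:1568 occurs 2 times.",
    "No duplicates found.",
    "GA_weeks, birthweight"
  ),
  check.names = FALSE,      # keep the space in "Item description"
  stringsAsFactors = FALSE
)

# Keep only the rows flagged as potential integrity issues.
flagged <- toy_checks[toy_checks$Status == "Potential integrity issue", , drop = FALSE]
flagged[, c("ItemNumber", "Item description", "Details")]

# Tally of statuses across all items.
table(toy_checks$Status)
```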

Step 4: Reviewing results by integrity domain

The sections below present the output from the integrity run_checks function, split under each domain and item.

Domain 1: Unusual or repeated data patterns

Item 1.1: Repeating patterns within baseline variables

This item is manually performed by sorting and visually inspecting the data to identify repeating patterns within baseline variables, e.g. check whether values appear to repeat at regular intervals, which may indicate rows were copied and pasted. Rare or unusual entries can be particularly useful for detecting such patterns; assess whether these entries recur systematically, such as every 11 rows. Perform these assessments using the original dataset order, randomisation order, and separately within each study group.

item_1_1 <- result[["check_table"]][result[["check_table"]][["ItemNumber"]] == "1.1", c("Item description", "Status", "Details"), drop = FALSE]
knitr::kable(item_1_1, row.names = FALSE)

Item description	Status	Details
Repeated Baselines Within Variables	Skipped	This step is mannually peformed through visual inspection of the raw data
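
Visual inspection can be supported by a simple lag comparison: for each candidate interval, count how often a value equals the value that many rows later. The sketch below is an assumption-laden illustration rather than part of integrity. lagged_match_rate() is a hypothetical helper, and the birth weights are invented so that a block of 11 rows is pasted five times.

```r
# Hypothetical helper: proportion of rows whose value equals the value
# `lag` rows further down, for each lag from 1 to max_lag.
lagged_match_rate <- function(x, max_lag = 20) {
  sapply(seq_len(max_lag), function(lag) {
    mean(head(x, -lag) == tail(x, -lag), na.rm = TRUE)
  })
}

# Invented example: one block of 11 distinct birth weights copied 5 times.
block <- c(1510, 1720, 1835, 1640, 1905, 1475, 1568, 1780, 1690, 1820, 1555)
copied <- rep(block, times = 5)

rates <- lagged_match_rate(copied)

# The lag with the highest match rate suggests the repeat interval.
suspect_lag <- which.max(rates)
suspect_lag
rates[suspect_lag]
```

In real data some matches occur by chance, especially for coarse variables, so a high rate at one lag is a prompt for closer inspection, not proof of copying. Repeat the comparison in the original order, in randomisation order, and within each study group.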

Item 1.2: Repeating data patterns across baseline variables

This item looks for duplication across participants, i.e. participants who share the same combination of baseline values (e.g. do all participants with a height of 180 cm have the same weight?). Duplicated combinations of baseline values are listed in the Details column below (if there are none, none are listed).

item_1_2 <- result[["check_table"]][result[["check_table"]][["ItemNumber"]] == "1.2", c("Item description", "Status", "Details"), drop = FALSE]
knitr::kable(item_1_2, row.names = FALSE)

Item description	Status	Details
Repeated Baselines	Potential integrity issue	sex:1, mat_age:30, GA_weeks:33, birthweight:1568 occurs 2 times.
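
The idea behind this item can be sketched with base R's duplicated(). The data frame below is invented for illustration (only the column names come from the case study), and the approach shown is one straightforward way to find repeated combinations, not necessarily the one used inside integrity.

```r
# Invented baseline data; rows 1 and 3 share every baseline value.
toy_baseline <- data.frame(
  infant_ID = 1:5,
  sex = c(1, 2, 1, 1, 2),
  mat_age = c(30, 25, 30, 31, 25),
  GA_weeks = c(33, 30, 33, 33, 31),
  birthweight = c(1568, 1400, 1568, 1568, 1400)
)
baseline_cols <- c("sex", "mat_age", "GA_weeks", "birthweight")

# duplicated() marks only later copies, so combine it with a reverse pass
# to flag every row that belongs to a repeated combination.
is_repeated <- duplicated(toy_baseline[baseline_cols]) |
  duplicated(toy_baseline[baseline_cols], fromLast = TRUE)

repeated_ids <- toy_baseline$infant_ID[is_repeated]
toy_baseline[is_repeated, ]
```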

Item 1.3: Repeating data patterns across baseline variables and rare variables.

item_1_3 <- result[["check_table"]][result[["check_table"]][["ItemNumber"]] == "1.3", c("Item description", "Status", "Details"), drop = FALSE]
knitr::kable(item_1_3, row.names = FALSE)

Item description	Status	Details
Repeated Baselines in Rare Outcomes	Pass	No duplicates found.

Item 1.4: Bias in the terminal (rightmost) digits.

This item plots the terminal digit for the selected continuous variables (avoid variables that tend to be rounded or that lack precision). Inspect the bar charts for a biased or unexpected distribution of digits.

item_1_4 <- result[["check_table"]][result[["check_table"]][["ItemNumber"]] == "1.4", c("Item description", "Status", "Details"), drop = FALSE]
knitr::kable(item_1_4, row.names = FALSE)

Item description	Status	Details
Terminal Digits	Displayed	Terminal digit plot generated.

if("Terminal Digits" %in% names(result[["images"]])) {
  result[["images"]][["Terminal Digits"]]
}
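
The counts behind such a bar chart can be sketched in base R. The example below uses invented whole-number birth weights with deliberate preference for the digits 0 and 5, tabulates the terminal digit, and compares the counts with a uniform distribution using a chi-squared goodness-of-fit test. The test is an optional addition for illustration; the item itself relies on inspecting the plot.

```r
# Invented birth weights (grams) with strong preference for 0 and 5.
birthweight <- c(
  rep(c(1500, 1650, 1800), 10),
  rep(c(1505, 1755), 10),
  1501:1509
)

# Terminal digit of a whole number is the remainder after dividing by 10.
terminal_digit <- birthweight %% 10

# Fix the levels so digits that never occur still appear with a count of 0.
digit_counts <- table(factor(terminal_digit, levels = 0:9))
digit_counts

# Goodness-of-fit test against equal frequencies for all ten digits.
uniform_test <- chisq.test(digit_counts)
uniform_test$p.value
```

Whether a uniform distribution is a sensible reference depends on how the variable was measured and recorded, which is why rounded or imprecise variables should be avoided.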

Domain 2: Baseline characteristics

Item 2.1: Excessively homogeneous distribution of binary baseline variables, i.e. loss of independence between consecutive variables

In RCTs we expect binary baseline data to occur independently of previous values (i.e., to occur randomly). This check examines whether binary baseline values occur in a random manner based on row order, by comparing the observed number of adjacent rows that share the same value with the number expected under randomness using a chi-squared test. A statistically significant result (p < 0.05) may indicate an integrity issue. Note: if rows are not sorted chronologically by randomisation date and time, this test may be invalid.

item_2_1 <- result[["check_table"]][result[["check_table"]][["ItemNumber"]] == "2.1", c("Item description", "Status", "Details"), drop = FALSE]
knitr::kable(item_2_1, row.names = FALSE)

Item description	Status	Details
Consecutive Baseline Binary	Pass	No significant differences using χ² test.

if(!is.null(result[["detail_tables"]][["2.1"]])) knitr::kable(result[["detail_tables"]][["2.1"]], row.names = FALSE)

Variable	TotalAdjacentPairs	ObservedConsecutivePairs	ObservedNonConsecutivePairs	ExpectedConsecutivePairs	ExpectedNonConsecutivePairs	PValue	Significant
sex	119	54	65	60	59	0.5165	FALSE
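
The adjacent-pairs idea can be sketched as follows. adjacent_pairs_test() is a hypothetical helper, not a function from integrity, and it uses a goodness-of-fit chi-squared test with the expected proportion of matching pairs taken as the sum of squared level proportions. This is one reasonable formulation chosen for illustration; the package may compute its expected counts and p-values differently, so results need not match the table above.

```r
# Hypothetical helper: compare the number of adjacent rows that share a value
# with the number expected if values occurred independently of row order.
adjacent_pairs_test <- function(x) {
  x <- x[!is.na(x)]
  n_pairs <- length(x) - 1
  same <- sum(x[-1] == x[-length(x)])

  # Probability that two independent draws fall in the same level.
  props <- as.numeric(table(x)) / length(x)
  p_same <- sum(props^2)

  test <- chisq.test(c(same, n_pairs - same), p = c(p_same, 1 - p_same))
  list(
    observed_same = same,
    expected_same = n_pairs * p_same,
    p_value = test$p.value
  )
}

# Invented examples: a shuffled sequence and a fully blocked sequence.
set.seed(1)
random_sex <- sample(c(1, 2), 120, replace = TRUE)
blocked_sex <- rep(c(1, 2), each = 60)

random_result <- adjacent_pairs_test(random_sex)
blocked_result <- adjacent_pairs_test(blocked_sex)
blocked_result
```

The blocked sequence has almost every adjacent pair matching, far more than expected, which is the kind of excessive homogeneity this item is designed to detect.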

Item 2.2: Excessive imbalances between groups in continuous baseline variables.

This item evaluates the mean and standard deviation of key continuous prognostic factors, split by treatment group.

item_2_2 <- result[["check_table"]][result[["check_table"]][["ItemNumber"]] == "2.2", c("Item description", "Status", "Details"), drop = FALSE]
knitr::kable(item_2_2, row.names = FALSE)

Item description	Status	Details
Excessive Imbalances (Numeric)	Pass	No significant differences between groups.

if(!is.null(result[["detail_tables"]][["2.2"]])) knitr::kable(result[["detail_tables"]][["2.2"]], row.names = FALSE)

Variable	Group 1 MeanSD	Group 2 MeanSD	PValue
mat_age	29 (7)	30 (7)	0.2887585
GA_weeks	NA	NA	0.2230359
28	3 (6.0%)	6 (8.6%)	NA
29	3 (6.0%)	3 (4.3%)	NA
30	3 (6.0%)	10 (14%)	NA
31	6 (12%)	10 (14%)	NA
32	19 (38%)	13 (19%)	NA
33	16 (32%)	28 (40%)	NA
birthweight	1,835 (421)	1,757 (361)	0.3928871

Item 2.3: Excessive imbalances in baseline categorical variables between groups.

This item assesses whether counts of baseline categorical variables are significantly different (p<0.05) between groups.

item_2_3 <- result[["check_table"]][result[["check_table"]][["ItemNumber"]] == "2.3", c("Item description", "Status", "Details"), drop = FALSE]
knitr::kable(item_2_3, row.names = FALSE)

Item description	Status	Details
Excessive Imbalances (Categorical)	Pass	No significant differences between groups.

if(!is.null(result[["detail_tables"]][["2.3"]])) knitr::kable(result[["detail_tables"]][["2.3"]], row.names = FALSE)

VariableOrLevel	Group 1	Group 2	PValue
sex	NA	NA	0.6655309
1	27 (54%)	35 (50%)	NA
2	23 (46%)	35 (50%)	NA

Item 2.4: Significant difference in variance of continuous baseline variables between groups.

This item uses Levene’s test, which checks whether there is a significant difference in variability between groups.

item_2_4 <- result[["check_table"]][result[["check_table"]][["ItemNumber"]] == "2.4", c("Item description", "Status", "Details"), drop = FALSE]
knitr::kable(item_2_4, row.names = FALSE)

Item description	Status	Details
Differential Variability	Pass	No significant differences using Levene test.

if(!is.null(result[["detail_tables"]][["2.4"]])) knitr::kable(result[["detail_tables"]][["2.4"]], row.names = FALSE)

Variable	DF1	DF2	FStatistic	PValue	Significant
mat_age	1	118	0.05673	0.8122	FALSE
GA_weeks	1	118	2.45300	0.1200	FALSE
birthweight	1	118	0.34280	0.5593	FALSE
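
For readers unfamiliar with Levene's test, the sketch below shows the median-centred form in base R: take each value's absolute deviation from its group median, then run a one-way ANOVA on those deviations. levene_sketch() is a hypothetical helper and the data are invented. It is offered as an illustration of the technique, without claiming it matches the package's implementation line for line.

```r
# Hypothetical helper: median-centred Levene's test via one-way ANOVA
# on absolute deviations from each group's median.
levene_sketch <- function(y, g) {
  g <- factor(g)
  abs_dev <- abs(y - ave(y, g, FUN = median))
  fit <- anova(lm(abs_dev ~ g))
  list(
    df1 = fit[["Df"]][1],
    df2 = fit[["Df"]][2],
    f_statistic = fit[["F value"]][1],
    p_value = fit[["Pr(>F)"]][1]
  )
}

# Invented maternal ages: a tightly clustered arm and a widely spread arm.
mat_age <- c(seq(29, 31, length.out = 20), seq(10, 50, length.out = 20))
arm <- rep(c(1, 2), each = 20)

levene_result <- levene_sketch(mat_age, arm)
levene_result
```

With two groups the first degree of freedom is 1 and the second is the number of participants minus 2, which is the pattern seen in the DF1 and DF2 columns above.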

Domain 3: Correlations

Item 3.1: No association between variables known to be highly correlated.

This item plots the correlation between selected continuous variables and calculates the Pearson correlation coefficient (R) and associated p value. Check whether expected correlations are present.

item_3_1 <- result[["check_table"]][result[["check_table"]][["ItemNumber"]] == "3.1", c("Item description", "Status", "Details"), drop = FALSE]
knitr::kable(item_3_1, row.names = FALSE)

Item description	Status	Details
Unexpectedly Uncorrelated	Potential integrity issue	GA_weeks, birthweight

correlation_plots <- setdiff(names(result[["images"]]), c("Terminal Digits", "Cumulative Allocation", "Days"))
for(plot_name in correlation_plots) {
  print(result[["images"]][[plot_name]])
}
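
The Pearson coefficient and p value can be obtained in base R with cor.test(). The sketch below uses invented gestational ages and birth weights constructed to be strongly related, so the variable values are assumptions and only the column names echo the case study.

```r
# Invented data: birth weight rises with gestational age, plus fixed offsets.
GA_weeks <- rep(28:33, each = 4)
offset <- rep(c(-60, -20, 20, 60), times = 6)
birthweight <- 150 * GA_weeks - 3000 + offset

# Pearson correlation with its test of zero correlation.
pearson <- cor.test(GA_weeks, birthweight, method = "pearson")
r_value <- unname(pearson$estimate)
r_value
pearson$p.value
```

For variables such as these, a coefficient near zero or a non-significant p value would be the unexpected finding worth investigating.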

Domain 4: Date violations

Item 4.1: Individual enrolment dates do not fit within study start and end dates.

This item examines whether randomisation dates for each individual fall within the enrolment period.

item_4_1 <- result[["check_table"]][result[["check_table"]][["ItemNumber"]] == "4.1", c("Item description", "Status", "Details"), drop = FALSE]
knitr::kable(item_4_1, row.names = FALSE)

Item description	Status	Details
Implausible Randomisation Date	Potential integrity issue	Participants 38, 49

item_4_1_dates <- result[["detail_tables"]][["4.1"]]
if(!is.null(item_4_1_dates)) {
  knitr::kable(item_4_1_dates, row.names = FALSE)
}

Study Start Date	Minimum Randomisation Date	Study End Date	Maximum Randomisation Date
2019-03-02	2019-03-02	2020-08-09	2020-08-20
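
The comparison behind this item amounts to two date inequalities per participant. The sketch below applies them to an invented three-row data frame that reuses the study start and end dates shown above and the maximum randomisation date of 2020-08-20. The participant numbers are illustrative and do not correspond to those flagged in the case study.

```r
# Invented rows using the case study's column names and enrolment window.
toy_dates <- data.frame(
  infant_ID = c(1, 2, 3),
  rand_date = as.Date(c("2019-03-21", "2020-07-17", "2020-08-20")),
  enrol_start = as.Date("2019-03-02"),
  enrol_end = as.Date("2020-08-09")
)

# A randomisation date is implausible if it falls before the start
# or after the end of the enrolment period.
outside_window <- toy_dates$rand_date < toy_dates$enrol_start |
  toy_dates$rand_date > toy_dates$enrol_end

flagged_ids <- toy_dates$infant_ID[outside_window]
flagged_ids
```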

Item 4.2: Dates (or visits) are not in logical order.

This item requires study-specific repeated-visit or event-date variables and looks for dates that are out of logical order, for example a follow-up date occurring before enrolment.

item_4_2 <- result[["check_table"]][result[["check_table"]][["ItemNumber"]] == "4.2", c("Item description", "Status", "Details"), drop = FALSE]
knitr::kable(item_4_2, row.names = FALSE)

Item description	Status	Details
Logical Date Order	Skipped	This item needs to be checked mannually. Study-specific repeated visit or event-date variables are required.

Domain 5: Patterns of allocation

Item 5.1: Non-random allocation patterns: plot.

The plot below shows the cumulative number of allocated participants to each treatment arm by date of randomisation. We expect the cumulative number of randomised participants in each group to be similar if 1:1 allocation is used. Assess whether cumulative lines for treatment groups deviate from each other drastically. Note: the graphs will only appear when the date of randomisation is provided.

item_5_1 <- result[["check_table"]][result[["check_table"]][["ItemNumber"]] == "5.1", c("Item description", "Status", "Details"), drop = FALSE]
knitr::kable(item_5_1, row.names = FALSE)

Item description	Status	Details
Cumulative Allocation	Displayed	Cumulative allocation plot generated.

if("Cumulative Allocation" %in% names(result[["images"]])) {
  result[["images"]][["Cumulative Allocation"]]
}

Item 5.2: Non-random allocation patterns: statistical test

This item evaluates randomness of allocation using two approaches: a runs test and a chi-squared test comparing observed adjacent intervention runs with the expected number under random allocation. A statistically significant result (p<0.05) from either test may be indicative of an issue with randomisation.

item_5_2 <- result[["check_table"]][result[["check_table"]][["ItemNumber"]] == "5.2", c("Item description", "Status", "Details"), drop = FALSE]
knitr::kable(item_5_2, row.names = FALSE)

Item description	Status	Details
Allocation Pattern	Pass	Intervention treatment_cat has no statistically significant result using the adjacent-pairs chi-squared test or runs test. Allocation order was evaluated after sorting by randomisation date.

item_5_2_test <- result[["detail_tables"]][["5.2"]]
if(!is.null(item_5_2_test)) {
  knitr::kable(item_5_2_test, row.names = FALSE)
}

Test	Variable	Statistic	PValue	Significant	Details	OrderBasis
Adjacent-pairs chi-squared test	treatment_cat	NA	1.0000	FALSE	Observed consecutive pairs: 61; expected consecutive pairs: 61; total adjacent pairs: 119	Sorted by randomisation date
Runs test	treatment_cat	-0.06288	0.9499	FALSE	Observed runs: 59; expected runs: 59.33	Sorted by randomisation date
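
The runs test row can be checked by hand with the standard Wald-Wolfowitz normal approximation. In the sketch below, runs_z() and runs_test_sketch() are hypothetical helpers, not functions from integrity. Plugging in the 59 observed runs from the table above and arm sizes of 50 and 70 gives an expected number of runs, statistic and p value that agree with the reported figures to the rounding shown, although that agreement alone does not establish how the package computes them.

```r
# Hypothetical helper: normal approximation for the number of runs in a
# two-level sequence with n1 and n2 observations per level.
runs_z <- function(runs, n1, n2) {
  n1 <- as.numeric(n1)
  n2 <- as.numeric(n2)
  n <- n1 + n2
  expected <- 2 * n1 * n2 / n + 1
  variance <- 2 * n1 * n2 * (2 * n1 * n2 - n) / (n^2 * (n - 1))
  z <- (runs - expected) / sqrt(variance)
  list(expected = expected, z = z, p_value = 2 * pnorm(-abs(z)))
}

# Hypothetical helper: count runs in an ordered two-level vector, then test.
runs_test_sketch <- function(x) {
  x <- x[!is.na(x)]
  counts <- table(x)
  stopifnot(length(counts) == 2)
  runs <- 1 + sum(x[-1] != x[-length(x)])
  runs_z(runs, counts[[1]], counts[[2]])
}

# Figures taken from this vignette: 59 runs, arms of 50 and 70 participants.
case_study <- runs_z(runs = 59, n1 = 50, n2 = 70)
case_study

# Invented contrast: strict alternation produces far too many runs.
alternating <- runs_test_sketch(rep(c(1, 2), times = 10))
alternating$p_value
```

Too few runs suggest clustering of allocations and too many suggest alternation; both are departures from what simple randomisation would produce.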

Item 5.3: Unexpected imbalance in randomisation day of week.

The table below reports two chi-squared tests: one assessing whether randomisation is distributed evenly across weekdays overall, and one assessing whether randomisation day differs by intervention group. The graph below shows the number of participants randomised on each day of the week by group. We expect numbers to be balanced between groups for each weekday, and fewer enrolments on the weekend for non-urgent interventions. Note: the graph will only appear when the date of randomisation is provided.

item_5_3 <- result[["check_table"]][result[["check_table"]][["ItemNumber"]] == "5.3", c("Item description", "Status", "Details"), drop = FALSE]
knitr::kable(item_5_3, row.names = FALSE)

Item description	Status	Details
Allocation	Pass	No significant difference of allocations on days using Simulated chi-squared test used because expected counts were sparse (10000 replicates) .

item_5_3_test <- result[["detail_tables"]][["5.3"]]
if(!is.null(item_5_3_test)) {
  knitr::kable(item_5_3_test, row.names = FALSE)
}

Test	Method	Statistic	DF	PValue	Significant
Chi-squared goodness-of-fit test of randomisation day overall	Pearson’s chi-squared test	8.100	6	0.2309	FALSE
Chi-squared test of randomisation day by intervention group	Simulated chi-squared test used because expected counts were sparse (10000 replicates)	9.118	NA	0.1727	FALSE

if("Days" %in% names(result[["images"]])) {
  result[["images"]][["Days"]]
}
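
Both tests can be sketched with chisq.test() from base R. The dates and allocations below are invented: 50 randomisations spread evenly over Monday to Friday across ten weeks, with none at weekends and arms alternating. Weekdays are derived with the ISO weekday number so that the example does not depend on the locale. This is an illustration under those assumptions, not the package's code.

```r
# Invented randomisation dates: ten full weeks starting on a Monday.
all_dates <- seq(as.Date("2019-03-04"), by = "day", length.out = 70)

# ISO weekday number as text ("1" = Monday ... "7" = Sunday).
iso_day <- format(all_dates, "%u")

# Keep Monday to Friday only, mimicking a trial with no weekend enrolment.
rand_dates <- all_dates[iso_day %in% as.character(1:5)]
weekday <- factor(format(rand_dates, "%u"), levels = as.character(1:7))
arm <- rep(c(1, 2), length.out = length(rand_dates))

# Test 1: are randomisations spread evenly over all seven days?
overall <- chisq.test(table(weekday))
overall$p.value

# Test 2: does the weekday distribution differ between arms?
# Empty weekdays are dropped, and a simulated p value is used because
# expected counts per cell are small.
set.seed(1)
by_arm <- chisq.test(table(droplevels(weekday), arm),
                     simulate.p.value = TRUE, B = 10000)
by_arm$p.value
```

Here the overall test is significant because weekends are empty, which is expected for many non-urgent interventions, while the by-arm test shows no difference. The second test is usually the more informative one for allocation problems.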

Domain 6: Internal inconsistencies

Item 6.1: Inconsistent or illogical values across variables within individual participants.

Derive logic rules for each variable to be collected, e.g. date of hospital discharge = date of admission + days in hospital; if number of transfusions ≥1, then any transfusion = yes. Incorporate these rules into the package so that any breaches are displayed in the output. In the case study, the rules are the unexpected entries defined in dataset_info.

item_6_1 <- result[["check_table"]][result[["check_table"]][["ItemNumber"]] == "6.1", c("Item description", "Status", "Details"), drop = FALSE]
knitr::kable(item_6_1, row.names = FALSE)

Item description	Status	Details
Implausible Values	Pass	No values of mat_age less than 10
Implausible Values	Pass	No values of mat_age greater than 50
Implausible Values	Pass	No values of GA_weeks less than 22
Implausible Values	Pass	No values of GA_weeks greater than 37
Implausible Values	Potential integrity issue	5 randomisation date on Saturday
Implausible Values	Potential integrity issue	8 randomisation date on Saturday
Implausible Values	Potential integrity issue	11 randomisation date on Saturday
Implausible Values	Potential integrity issue	20 randomisation date on Saturday
Implausible Values	Potential integrity issue	21 randomisation date on Sunday
Implausible Values	Potential integrity issue	28 randomisation date on Saturday
Implausible Values	Potential integrity issue	29 randomisation date on Saturday
Implausible Values	Potential integrity issue	34 randomisation date on Sunday
Implausible Values	Potential integrity issue	35 randomisation date on Saturday
Implausible Values	Potential integrity issue	42 randomisation date on Saturday
Implausible Values	Potential integrity issue	45 randomisation date on Sunday
Implausible Values	Potential integrity issue	46 randomisation date on Saturday
Implausible Values	Potential integrity issue	49 randomisation date on Sunday
Implausible Values	Potential integrity issue	54 randomisation date on Saturday
Implausible Values	Potential integrity issue	60 randomisation date on Sunday
Implausible Values	Potential integrity issue	61 randomisation date on Sunday
Implausible Values	Potential integrity issue	63 randomisation date on Saturday
Implausible Values	Potential integrity issue	72 randomisation date on Saturday
Implausible Values	Potential integrity issue	79 randomisation date on Sunday
Implausible Values	Potential integrity issue	83 randomisation date on Saturday
Implausible Values	Potential integrity issue	86 randomisation date on Saturday
Implausible Values	Potential integrity issue	111 randomisation date on Saturday
Implausible Values	Potential integrity issue	114 randomisation date on Sunday
Implausible Values	Potential integrity issue	118 randomisation date on Saturday
Implausible Values	Potential integrity issue	119 randomisation date on Saturday
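
The two example rules above can also be checked directly in base R. In the sketch below, admission_date, discharge_date and n_transfusions are hypothetical columns invented for illustration; hospital_days and inf_transfusion_any are column names from the case study, but every value is made up.

```r
# Invented data containing one breach of each rule.
toy_logic <- data.frame(
  infant_ID = 1:4,
  admission_date = as.Date(c("2019-03-21", "2019-04-02", "2019-05-10", "2019-06-01")),
  discharge_date = as.Date(c("2019-04-20", "2019-04-30", "2019-06-09", "2019-06-15")),
  hospital_days = c(30, 28, 30, 10),
  n_transfusions = c(0, 2, 1, 0),
  inf_transfusion_any = c(0, 1, 0, 0)
)

# Rule 1: discharge date should equal admission date plus days in hospital.
date_breach <- toy_logic$discharge_date !=
  toy_logic$admission_date + toy_logic$hospital_days

# Rule 2: one or more transfusions implies "any transfusion" is yes (1).
transfusion_breach <- toy_logic$n_transfusions >= 1 &
  toy_logic$inf_transfusion_any != 1

date_breach_ids <- toy_logic$infant_ID[date_breach]
transfusion_breach_ids <- toy_logic$infant_ID[transfusion_breach]
date_breach_ids
transfusion_breach_ids
```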

Domain 7: External inconsistencies

Item 7.1: IPD do not correspond to publications or reports.

The table below shows summary statistics for each variable provided in the IPD dataset, e.g. mean, median, range, etc. Manually cross‐check these against any published trial reports, including appendices and supplements. Record any inconsistencies identified, for example, discrepancies in summary variable values between IPD and publication, inclusion of participants in IPD that do not meet eligibility criteria in publication, published variables that are missing from IPD dataset. If data are provided for excluded participants, check whether reasons for exclusion are consistent with publication.

item_7_1 <- result[["check_table"]][result[["check_table"]][["ItemNumber"]] == "7.1", c("Item description", "Status", "Details"), drop = FALSE]
knitr::kable(item_7_1, row.names = FALSE)

Item description	Status	Details
External Consistency	Displayed	Clinical summary table generated for comparison with publications or reports.

if(!is.null(result[["summary_table"]])) {
  result[["summary_table"]]
}

Characteristic	1 N = 50¹	2 N = 70¹
sex
1	27 (54%)	35 (50%)
2	23 (46%)	35 (50%)
mat_age	29 (7)	30 (7)
GA_weeks
28	3 (6.0%)	6 (8.6%)
29	3 (6.0%)	3 (4.3%)
30	3 (6.0%)	10 (14%)
31	6 (12%)	10 (14%)
32	19 (38%)	13 (19%)
33	16 (32%)	28 (40%)
birthweight	1,835 (421)	1,757 (361)
IVH
0	35 (73%)	52 (74%)
1	13 (27%)	18 (26%)
Unknown	2	0
NEC
0	46 (92%)	67 (96%)
1	4 (8.0%)	3 (4.3%)
CLD
0	40 (80%)	45 (64%)
1	1 (2.0%)	16 (23%)
2	9 (18%)	7 (10%)
3	0 (0%)	2 (2.9%)
hospital_days	30 (20)	36 (24)
inf_death
0	49 (98%)	65 (94%)
1	1 (2.0%)	4 (5.8%)
Unknown	0	1
¹ n (%); Mean (SD)

Domain 8: Plausibility of data

Item 8.1: Too few missing data or missing data are overly similar between groups.

The table below shows missingness by intervention group for outcome variables, including the percentage missing in each group.

item_8_1 <- result[["check_table"]][result[["check_table"]][["ItemNumber"]] == "8.1", c("Item description", "Status", "Details"), drop = FALSE]
knitr::kable(item_8_1, row.names = FALSE)

Item description	Status	Details
Missing Values by Intervention	Pass	No significant difference of missing values between allocations using χ² test.

if(!is.null(result[["detail_tables"]][["8.1"]])) knitr::kable(result[["detail_tables"]][["8.1"]], row.names = FALSE)

Variable	Missing Count 1	Total 1	Missing Percent 1	Missing Count 2	Total 2	Missing Percent 2	PValue	Significant
IVH	2	50	4	0	70	0.0	0.3349	FALSE
NEC	0	50	0	0	70	0.0	NA	NA
CLD	0	50	0	0	70	0.0	NA	NA
hospital_days	0	50	0	0	70	0.0	NA	NA
inf_death	0	50	0	1	70	1.4	1.0000	FALSE
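
A single row of this table can be examined with a 2 x 2 chi-squared test in base R. The sketch below uses the IVH counts shown above (2 of 50 missing in group 1, 0 of 70 in group 2). With R's default continuity correction the p value agrees with the reported 0.3349 to the rounding shown, though this does not by itself confirm how integrity computes it. Fisher's exact test is added as an assumption-free alternative for such sparse counts and is not part of the package output.

```r
# Missing and observed IVH counts by group, taken from the table above.
ivh_missing <- matrix(
  c(2, 48, 0, 70),
  nrow = 2,
  dimnames = list(status = c("missing", "observed"), group = c("1", "2"))
)

# Chi-squared test with the default continuity correction; the warning about
# small expected counts is suppressed for this illustration.
chisq_result <- suppressWarnings(chisq.test(ivh_missing))
chisq_result$p.value

# Exact alternative that does not rely on a large-sample approximation.
fisher_result <- fisher.test(ivh_missing)
fisher_result$p.value
```

Remember that this item is also concerned with too little missing data: identical or near-zero missingness across groups can itself be implausible for some outcomes, which no significance test will flag.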

Item 8.2: Implausible event rates: outcomes and demographics.

The table below shows events and totals for dichotomous baseline variables and dichotomous common and rare outcomes, split by intervention group.

item_8_2 <- result[["check_table"]][result[["check_table"]][["ItemNumber"]] == "8.2", c("Item description", "Status", "Details"), drop = FALSE]
knitr::kable(item_8_2, row.names = FALSE)

Item description	Status	Details
Implausible Event Rates	Displayed	Events and totals table generated for dichotomous baseline and outcome variables by intervention.

if(!is.null(result[["detail_tables"]][["8.2"]])) knitr::kable(result[["detail_tables"]][["8.2"]], row.names = FALSE)

Variable	EventLevel	Events 1	Total 1	Percent 1	Events 2	Total 2	Percent 2
sex	2	23	50	46.0	35	70	50.0
IVH	1	13	48	27.1	18	70	25.7
NEC	1	4	50	8.0	3	70	4.3
inf_death	1	1	50	2.0	4	69	5.8

Computing Environment

This vignette was executed on the following computing system:

sessionInfo()

## R version 4.5.0 (2025-04-11)
## Platform: aarch64-apple-darwin20
## Running under: macOS 26.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Australia/Sydney
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] readxl_1.4.5    integrity_1.0.1 testthat_3.3.2 
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6       xfun_0.56          bslib_0.10.0       ggplot2_4.0.2     
##  [5] rstatix_0.7.3      lattice_0.22-9     vctrs_0.7.1        tools_4.5.0       
##  [9] generics_0.1.4     tibble_3.3.1       pkgconfig_2.0.3    Matrix_1.7-3      
## [13] RColorBrewer_1.1-3 S7_0.2.1           desc_1.4.3         gt_1.3.0          
## [17] lifecycle_1.0.5    compiler_4.5.0     farver_2.1.2       stringr_1.6.0     
## [21] brio_1.1.5         janitor_2.2.1      carData_3.0-6      snakecase_0.11.1  
## [25] litedown_0.9       htmltools_0.5.9    sass_0.4.10        yaml_2.3.12       
## [29] Formula_1.2-5      pillar_1.11.1      car_3.1-5          ggpubr_0.6.3      
## [33] jquerylib_0.1.4    tidyr_1.3.2        cachem_1.1.0       abind_1.4-8       
## [37] nlme_3.1-168       commonmark_2.0.0   tidyselect_1.2.1   digest_0.6.39     
## [41] stringi_1.8.7      gtsummary_2.5.0    dplyr_1.2.0        purrr_1.2.1       
## [45] labeling_0.4.3     splines_4.5.0      rprojroot_2.1.1    fastmap_1.2.0     
## [49] grid_4.5.0         cli_3.6.6          magrittr_2.0.4     cards_0.7.1       
## [53] dichromat_2.0-0.1  pkgbuild_1.4.8     broom_1.0.12       withr_3.0.2       
## [57] scales_1.4.0       backports_1.5.0    cardx_0.3.2        lubridate_1.9.5   
## [61] timechange_0.4.0   rmarkdown_2.30     otel_0.2.0         ggsignif_0.6.4    
## [65] cellranger_1.1.0   evaluate_1.0.5     knitr_1.51         markdown_2.0      
## [69] mgcv_1.9-4         rlang_1.2.0        glue_1.8.0         xml2_1.5.2        
## [73] pkgload_1.5.2      rstudioapi_0.18.0  jsonlite_2.0.0     R6_2.6.1          
## [77] fs_2.1.0

Sol Libesman, David Nguyen, Dario Strbenac, Jie Kang, Lene Seidler, Kylie Hunter
The University of Sydney, Australia.
