Writing a New Check

Background

Clinical trial datasets can be wrong in countless ways. This package does not attempt to cover every scenario in which data may be incorrect, nor does it attempt to replicate the comprehensive set of Pinnacle 21 (P21) data checks for SDTM. Instead, the checks in this package are intended to be generalizable, actionable, and meaningful for analysis. For example, many clinical trials include the CO (Comments) domain, but sdtmchecks has no checks for it because it is usually not meaningful for analysis.

Working Collaboratively

GitHub

The main branch (pharmaverse/sdtmchecks@main) contains the latest released version and should not be used for development.

The devel branch is the default branch and contains the latest development version of the package. To start contributing, create a feature branch off of devel. For installation instructions, see the front page of the package website. When your code is ready to be incorporated, open a pull request; another contributor will review it before it is merged into devel. If you do not have write access to the repository, work from a fork and open the pull request from the fork.

Package Dependencies

The {renv} package is used to handle package dependencies. Run renv::restore() to install the same set of package versions being used by the team.

Existing Checks

The sdtmchecksmeta dataset lists the existing checks along with helpful metadata: the domains each check uses, a short title, and a description.

# Print the metadata table of existing checks
sdtmchecksmeta
## # A tibble: 10 × 4
##    check                              domains title                  description
##    <chr>                              <chr>   <chr>                  <chr>      
##  1 check_ae_aeacn_ds_disctx_covid     ae, ds  COVID AE trt discon    "Check pat…
##  2 check_ae_aeacnoth                  ae      AE AEACNOTH multiple   "Check for…
##  3 check_ae_aeacnoth_ds_disctx        ae, ds  AE AEACNOTx Discon     "Check for…
##  4 check_ae_aeacnoth_ds_stddisc_covid ae, ds  COVID AE study discon  "Check pat…
##  5 check_ae_aedecod                   ae      AE Missing PT          "Check for…
##  6 check_ae_aedthdtc_aesdth           ae      AE Death Date vs Indi… "Check for…
##  7 check_ae_aedthdtc_ds_death         ae, ds  DS Death Dates in AE   "Check pat…
##  8 check_ae_aelat                     ae      AE AELAT Missing       "OPHTHALMO…
##  9 check_ae_aeout                     ae      AE Death Outcome       "Check for…
## 10 check_ae_aeout_aeendtc_aedthdtc    ae      Fatal AE Resolution D… "Check for…
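Before writing a new check, it can help to filter the metadata by domain to see what already exists. A minimal base-R sketch, assuming the domains column holds comma-separated strings as printed above. Here meta is a small stand-in built from rows of that printout, and uses_domain() is a hypothetical helper, not part of the package. Splitting on commas avoids the partial matches a plain grepl() on the whole string could produce.

```r
# Small stand-in for sdtmchecksmeta, using check names and domains copied from
# the printout above (title and description columns omitted).
meta <- data.frame(
  check = c("check_ae_aeacn_ds_disctx_covid", "check_ae_aeacnoth",
            "check_ae_aeacnoth_ds_disctx", "check_ae_aedecod"),
  domains = c("ae, ds", "ae", "ae, ds", "ae"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where the comma-separated domains string
# contains exactly the requested domain.
uses_domain <- function(domains, domain) {
  vapply(strsplit(domains, ",\\s*"), function(d) domain %in% d, logical(1))
}

# Checks that involve the DS domain
ds_checks <- meta$check[uses_domain(meta$domains, "ds")]
ds_checks
```

With the package loaded, the same call would be `sdtmchecksmeta$check[uses_domain(sdtmchecksmeta$domains, "ds")]`.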

Good Practices

Example Check

If you are writing your first check, it may help to start by editing an existing one, such as the example below:

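#' Example check: missing or implausible AGE in DM
#'
#' @param DM SDTM DM dataset with variables USUBJID and AGE
#'
#' @return Boolean value for whether the check passed or failed, with a 'msg'
#'   attribute if the check failed
#' @export
#'
#' @examples
#'
#' \dontrun{
#'    check_dm_age_missing(DM)
#' }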
check_dm_age_missing <- function(DM){
  ### First check that required variables exist and return a message if they don't
  if(DM %lacks_any% c("USUBJID","AGE")){
    fail(lacks_msg(DM, c("USUBJID","AGE")))
  }else{
    ### Subset DM to records with missing AGE
    mydf_0 = subset(DM, is_sas_na(DM$AGE), c("USUBJID","AGE"))
    ### Subset DM to records with AGE < 18
    mydf_1 = subset(DM, !is_sas_na(DM$AGE) & DM$AGE<18, c("USUBJID","AGE"))
    ### Subset DM to records with AGE >= 90
    mydf_2 = subset(DM, !is_sas_na(DM$AGE) & DM$AGE>=90, c("USUBJID","AGE"))
    ### Combine records with suspicious AGE and sort by USUBJID
    mydf_3 = rbind(mydf_0, mydf_1, mydf_2)
    mydf = mydf_3[order(mydf_3$USUBJID),]
    rownames(mydf)=NULL
    if(nrow(mydf)==0){
      ### Pass if no records have missing AGE, AGE < 18 or AGE >= 90
      pass()
    }else{
      ### Otherwise fail, returning a message and the flagged records
      fail(paste0("DM has ",length(unique(mydf$USUBJID)),
                  " patient(s) with suspicious age value(s). "),
           mydf)
    }
  }
}
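The three subset() calls in the example can also be collapsed into a single logical filter. A minimal base-R sketch of that alternative, under these assumptions: is.na() stands in for the package's is_sas_na() (which may treat additional values as missing), AGE is numeric, and the DM data frame is made up for illustration.

```r
# Small made-up DM data frame for illustration (not real study data)
DM <- data.frame(
  USUBJID = c("S3", "S1", "S2", "S4"),
  AGE     = c(95, 45, NA, 17),
  stringsAsFactors = FALSE
)

# One logical filter for missing, under-18, or 90-and-over ages.
# For a missing AGE this is TRUE | NA | NA, which R evaluates to TRUE,
# so flag contains no NAs and can be used directly as a row index.
flag <- is.na(DM$AGE) | DM$AGE < 18 | DM$AGE >= 90

# Keep the flagged records, sorted by USUBJID, with clean row names
flagged <- DM[flag, c("USUBJID", "AGE")]
flagged <- flagged[order(flagged$USUBJID), ]
rownames(flagged) <- NULL

# Same message format as the example check
msg <- paste0("DM has ", length(unique(flagged$USUBJID)),
              " patient(s) with suspicious age value(s). ")
flagged
msg
```

Inside a check, flagged and msg would be passed to fail() as in the example. The three-subset version is equally valid and arguably easier to extend one condition at a time.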

Additional Considerations