Database diagnostics

Introduction

In this example we’re going to be using the Eunomia synthetic data.

library(CDMConnector)
library(OmopSketch)
library(PhenotypeR)
library(dplyr)
library(CohortConstructor)

con <- DBI::dbConnect(duckdb::duckdb(), 
                      CDMConnector::eunomiaDir("synpuf-1k", "5.3"))
cdm <- CDMConnector::cdmFromCon(con = con, 
                                cdmName = "Eunomia Synpuf",
                                cdmSchema   = "main",
                                writeSchema = "main", 
                                achillesSchema = "main")

cdm$injuries <- conceptCohort(cdm = cdm,
  conceptSet = list(
    "ankle_sprain" = 81151L,
    "ankle_fracture" = 4059173L,
    "forearm_fracture" = 4278672L,
    "hip_fracture" = 4230399L
  ),
  name = "injuries")

Database diagnostics

Although we may have created our study cohort, to inform analytic decisions and interpretation of results requires an understanding of the dataset from which it has been derived. The databaseDiagnostics() function will help us better understand a data source.

To run database diagnostics we just need to provide our cdm reference to the function.

db_diagnostics <- databaseDiagnostics(cdm$injuries)

Database diagnostics builds on OmopSketch package to perform the following analyses:

The output is a summarised result object.

Visualise the results

We can use OmopSketch package functions to visualise the results obtained.

Snapshot

tableOmopSnapshot(db_diagnostics)
Snapshot of the cdm Eunomia Synpuf
Database name
Eunomia Synpuf
Diagnostic
databaseDiagnostics
Estimate
Phenotyper version
0.3.0
General
Snapshot date 2026-01-12
Person count 1,000
Vocabulary version v5.0 06-AUG-21
Cdm
Source name Synpuf
Version 5.3
Holder name ohdsi
Release date 2018-03-15
Description
Documentation reference
Observation period
N 1,000
Start date 2008-01-01
End date 2010-12-31
Cdm source
Type duckdb
Package CDMConnector
Write schema main

Observation Periods

tableObservationPeriod(db_diagnostics)
Summary of observation_period table
CDM name
Eunomia Synpuf
Diagnostic
databaseDiagnostics
Observation period ordinal Variable name Variable level Estimate name
Phenotyper version
0.3.0
all Number records N 1,048
Number subjects N 1,000
Subjects not in person table N (%) 0 (0.00%)
Records per person Mean (SD) 1.05 (0.21)
Median [Q25 - Q75] 1 [1 - 1]
Range [min to max] [1 to 2]
Duration in days Mean (SD) 979.71 (262.79)
Median [Q25 - Q75] 1,096 [1,096 - 1,096]
Range [min to max] [1 to 1,096]
Days to next observation period Mean (SD) 172.17 (108.35)
Median [Q25 - Q75] 138 [93 - 254]
Range [min to max] [32 to 366]
Type concept id Period while enrolled in insurance N (%) 1,048 (100.00%)
Start date before birth date N (%) 0 (0.00%)
End date before start date N (%) 0 (0.00%)
Column name Observation period end date N missing data (%) 0 (0.00%)
Observation period id N missing data (%) 0 (0.00%)
N zeros (%) 0 (0.00%)
Observation period start date N missing data (%) 0 (0.00%)
Period type concept id N missing data (%) 0 (0.00%)
N zeros (%) 0 (0.00%)
Person id N missing data (%) 0 (0.00%)
N zeros (%) 1 (0.10%)
1st Number subjects N 1,000
Duration in days Mean (SD) 994.16 (257.95)
Median [Q25 - Q75] 1,096 [1,096 - 1,096]
Range [min to max] [1 to 1,096]
Days to next observation period Mean (SD) 172.17 (108.35)
Median [Q25 - Q75] 138 [93 - 254]
Range [min to max] [32 to 366]
2nd Number subjects N 48
Duration in days Mean (SD) 678.60 (164.50)
Median [Q25 - Q75] 730 [730 - 730]
Range [min to max] [31 to 730]
Days to next observation period Mean (SD)
Median [Q25 - Q75]
Range [min to max]

Person Table

tablePerson(db_diagnostics)
Summary of person table
CDM name
Eunomia Synpuf
Diagnostic
databaseDiagnostics
Variable name Variable level Estimate name
Phenotyper version
0.3.0
Number subjects N 1,000
Number subjects not in observation N (%) 0 (0.00%)
Sex Female N (%) 498 (49.80%)
Male N (%) 502 (50.20%)
None N (%) 0 (0.00%)
Sex source Female N (%) 498 (49.80%)
Male N (%) 502 (50.20%)
Race Black or African American N (%) 114 (11.40%)
No matching concept N (%) 67 (6.70%)
White N (%) 819 (81.90%)
Race source Black N (%) 114 (11.40%)
Hispanic N (%) 17 (1.70%)
Others N (%) 50 (5.00%)
White N (%) 819 (81.90%)
Ethnicity Hispanic or Latino N (%) 17 (1.70%)
No matching concept N (%) 50 (5.00%)
Not Hispanic or Latino N (%) 933 (93.30%)
Ethnicity source Hispanic or Latino N (%) 17 (1.70%)
Not Hispanic or Latino N (%) 933 (93.30%)
Others N (%) 50 (5.00%)
Year of birth Missing (%) 0 (0.00%)
Median [Q25 - Q75] 1,935 [1,928 - 1,942]
90% Range [Q05 to Q95] 1,919 to 1,962
Range [min to max] 1,909 to 1,983
Month of birth Missing (%) 0 (0.00%)
Median [Q25 - Q75] 7 [3 - 10]
90% Range [Q05 to Q95] 1 to 12
Range [min to max] 1 to 12
Day of birth Missing (%) 0 (0.00%)
Median [Q25 - Q75] 1 [1 - 1]
90% Range [Q05 to Q95] 1 to 1
Range [min to max] 1 to 1
Location Missing (%) 0 (0.00%)
Zero count (%) 1 (0.10%)
Distinct values 626
Provider Missing (%) 1,000 (100.00%)
Zero count (%) 0 (0.00%)
Distinct values 1
Care site Missing (%) 1,000 (100.00%)
Zero count (%) 0 (0.00%)
Distinct values 1

Clinical Records

tableClinicalRecords(db_diagnostics)
Summary of condition_occurrence table
Database name
Eunomia Synpuf
Diagnostic
databaseDiagnostics
Variable name Variable level Estimate name
Phenotyper version
0.3.0
condition_occurrence
Number records N 160,322
Number subjects N (%) 836 (100.00%)
Subjects not in person table N (%) 0 (0.00%)
Records per person Mean (SD) 191.77 (148.24)
Median [Q25 - Q75] 160 [76 - 278]
Range [min to max] [1 to 765]
In observation Yes N (%) 160,322 (100.00%)
Domain Condition N (%) 159,651 (99.58%)
Observation N (%) 666 (0.42%)
Procedure N (%) 5 (0.00%)
Source vocabulary Icd9cm N (%) 160,322 (100.00%)
Standard concept S N (%) 153,665 (95.85%)
N (%) 6,657 (4.15%)
Type concept id Carrier claim detail - 10th position N (%) 308 (0.19%)
Carrier claim detail - 11th position N (%) 210 (0.13%)
Carrier claim detail - 12th position N (%) 170 (0.11%)
Carrier claim detail - 13th position N (%) 122 (0.08%)
Carrier claim detail - 1st position N (%) 34,738 (21.67%)
Carrier claim detail - 2nd position N (%) 14,405 (8.99%)
Carrier claim detail - 3rd position N (%) 7,484 (4.67%)
Carrier claim detail - 4th position N (%) 4,235 (2.64%)
Carrier claim detail - 5th position N (%) 2,598 (1.62%)
Carrier claim detail - 6th position N (%) 1,595 (0.99%)
Carrier claim detail - 7th position N (%) 876 (0.55%)
Carrier claim detail - 8th position N (%) 599 (0.37%)
Carrier claim detail - 9th position N (%) 420 (0.26%)
Carrier claim header - 1st position N (%) 34,477 (21.50%)
Carrier claim header - 2nd position N (%) 19,670 (12.27%)
Carrier claim header - 3rd position N (%) 11,387 (7.10%)
Carrier claim header - 4th position N (%) 6,436 (4.01%)
Carrier claim header - 5th position N (%) 828 (0.52%)
Carrier claim header - 6th position N (%) 468 (0.29%)
Carrier claim header - 7th position N (%) 238 (0.15%)
Carrier claim header - 8th position N (%) 177 (0.11%)
Inpatient header - 10th position N (%) 31 (0.02%)
Inpatient header - 11th position N (%) 476 (0.30%)
Inpatient header - 1st position N (%) 515 (0.32%)
Inpatient header - 2nd position N (%) 491 (0.31%)
Inpatient header - 3rd position N (%) 490 (0.31%)
Inpatient header - 4th position N (%) 466 (0.29%)
Inpatient header - 5th position N (%) 450 (0.28%)
Inpatient header - 6th position N (%) 424 (0.26%)
Inpatient header - 7th position N (%) 397 (0.25%)
Inpatient header - 8th position N (%) 371 (0.23%)
Inpatient header - 9th position N (%) 335 (0.21%)
Outpatient header - 10th position N (%) 8 (0.00%)
Outpatient header - 11th position N (%) 1,225 (0.76%)
Outpatient header - 1st position N (%) 4,977 (3.10%)
Outpatient header - 2nd position N (%) 3,161 (1.97%)
Outpatient header - 3rd position N (%) 2,004 (1.25%)
Outpatient header - 4th position N (%) 1,202 (0.75%)
Outpatient header - 5th position N (%) 715 (0.45%)
Outpatient header - 6th position N (%) 460 (0.29%)
Outpatient header - 7th position N (%) 315 (0.20%)
Outpatient header - 8th position N (%) 213 (0.13%)
Outpatient header - 9th position N (%) 155 (0.10%)
Start date before birth date N (%) 0 (0.00%)
End date before start date N (%) 0 (0.00%)
Column name Condition concept id N missing data (%) 0 (0.00%)
N zeros (%) 0 (0.00%)
Condition end date N missing data (%) 0 (0.00%)
Condition end datetime N missing data (%) 160,322 (100.00%)
Condition occurrence id N missing data (%) 0 (0.00%)
N zeros (%) 1 (0.00%)
Condition source concept id N missing data (%) 0 (0.00%)
N zeros (%) 0 (0.00%)
Condition source value N missing data (%) 0 (0.00%)
Condition start date N missing data (%) 0 (0.00%)
Condition start datetime N missing data (%) 160,322 (100.00%)
Condition status concept id N missing data (%) 160,322 (100.00%)
N zeros (%) 0 (0.00%)
Condition status source value N missing data (%) 160,322 (100.00%)
Condition type concept id N missing data (%) 0 (0.00%)
N zeros (%) 0 (0.00%)
Person id N missing data (%) 0 (0.00%)
N zeros (%) 191 (0.12%)
Provider id N missing data (%) 0 (0.00%)
N zeros (%) 73,682 (45.96%)
Stop reason N missing data (%) 160,322 (100.00%)
Visit detail id N missing data (%) 160,322 (100.00%)
N zeros (%) 0 (0.00%)
Visit occurrence id N missing data (%) 0 (0.00%)
N zeros (%) 10 (0.01%)