hce package intro

Background

The purpose of the package is to simulate and analyze hierarchical composite endpoints (HCEs). The primary analysis method is win odds, but other win statistics (win ratio, net benefit)(Dong et al. 2023) are also implemented, provided there is no censoring. Win odds uses the DeLong-DeLong-Clarke-Pearson fromula (DeLong, DeLong, and Clarke-Pearson 1988) for the variance of the win proportion and is based on the Brunner-Munzel test (Brunner and Munzel 2000).

The power and sample size formulas consider different alternative classes (“shifted”, “ordered”, and “max” for the maximum value of the standard deviation). All formulas are derived from Bamber (1975). By default, Noether’s formula (Noether 1987) is used for shifted distributions. For more information on the power calculations for shifted distributions see also Samvel B. Gasparyan, Kowalewski, et al. (2021), Samvel B. Gasparyan, Kowalewski, and Koch (2022).

For a review of designing HCEs in clinical trials, see Samvel B. Gasparyan et al. (2022), also Samvel B. Gasparyan et al. (2023) and Little et al. (2023). The Basic Data Structure (BDS), which conforms to the Analysis Data Model (ADaM) (CDISC 2009) principles for hierarchical composite endpoints, is detailed in Samvel B. Gasparyan et al. (2024). To visualize HCEs, the maraca plot (Karpefors, Lindholm, and Gasparyan 2023) can be utilized.

The stratified and adjusted win odds are calculated based on the randomization-based covariate adjustment theory developed in Koch et al. (1998) (for a review, see Samvel B. Gasparyan, Folkvaljon, et al. (2021)).

Setup

Load the package hce and check the version

library(hce)
packageVersion("hce")
#> [1] '0.6.5'

For citing the package, run citation("hce") (Samvel B. Gasparyan 2024).

Contents

List the functions and the datasets in the package

ls("package:hce")
#>  [1] "ADET"        "ADLB"        "ADSL"        "COVID19"     "COVID19b"   
#>  [6] "COVID19plus" "HCE1"        "HCE2"        "HCE3"        "HCE4"       
#> [11] "KHCE"        "as_hce"      "calcWINS"    "calcWO"      "hce"        
#> [16] "minWO"       "powerWO"     "propWINS"    "regWO"       "simADHCE"   
#> [21] "simHCE"      "simORD"      "sizeWO"      "sizeWR"      "stratWO"    
#> [26] "summaryWO"

In brief, the package contains the following:

  1. Datasets: use data(package = "hce") for the list of all datasets included in the package.
  • Simulated datasets - HCE1 - HCE4 that contain two treatment groups and analysis values AVAL of a hierarchical composite endpoint.

  • The datasets COVID19, COVID19b, COVID19plus of COVID-19 ordinal scale outcomes (Beigel et al. 2020; Kalil et al. 2021).

  • The datasets ADET (event-time), ADLB (laboratory), ADSL (subject-level baseline characteristics) for kidney events and their timing, kidney-related laboratory measurements of eGFR (estimated glomerular filtration rate), and, based on this, the derived kidney hierarchical composite endpoint dataset KHCE for the same patients (Heerspink et al. 2023).

  1. Functions to create hce objects:
  • hce(), as_hce(), simHCE() (see Samvel B. Gasparyan et al. (2024)).

  • Auxiliary function simADHCE() works similarly to simHCE() but also provides source datasets used to produce the hce object.

  • The function simORD() simulates ordinal outcomes by categorization of beta distributions.

  1. Win odds (win ratio with ties) calculation generic functions for hierarchical composite endpoints:
  • calcWO(), summaryWO() (see Samvel B. Gasparyan, Kowalewski, et al. (2021)).
  1. Win statistics (win odds, win ratio, net benefit, Goodman Kruskal’s gamma) and their confidence intervals calculation function:
  • calcWINS() (see Samvel B. Gasparyan et al. (2023)).

  • Back-calculation of number of wins, losses, and ties given the win odds and win ratio using the function propWINS().

  1. Adjusted win odds calculation for a single, numeric covariate:
  • regWO() (see Samvel B. Gasparyan, Folkvaljon, et al. (2021); Samvel B. Gasparyan et al. (2023)).
  1. Stratified win odds with a possible adjustment after stratification with a single, numeric covariate:
  • stratWO() (see Samvel B. Gasparyan, Folkvaljon, et al. (2021); Samvel B. Gasparyan et al. (2023)).
  1. Power, sample size, and minimum detectable win odds calculation functions:
  • powerWO(), sizeWO(), minWO() provide tools for the win odds power, sample size, and minimum detectable treatment effect calculation for different alternative classes (“shifted”, “ordered”, and “max” for the maximum value of the standard deviation) based on the win odds. All formulas are from Bamber (1975). By default, uses Noether’s formula (Noether 1987) for shifted distributions. For shifted distributions see also Samvel B. Gasparyan, Kowalewski, et al. (2021), Samvel B. Gasparyan, Kowalewski, and Koch (2022).

  • Win ratio sample size calculation formula sizeWR() (Yu and Ganju 2022).

  1. Print and plot methods for hce_results objects, generated by the functions powerWO(), sizeWO(), minWO().

hce Objects

hce() Function

The main objects in the package are called hce objects, which are data frames with a specific structure matching the design of hierarchical composite endpoints (HCE). These are complex endpoints that combine different clinical events into a composite, using a hierarchy to prioritize the clinically most important event for a patient. These endpoints are implemented in clinical trials across different therapeutic areas. See, for example, Samvel B. Gasparyan et al. (2022) for an implementation in a COVID-19 setting and some practical considerations for constructing hierarchical composite endpoints. For the Chronic Kidney Disease (CKD) outcomes see (Little et al. 2023; Heerspink et al. 2023; Kondo et al. 2024).

HCEs are ordinal endpoints that can be thought of as having ‘greater’, ‘less’, or ‘equal’ defined for them, but without a definition of how much greater or less. In this sense, the ordinal outcomes can be represented as numeric vectors as long as numeric operations (e.g., sum or division) are not performed on them.

hce objects can be constructed using the helper function hce(), which has three arguments:

args("hce")
#> function (GROUP = character(), TRTP = character(), AVAL0 = 0, 
#>     ORD = sort(unique(GROUP))) 
#> NULL

We see that the required arguments are GROUP, which specifies the clinically most important event of a patient to be included in the analysis, and TRTP, which specifies the (planned) treatment group of a patient (exactly two treatment groups should be present). Note that:

  • The hce structure assumes that only one event per patient is present for the analysis, meaning that the resulting hce object created by the hce() function is a patient-level dataset. The function hce() does not select the clinically most important event of the patient but requires it to be already done when calling it.

  • The argument TRTP should have exactly two levels.

Consider the following example of ordinal outcomes ‘I’, ‘II’, and ‘III’:

set.seed(2022)
n <- 100
dat <- hce(GROUP = rep(x = c("I", "II", "III"), each = 100), 
           TRTP = sample(x = c("Active", "Control"), size = n*3, replace = TRUE))
class(dat)
#> [1] "hce"        "data.frame"

This dataset has the appropriate structure of hce objects, but its class inherits from an object of class data.frame. This means that all functions available for data frames can be applied to hce objects, for example, the function head():

head(dat)
#>      TRTP GROUP GROUPN AVAL AVAL0 ORD
#> 1 Control     I      0    0     0   1
#> 2  Active     I      0    0     0   1
#> 3 Control     I      0    0     0   1
#> 4  Active     I      0    0     0   1
#> 5  Active     I      0    0     0   1
#> 6 Control     I      0    0     0   1

We see that the dataset has a very specific structure. The column GROUPN shows how the function hce() generated the order of given events (it uses usual alphabetic order for the unique values in the GROUP column to determine the clinical importance of events):

unique(dat[, c("GROUP", "GROUPN")])
#>     GROUP GROUPN
#> 1       I      0
#> 101    II      1
#> 201   III      2

In the class hce, higher values for the ordering mean clinically less important events. For example, death, which is the most important event, should always get the lowest ordinal value. If there is a need to specify the order of outcomes, then the argument ORD can be used:

set.seed(2022)
n <- 100
dat <- hce(GROUP = rep(x = c("I", "II", "III"), each = 100), 
           TRTP = sample(x = c("A", "P"), size = n*3, replace = TRUE), ORD = c("III", "II", "I"))
unique(dat[, c("GROUP", "GROUPN")])
#>     GROUP GROUPN
#> 1       I      2
#> 101    II      1
#> 201   III      0

This means that the clinically most important event is ‘III’ instead of ‘I’. The argument AVAL0 is meant to help in cases where we want to introduce sub-ordering within each GROUP category. For example, if two events in the group ‘I’ can be compared based on other parameters, then the AVAL0 argument can be specified to take that into account.

Below we use the built-in data frame HCE1 to construct an hce object. Before specifying the order of events, it is a good idea to check what are the unique events included in the GROUP column:

data(HCE1)
unique(HCE1$GROUP)
#> [1] "C"    "TTE4" "TTE1" "TTE2" "TTE3"

Therefore, we can construct the following hce object:

HCE <- hce(GROUP = HCE1$GROUP, TRTP = HCE1$TRTP, AVAL0 = HCE1$AVAL0, 
           ORD = c("TTE1", "TTE2", "TTE3", "TTE4", "C"))
class(HCE)
#> [1] "hce"        "data.frame"
head(HCE)
#>   TRTP GROUP GROUPN     AVAL  AVAL0   ORD
#> 1    A     C  40000 39997.79  -2.21 10000
#> 2    P     C  40000 40017.06  17.06 10000
#> 3    A  TTE4  30000 30966.00 966.00 10000
#> 4    P  TTE4  30000 30352.00 352.00 10000
#> 5    A     C  40000 39987.55 -12.45 10000
#> 6    P     C  40000 40044.62  44.62 10000

Create an hce Object from a Data Frame

Consider the dataset HCE1, which is part of the package hce:

data(HCE1, package = "hce")
class(HCE1)
#> [1] "data.frame"
head(HCE1)
#>   ID TRTP GROUP GROUPN AVALT  AVAL0    AVAL PADY
#> 1  1    A     C   5400  1080  -2.21 5446.32 1080
#> 2  2    P     C   5400  1080  17.06 5465.59 1080
#> 3  3    A  TTE4   4320   966 966.00 5286.00 1080
#> 4  4    P  TTE4   4320   352 352.00 4672.00 1080
#> 5  5    A     C   5400  1080 -12.45 5436.08 1080
#> 6  6    P     C   5400  1080  44.62 5493.15 1080

This dataset has the appropriate structure of hce objects, but its class is data.frame. A generic way of coercing data structures to an hce object is to use the function as_hce(). This function performs checks (using an internal validator function) and creates an hce object from the given data structure (using an internal constructor function). If coercion is not possible, it will throw an error explaining the issue.

dat1 <- as_hce(HCE2)
str(dat1)
#> Classes 'hce' and 'data.frame':  1000 obs. of  8 variables:
#>  $ ID    : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ TRTP  : chr  "A" "P" "A" "P" ...
#>  $ GROUP : chr  "TTE1" "C" "C" "TTE1" ...
#>  $ GROUPN: num  1080 5400 5400 1080 2160 3240 5400 2160 1080 5400 ...
#>  $ AVALT : num  120 1080 1080 577 782 985 1080 293 313 1080 ...
#>  $ AVAL0 : num  120 3.35 22.8 577 782 ...
#>  $ AVAL  : num  1200 5461 5480 1657 2942 ...
#>  $ PADY  : num  1080 1080 1080 1080 1080 1080 1080 1080 1080 1080 ...

Simulate hce Objects Using simHCE()

To simulate values from a hierarchical composite endpoint, we use the function simHCE(), which has the following arguments:

args("simHCE")
#> function (n, n0 = n, TTE_A, TTE_P, CM_A, CM_P, CSD_A = 1, CSD_P = CSD_A, 
#>     fixedfy = 1, yeardays = 360, pat = 100, shape = 1, logC = FALSE, 
#>     seed = NULL, dec = 2) 
#> NULL
  • The vector arguments TTE_A and TTE_P specify the event rates per year for time-to-event outcomes in the active and control groups, respectively. The function assumes a Weibull survival function with the same shape parameter for simulating all time-to-event outcomes in both treatment groups (by default shape = 1, which assumes an exponential survival function). These two vectors should have the same length, which indicates the number of time-to-event outcomes.

  • By default, the event rates are presented per 100 patient-years (pat = 100), which can be changed using the argument pat. The function simulates event times in days, and the yeardays = 360 argument can be used to change the number of days in a year (e.g., 365 or 365.25).

  • The function simulates events during a fixed follow-up period only, and the fixedfy argument can be used to change the length of the follow-up (in years).

  • The function simulates the continuous outcome from a normal (default) or log-normal (if logC = TRUE) distribution with given means and standard deviations for two treatment groups.

Rates_A <- c(1.72, 1.74, 0.58, 1.5, 1) 
Rates_P <- c(2.47, 2.24, 2.9, 4, 6) 
dat3 <- simHCE(n = 2500, n0 = 1500, TTE_A = Rates_A, 
               TTE_P = Rates_P, 
               CM_A = -3, CM_P = -6, 
               CSD_A = 16, CSD_P = 15, 
               fixedfy = 3, seed = 2023)
class(dat3)
#> [1] "hce"        "data.frame"
head(dat3)
#>   ID TRTP GROUP GROUPN AVALT  AVAL0    AVAL seed PADY
#> 1  1    A     C   6480  1080  -9.32 6526.85 2023 1080
#> 2  2    A     C   6480  1080   0.11 6536.28 2023 1080
#> 3  3    A     C   6480  1080  27.06 6563.23 2023 1080
#> 4  4    A     C   6480  1080  -4.78 6531.39 2023 1080
#> 5  5    A  TTE2   2160   390 390.00 2550.00 2023 1080
#> 6  6    A     C   6480  1080   4.56 6540.73 2023 1080

Generics for hce Objects

As we see, the function simHCE() creates an object of type hce, which inherits from the built-in class data.frame. We can check all implemented methods for this new class as follows:

methods(class = "hce")
#> [1] calcWINS  calcWO    summaryWO
#> see '?methods' for accessing help and source code

The function calcWO() calculates the win odds and its confidence interval, while summaryWO() provides a more detailed calculation of win odds, including the number of wins, losses, and ties by GROUP categories.

HCE <- hce(GROUP = HCE3$GROUP, TRTP = HCE3$TRTP,
           ORD = c("TTE1", "TTE2", "TTE3", "TTE4", "C"))
calcWO(HCE)
#>         WO      LCL      UCL         SE WOnull alpha       Pvalue       WP
#> 1 1.333635 1.160574 1.532502 0.07091636      1  0.05 3.852522e-05 0.571484
#>        SE_WP     SD_WP    N
#> 1 0.01736671 0.5491836 1000
summaryWO(HCE)
#> $summary
#>   TRTP    WIN   LOSS   TIE  TOTAL        WR        WO
#> 1    A 110655  74913 64432 250000 1.4771135 1.3336352
#> 2    P  74913 110655 64432 250000 0.6769961 0.7498303
#> 
#> $summary_by_GROUP
#>    TRTP GROUP   WIN  LOSS   TIE  TOTAL
#> 1     A     C 87780     0 45220 133000
#> 2     P     C 39780     0 45220  85000
#> 3     A  TTE1     0 36540  5460  42000
#> 4     P  TTE1     0 27040  5460  32500
#> 5     A  TTE2  3835 18703  6962  29500
#> 6     P  TTE2  9912 42126  6962  59000
#> 7     A  TTE3 10980 14400  4620  30000
#> 8     P  TTE3 11011 22869  4620  38500
#> 9     A  TTE4  8060  5270  2170  15500
#> 10    P  TTE4 14210 18620  2170  35000
#> 
#> $WO
#>         WO         SE       WP      SE_WP
#> 1 1.333635 0.07091636 0.571484 0.01736671

References

Bamber, Donald. 1975. “The Area Above the Ordinal Dominance Graph and the Area Below the Receiver Operating Characteristic Graph.” Journal of Mathematical Psychology 12 (4): 387–415. https://doi.org/10.1016/0022-2496(75)90001-2.
Beigel, John H, Kay M Tomashek, Lori E Dodd, Aneesh K Mehta, Barry S Zingman, Andre C Kalil, Elizabeth Hohmann, et al. 2020. “Remdesivir for the Treatment of Covid-19.” New England Journal of Medicine 383 (19): 1813–26. https://doi.org/10.1056/NEJMoa2007764.
Brunner, Edgar, and Ullrich Munzel. 2000. “The Nonparametric Behrens-Fisher Problem: Asymptotic Theory and a Small-Sample Approximation.” Biometrical Journal: Journal of Mathematical Methods in Biosciences 42 (1): 17–25. https://doi.org/10.1002/(SICI)1521-4036(200001)42:1<17::AID-BIMJ17>3.0.CO;2-U.
CDISC, Clinical Data Interchange Standards Consortium. 2009. “ADaM.” https://www.cdisc.org/standards/foundational/adam.
DeLong, Elizabeth R, David M DeLong, and Daniel L Clarke-Pearson. 1988. “Comparing the Areas Under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach.” Biometrics, 837–45. https://doi.org/10.2307/2531595.
Dong, Gaohong, Bo Huang, Johan Verbeeck, Ying Cui, James Song, Margaret Gamalo-Siebers, Duolao Wang, et al. 2023. “Win Statistics (Win Ratio, Win Odds, and Net Benefit) Can Complement One Another to Show the Strength of the Treatment Effect on Time-to-Event Outcomes.” Pharmaceutical Statistics 22 (1): 20–33. https://doi.org/10.1002/pst.2251.
Gasparyan, Samvel B. 2024. hce: Design and Analysis of Hierarchical Composite Endpoints. CRAN: The Comprehensive R Archive Network, R Package, Version 0.6.5. https://CRAN.R-project.org/package=hce.
Gasparyan, Samvel B, Joan Buenconsejo, Elaine K Kowalewski, Jan Oscarsson, Olof F Bengtsson, Russell Esterline, Gary G Koch, Otavio Berwanger, and Mikhail N Kosiborod. 2022. “Design and Analysis of Studies Based on Hierarchical Composite Endpoints: Insights from the DARE-19 Trial.” Therapeutic Innovation & Regulatory Science 56 (5): 785–94. https://doi.org/10.1007/s43441-022-00420-1.
Gasparyan, Samvel B, Folke Folkvaljon, Olof Bengtsson, Joan Buenconsejo, and Gary G Koch. 2021. “Adjusted Win Ratio with Stratification: Calculation Methods and Interpretation.” Statistical Methods in Medical Research 30 (2): 580–611. https://doi.org/10.1177/0962280220942558.
Gasparyan, Samvel B, Elaine K Kowalewski, Joan Buenconsejo, and Gary G Koch. 2023. “Hierarchical Composite Endpoints in COVID-19: The DARE-19 Trial.” In Case Studies in Innovative Clinical Trials, 95–148. Chapman; Hall/CRC. https://doi.org/10.1201/9781003288640-7.
Gasparyan, Samvel B, Elaine K Kowalewski, Folke Folkvaljon, Olof Bengtsson, Joan Buenconsejo, John Adler, and Gary G Koch. 2021. “Power and Sample Size Calculation for the Win Odds Test: Application to an Ordinal Endpoint in COVID-19 Trials.” Journal of Biopharmaceutical Statistics 31 (6): 765–87. https://doi.org/10.1080/10543406.2021.1968893.
Gasparyan, Samvel B, Elaine K Kowalewski, and Gary G Koch. 2022. “Comments on ‘Sample Size Formula for a Win Ratio Endpoint’ by RX Yu and j. Ganju.” Statistics in Medicine 41 (14): 2688–90. https://doi.org/10.1002/sim.9379.
Gasparyan, Samvel B., Nicole Major, Christoffer Bäckberg, Srivathsa Ravikiran, Parag Wani, and Martin Karpefors. 2024. “Basic Data Structure for Hierarchical Composite Endpoints: An Application to Kidney Disease Trials.” Journal of the Society for Clinical Data Management. https://doi.org/10.47912/jscdm.265.
Heerspink, Hiddo L, Niels Jongs, Patrick Schloemer, Dustin J Little, Meike Brinker, Christoph Taso, Martin Karpefors, et al. 2023. “Development and Validation of a New HCE for Clinical Trials of Kidney Disease Progression.” Journal of the American Society of Nephrology 34 (12): 2025–38. https://doi.org/10.1681/ASN.0000000000000243.
Kalil, Andre C, Thomas F Patterson, Aneesh K Mehta, Kay M Tomashek, Cameron R Wolfe, Varduhi Ghazaryan, Vincent C Marconi, et al. 2021. “Baricitinib Plus Remdesivir for Hospitalized Adults with Covid-19.” New England Journal of Medicine 384 (9): 795–807. https://doi.org/10.1056/NEJMoa2031994.
Karpefors, Martin, Daniel Lindholm, and Samvel B Gasparyan. 2023. “The Maraca Plot: A Novel Visualization of Hierarchical Composite Endpoints.” Clinical Trials 20 (1): 84–88. https://doi.org/10.1177/17407745221134949.
Koch, Gary G, Catherine M Tangen, Jin-Whan Jung, and Ingrid A Amara. 1998. “Issues for Covariance Analysis of Dichotomous and Ordered Categorical Data from Randomized Clinical Trials and Non-Parametric Strategies for Addressing Them.” Statistics in Medicine 17 (15-16): 1863–92. https://doi.org/10.1002/(sici)1097-0258(19980815/30)17:15/16%3C1863::aid-sim989%3E3.0.co;2-m.
Kondo, Toru, Pardeep S Jhund, Samvel B Gasparyan, Mingming Yang, Brian L Claggett, Finnian R McCausland, Paolo Tolomeo, et al. 2024. “A Hierarchical Kidney Outcome Using Win Statistics in Patients with Heart Failure from the DAPA-HF and DELIVER Trials.” Nature Medicine, 1–8.
Little, Dustin J, Samvel B Gasparyan, Patrick Schloemer, Niels Jongs, Meike Brinker, Martin Karpefors, Christoph Taso, et al. 2023. “Validity and Utility of a Hierarchical Composite Endpoint for Clinical Trials of Kidney Disease Progression: A Review.” Journal of the American Society of Nephrology 34 (12): 1928–35. https://doi.org/10.1681/ASN.0000000000000244.
Noether, Gottfried E. 1987. “Sample Size Determination for Some Common Nonparametric Tests.” Journal of the American Statistical Association 82 (398): 645–47. https://doi.org/10.1080/01621459.1987.10478478.
Yu, Ron Xiaolong, and Jitendra Ganju. 2022. “Sample Size Formula for a Win Ratio Endpoint.” Statistics in Medicine 41 (6): 950–63. https://doi.org/10.1002/sim.9297.