MAIHDA: Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy

CRAN status CRAN downloads R-CMD-check R License: MIT Codecov test coverage Lifecycle: stable

Overview

The MAIHDA package provides a comprehensive toolkit for conducting Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy (MAIHDA). This approach is particularly valuable for examining intersectional inequalities in health and social outcomes by considering the joint effects of multiple social categories (e.g., gender, race, socioeconomic status).

Key Features

Installation

You can install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("hdbt/MAIHDA")

or the current stable version from CRAN:

install.packages("MAIHDA")

Quick Start: 30 Seconds to MAIHDA

library(MAIHDA)
data("maihda_health_data")

# Everything in one call: null + adjusted fit, VPC/ICC summary, and PCV decomposition.
analysis <- maihda(BMI ~ Age + Gender + Race + Education + (1 | Gender:Race:Education),
                   data = maihda_health_data)
analysis
summary(analysis)

plot(analysis, type = "vpc")
plot(analysis, type = "effect_decomp")
plot(analysis) # plots all plot types

Main Functions

maihda()

A single high-level entry point that runs the standard two-model MAIHDA workflow: it fits the null model (covariates only) and the adjusted model (plus the dimensions’ additive main effects – write them in the formula, or let maihda() add them with a message), summarises the null-model VPC/ICC, and reports the PCV (the additive share of the intersectional inequality). When a group is supplied it also runs this decomposition within each group. Alternatively, decomposition = "crossed-dimensions" reads the additive/interaction split off a single model that enters each dimension’s main effect as a random intercept. Returns one maihda_analysis object with print(), summary(), and plot() methods (plot() routes the VPC/shrinkage views to the null model and the additive-vs-intersectional views to the adjusted model). It is intrinsically a decomposition and has no single-model mode – use fit_maihda() for a single fit.

maihda_table()

Assembles the two standard MAIHDA write-up deliverables from a fitted maihda() analysis (or a single fit_maihda() model) in one call: (a) a model-results table contrasting the null (Model 1) and adjusted (Model 2) fits — intercept, between-stratum variance and SD, VPC/ICC, the PCV, and (for a binary outcome) the AUC and Median Odds Ratio — and (b) a ranked-strata table ordering every stratum by its predicted outcome, with conditional intervals and the stratum random effect. The $models data frame is numeric and export-ready (write.csv() / knitr::kable()); print() renders the familiar “estimate [low, high]” layout plus the top/bottom strata. It adapts to every fit type (crossed-dimensions shares, contextual Context share (VPC), ordinal thresholds) and engine (lme4/brms/WeMix/ordinal).

analysis <- maihda(BMI ~ Age + Gender + Race + (1 | Gender:Race), data = maihda_health_data)
tab <- maihda_table(analysis)
tab          # printed: Model 1 vs Model 2 table + highest/lowest strata
tab$models   # numeric, export-ready results table
tab$strata   # all strata ranked by predicted BMI

make_strata()

Creates intersectional strata from multiple categorical variables with optional minimum count filtering.

fit_maihda()

Fits multilevel models using the lme4 (default), brms, WeMix (design-weighted, via sampling_weights), or ordinal (ordinal::clmm, selected automatically for ordered-factor outcomes) engine. Supports various families including gaussian, binomial, poisson, negbinomial (overdispersed counts; theta estimated via lme4::glmer.nb() or brms’s shape parameter, with the VPC using the latent-scale level-1 variance of Nakagawa, Johnson & Schielzeth 2017), and ordinal/cumulative (proportional-odds models for ordered outcomes; latent-scale VPC with pi^2/3 logit / 1 probit level-1 variance, response-scale predictions as expected category scores).

Contextual cross-classified MAIHDA (context =)

The MAIHDA literature’s cross-classified design crosses individuals’ intersectional strata with a higher-level context – hospitals (patient survival), schools (student achievement), neighbourhoods. Pass context = "school" to fit_maihda() or maihda() to fit outcome ~ covars + (1 | stratum) + (1 | school) in one model; the summary then splits the unexplained variance into between-stratum (intersectional, net of the context), between-context (the general contextual effect), and residual, and plot(type = "context_vpc") visualises the partition.

data(maihda_country_data)
# Strata (gender x SES) crossed with country in ONE model -- contrast with
# group = "country", which instead fits an independent model per country.
a <- maihda(math ~ 1 + (1 | gender:ses), data = maihda_country_data,
            context = "country")
a$summary$context$vpc_context_total  # the country share of unexplained variance
plot(a, type = "context_vpc")

A context with few levels (like these 6 countries) weakly identifies its variance; prefer many-level contexts or engine = "brms" for serious use.

summary()

Provides comprehensive model summaries including:

predict_maihda()

Makes predictions at two levels:

plot()

Creates various visualizations:

maihda_ternary_plot()

Generates a ternary diagnostic plot. For each stratum it normalizes three magnitudes to sum to 1: the additive signal (how far the fixed-effect-only prediction sits from the grand mean), the intersection-specific signal (the magnitude of the stratum random effect), and the uncertainty in that estimate. It is a relative-signal diagnostic, not a formal variance decomposition.

plot_prediction_deviation_panels()

Creates an advanced, publication-ready two-panel dashboard for visualizing predicted values and highlighting the most notable cases or strata. What counts as notable depends on the model type — the largest deviation from the mean prediction (Gaussian/Poisson), the largest absolute deviance residual (binomial), or the most surprising observation (ordinal) — and the labelled points are not regression-diagnostic outliers.

compare_maihda()

Compares VPC/ICC across multiple models with optional bootstrap confidence intervals, and (by default, ic = TRUE) appends relative-fit information criteria — AIC/BIC for the likelihood engines, WAIC/LOOIC for brms — for comparing model structures.

maihda_ic()

Reports the relative-fit information criteria for one or more models (or a maihda() analysis, expanded into its null and adjusted models) to help choose between model structures — a question the VPC/PCV do not address. AIC/BIC for the likelihood engines (lme4, ordinal) and the Bayesian WAIC/LOOIC for brms, with a delta column from the best model. REML lmer fits are refitted with ML so AIC/BIC are comparable across models with different fixed effects (the null-vs-adjusted case).

compare_maihda_groups()

Compares intersectional inequality (VPC/ICC and between-/within-stratum variance) across the levels of a higher-level grouping variable such as country, region, or survey wave, fitting a stratified MAIHDA model per group. Visualize with plot(result, type = "vpc"). The bundled maihda_country_data (OECD PISA 2018; gender × socioeconomic-status strata across six countries) is built to demonstrate this.

calculate_pvc()

Calculates the proportional change in between-stratum variance (PCV) between two models. It is the share of the baseline model’s between-stratum variance explained by the second model only when the second nests the first (adding predictors on the same outcome, sample and strata); otherwise it is a model-dependent change in variance.

stepwise_pcv()

Evaluates multiple sequential models by iteratively adding covariates step-by-step. Each step’s PCV is the change in between-stratum variance contributed by a predictor given the variables already in the model, so it is order-dependent rather than an order-invariant “unique” contribution. For a binary outcome it also reports the discriminatory-accuracy trajectory (AUC, the step/total change in AUC, and MOR) alongside the PCV, so the strata’s discriminatory accuracy can be tracked as covariates are added.

Longitudinal (growth-curve) MAIHDA

Supply id and time to fit_maihda()/maihda() for a 3-level growth-curve MAIHDA (occasions within individuals within strata, with random intercept and slope on time at both levels), the life-course extension of Bell, Evans, Holman & Leckie (2024). The between-stratum VPC is then time-varying, and maihda(decomposition = "longitudinal") reports the PCV separately for the baseline (intercept) and the slope variance — the additive-vs-multiplicative split of the intersectional trajectory inequality.

data(maihda_long_data)   # 600 persons x 5 waves; gender x ethnicity x education strata

# Time-varying VPC from a 3-level growth model
m <- fit_maihda(wellbeing ~ wave + (1 | gender:ethnicity:education),
                data = maihda_long_data, id = "id", time = "wave")
summary(m)                         # baseline VPC + the VPC trajectory over waves
plot(m, type = "vpc_trajectory")   # VPC(t) curve
plot(m, type = "trajectories")     # predicted per-stratum mean trajectories

# Additive-vs-multiplicative PCV (null vs adjusted growth model)
a <- maihda(wellbeing ~ wave + (1 | gender:ethnicity:education),
            data = maihda_long_data, id = "id", time = "wave",
            decomposition = "longitudinal")
a$pcv                              # PCV_intercept (baseline) and PCV_slope (trajectory)
plot(a, type = "pcv_trajectory")

run_maihda_app()

Launches a locally-hosted, interactive Shiny Dashboard that exposes the core functionalities for data modeling, visualization, and summarization visually.

Using brms for Bayesian Inference

# Requires brms package
model_brms <- fit_maihda(
  BMI ~ Age + (1 | Gender:Race:Education),
  data = maihda_health_data,
  engine = "brms",
  chains = 4,
  iter = 2000
)

summary_brms <- summary(model_brms)

Design-Weighted MAIHDA (Survey Data)

For complex-survey data (NHANES, PISA, …), pass the sampling-weight column via sampling_weights. Survey weights are not lme4 weights= (those are precision weights), so the fit routes through WeMix::mix() – weighted pseudo-maximum-likelihood (Rabe-Hesketh & Skrondal 2006) – and engine = "lme4" with sampling weights is an error rather than a silent misfit.

# One call: design-weighted null + adjusted models and PCV. The engine switches
# to "wemix" automatically (install WeMix from CRAN).
analysis <- maihda(
  outcome ~ age + (1 | gender:race:education),
  data = survey_data,
  sampling_weights = "person_weight"
)
analysis            # design-weighted VPC/ICC and PCV
summary(analysis)   # design-consistent (sandwich) SEs for the fixed effects

# Works across the toolkit: stepwise PCV, group comparison, prediction, plots,
# and the design-weighted AUC for binary outcomes.
stepwise_pcv(strata_data, "outcome", c("gender", "race", "education"),
             sampling_weights = "person_weight")

# Bayesian alternative: weights enter as likelihood weights (pseudo-posterior --
# point estimates are design-consistent; credible intervals are not design-based).
fit_maihda(outcome ~ age + (1 | gender:race:education), data = survey_data,
           engine = "brms", sampling_weights = "person_weight")

Limitations: the wemix engine covers the canonical gaussian(identity) / binomial(logit) MAIHDA with a single (1 | stratum) intercept. crossed random effects (context =, decomposition = "crossed-dimensions") and bootstrap intervals require lme4/brms. A design-based interval would need replicate weights, which is a upcoming future extension.

Stepwise Proportional Change in Variance (PCV)

# Incrementally track the explained variables interactively
stepwise_results <- stepwise_pcv(
  data = data,
  outcome = "health_outcome",
  vars = c("gender", "race", "age")
)
print(stepwise_results)

Tidy output (broom)

tidy() and glance() methods turn a fitted model or analysis into tidy data for tables (gt, flextable) and ggplot2, reading the quantities summary() already computes (no new statistics, no refit):

analysis <- maihda(BMI ~ Age + Gender + Race + Education + (1 | Gender:Race:Education),
                   data = maihda_health_data)

# One-row headline: VPC/ICC, PCV, AUC/MOR (binary), additive/interaction shares, n.
# This row has no equivalent in broom.mixed/easystats -- PCV needs the null+adjusted pair.
glance(analysis)

# Per-estimate tidy frames (broom's term/estimate/std.error/conf.low/conf.high shape):
tidy(analysis)                          # stratum (intersection) effects, with labels
tidy(analysis, component = "variance")  # variance components
tidy(analysis, component = "fixed", which = "adjusted")

# A caterpillar plot in two lines:
library(ggplot2)
tidy(analysis) |>
  ggplot(aes(reorder(label, estimate), estimate, ymin = conf.low, ymax = conf.high)) +
  geom_pointrange() + coord_flip()

Interactive Shiny App

You can access a live, cloud-hosted version of the MAIHDA interactive dashboard directly in your browser without installing R: https://hdbt.shinyapps.io/shiny/

Alternatively, you can run all analyses described above in the browser locally by typing:

library(MAIHDA)
run_maihda_app()

Documentation

For detailed documentation and examples, see the package vignette:

vignette("introduction", package = "MAIHDA")

Dependencies

Required:

Optional:

Bootstrap confidence intervals use a parametric bootstrap via lme4::simulate() / lme4::refit(); no external bootstrap package is required.

References

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this package in your research, please cite:

Bulut (2025). *MAIHDA: Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy.* R package version 0.1.11, https://github.com/hdbt/MAIHDA. doi: 10.32614/CRAN.package.MAIHDA

A BibTeX entry for LaTeX users is:

@Manual{Bulut2025MAIHDA,
  title  = {MAIHDA: Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy},
  author = {Hamid Bulut},
  year   = {2025},
  note   = {R package version 0.1.11},
  url    = {https://github.com/hdbt/MAIHDA},
  doi    = {10.32614/CRAN.package.MAIHDA}
}