| Type: | Package |
| Title: | Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy |
| Version: | 0.1.11 |
| Description: | Provides a comprehensive toolkit for conducting Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy (MAIHDA). Methods are described in Merlo (2018) <doi:10.1016/j.socscimed.2017.12.026> and Evans et al. (2018) <doi:10.1016/j.socscimed.2017.11.011>. Automatically generates intersectional strata, fits analytical models, extracts statistics, and produces visualizations. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| LazyData: | true |
| Depends: | R (≥ 4.1.0) |
| Imports: | lme4 (≥ 1.1-27), reformulas, ggplot2 (≥ 3.4.0), dplyr (≥ 1.0.0), tidyr (≥ 1.1.0), generics, stats, rlang (≥ 0.4.0), patchwork, ggrepel, tidyselect, tibble |
| Suggests: | shiny, bslib (≥ 0.7.0), DT, future, promises, plotly, haven, shinyjs, shinycssloaders, brms (≥ 2.15.0), WeMix (≥ 4.0.0), ordinal, testthat (≥ 3.0.0), knitr, MASS, rmarkdown, ggtern |
| VignetteBuilder: | knitr |
| URL: | https://github.com/hdbt/MAIHDA, https://hdbt.github.io/MAIHDA/ |
| BugReports: | https://github.com/hdbt/MAIHDA/issues |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2026-06-18 06:39:02 UTC; hamid |
| Author: | Hamid Bulut [aut, cre] |
| Maintainer: | Hamid Bulut <me@hamidbulut.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-18 07:30:02 UTC |
MAIHDA: Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy
Description
Provides a comprehensive toolkit for conducting Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy (MAIHDA). Methods are described in Merlo (2018) doi:10.1016/j.socscimed.2017.12.026 and Evans et al. (2018) doi:10.1016/j.socscimed.2017.11.011. Automatically generates intersectional strata, fits analytical models, extracts statistics, and produces visualizations.
Author(s)
Maintainer: Hamid Bulut me@hamidbulut.com
See Also
Useful links:
Report bugs at https://github.com/hdbt/MAIHDA/issues
Add Stratum Labels to Estimates
Description
Internal helper function to merge stratum labels into stratum estimates.
Usage
add_stratum_labels(stratum_estimates, strata_info)
Arguments
stratum_estimates |
Data frame with stratum estimates |
strata_info |
Data frame with stratum information including labels |
Value
Data frame with labels merged in
Bootstrap a crossed-dimensions MAIHDA partition (lme4)
Description
Parametric bootstrap (simulate from the fitted model, refit) of the
crossed-dimensions VPC and the additive / interaction shares, returning a
percentile interval for each via maihda_bootstrap_ci. lme4 only – brms
returns posterior credible intervals directly. When ctx_vars names
contextual random intercepts, their variance enters each refit's VPC denominator
and a context_vpc interval (the contexts' total share) is returned too.
Usage
bootstrap_cc(model, cc, n_boot, conf_level, ctx_vars = character(0))
Arguments
model |
The underlying lme4 model object. |
cc |
The |
n_boot |
Number of bootstrap samples. |
conf_level |
Confidence level. |
ctx_vars |
Character vector of contextual grouping factors (may be empty). |
Value
A list with vpc, additive_share, interaction_share
(and context_vpc when ctx_vars is non-empty), each a length-2
interval carrying n_ok/mc_se attributes.
Bootstrap a contextual cross-classified MAIHDA partition (lme4)
Description
Parametric bootstrap (simulate from the fitted model, refit) of the
between-stratum VPC and the contexts' total share for a contextual
cross-classified fit, returning a percentile interval for each via
maihda_bootstrap_ci. lme4 only – brms returns posterior credible
intervals directly.
Usage
bootstrap_context(model, ctx_vars, n_boot, conf_level)
Arguments
model |
The underlying lme4 model object. |
ctx_vars |
Character vector of context grouping factors. |
n_boot |
Number of bootstrap samples. |
conf_level |
Confidence level. |
Value
A list with vpc (between-stratum share) and context_vpc
(contexts' total share), each a length-2 interval carrying
n_ok/mc_se attributes.
Bootstrap PVC
Description
Internal function to compute bootstrap confidence intervals for PVC.
Usage
bootstrap_pvc(model1, model2, n_boot, conf_level)
Arguments
model1 |
First maihda_model object |
model2 |
Second maihda_model object |
n_boot |
Number of bootstrap samples |
conf_level |
Confidence level |
Value
A vector with lower and upper confidence bounds
Bootstrap VPC/ICC
Description
Internal function to compute bootstrap confidence intervals for VPC.
Usage
bootstrap_vpc(model, data, formula, n_boot, conf_level)
Arguments
model |
An lme4 model object |
data |
The data used to fit the model |
formula |
The model formula |
n_boot |
Number of bootstrap samples |
conf_level |
Confidence level |
Value
A vector with lower and upper confidence bounds
Calculate Proportional Change in Between-Stratum Variance (PCV)
Description
Calculates the proportional change in between-stratum variance (PCV) between two MAIHDA models. The PCV measures how much the between-stratum variance changes when moving from one model to another, and is calculated as: PCV = (Var_model1 - Var_model2) / Var_model1. (The function and result object retain the historical "pvc" naming; “PVC” and “PCV” refer to the same quantity.)
Usage
calculate_pvc(
model1,
model2,
bootstrap = FALSE,
n_boot = 1000,
conf_level = 0.95
)
Arguments
model1 |
A maihda_model object from |
model2 |
A maihda_model object from |
bootstrap |
Logical indicating whether to compute bootstrap confidence intervals for the PCV. Default is FALSE. |
n_boot |
Number of bootstrap samples if bootstrap = TRUE. Default is 1000. |
conf_level |
Confidence level for bootstrap intervals. Default is 0.95. |
Details
The PVC is the proportional change in between-stratum variance when moving from model1 to model2: a positive value means model2 has lower between-stratum variance, a negative value means higher. It is the share of model1's between-stratum variance explained by model2 only in the canonical nested case, where model2 adds fixed-effect predictors to model1 on the same outcome, analytic sample and strata. The function does not require nesting, so for non-nested models the PVC is simply a model-dependent difference in variance, not an explained proportion.
REML vs ML. lmer fits Gaussian models by REML, whose
between-stratum variance estimate is not comparable across models with
different fixed effects – exactly the canonical null-vs-adjusted PCV, where the
adjusted model adds the dimensions' main effects. calculate_pvc() therefore
refits any REML lmer model with maximum likelihood
(refitML) before reading the variances (and before the
parametric bootstrap, so the interval matches), matching maihda_ic
and anova() on lme4 models. Using REML estimates here biases the PCV
(it overstates the residual between-stratum variance of the adjusted model). GLMM
fits (glmer) and the brms/wemix/ordinal engines are already on the
maximum-likelihood scale and are unaffected; single-model VPC/ICC summaries keep
their REML fit, since that comparison-free quantity is not subject to the pitfall.
When bootstrap = TRUE, the function uses a parametric bootstrap: it simulates
new responses from model2 and refits both models with lme4::refit() for
each simulated response to obtain confidence intervals for the PVC estimate.
For negative-binomial models (glmer.nb) refit() holds the
dispersion parameter theta fixed at its original estimate, so the interval is
conditional on the estimated theta.
Value
A list containing:
pvc |
The estimated proportional change in variance |
var_model1 |
Between-stratum variance from model1 |
var_model2 |
Between-stratum variance from model2 |
ci_lower |
Lower bound of confidence interval (if bootstrap = TRUE) |
ci_upper |
Upper bound of confidence interval (if bootstrap = TRUE) |
bootstrap |
Logical indicating if bootstrap was used |
Examples
# Create strata and fit two models
strata_result <- make_strata(maihda_sim_data, c("gender", "race"))
model1 <- fit_maihda(health_outcome ~ age + (1 | stratum), data = strata_result$data)
model2 <- fit_maihda(health_outcome ~ age + gender + (1 | stratum), data = strata_result$data)
# Calculate PVC without bootstrap
pvc_result <- calculate_pvc(model1, model2)
print(pvc_result$pvc)
# Calculate PVC with bootstrap CI
# pvc_boot <- calculate_pvc(model1, model2, bootstrap = TRUE, n_boot = 500)
# print(pvc_boot)
Compare MAIHDA Models
Description
Compares variance partition coefficients (VPC/ICC) across multiple MAIHDA models, with optional bootstrap confidence intervals.
Usage
compare_maihda(
...,
model_names = NULL,
bootstrap = FALSE,
n_boot = 1000,
conf_level = 0.95,
ic = TRUE
)
Arguments
... |
Multiple maihda_model objects to compare. |
model_names |
Optional character vector of names for the models. |
bootstrap |
Logical; for lme4 models, compute parametric-bootstrap
VPC confidence intervals. Default FALSE. It does not apply to brms
models, which always return a posterior credible interval (so passing
|
n_boot |
Number of bootstrap samples if bootstrap = TRUE. Default is 1000. |
conf_level |
Confidence level for the VPC interval (lme4 bootstrap CI or brms credible interval). Default is 0.95. |
ic |
Logical; append relative-fit information criteria to the table for
comparing model structures: |
Details
VPCs are only directly comparable when the models share an outcome,
family/link, analytic sample, and strata – the canonical use is nested models
(e.g. null vs covariate-adjusted) on the same data and strata, to show
how the VPC attenuates. If the supplied models differ in any of these,
compare_maihda() still returns the table but issues a single warning,
because the VPCs are then not directly comparable. The same comparability caveat
applies to the appended information criteria (see maihda_ic).
Value
A maihda_comparison data frame of VPC/ICC by model. Interval
columns (ci_lower/ci_upper) are included when any model supplies
an interval – an lme4 bootstrap CI or a brms posterior credible interval. When
ic = TRUE, information-criteria columns (AIC/BIC or
WAIC/LOOIC, whichever apply) are appended.
Examples
# Canonical use: nested models on the SAME data and strata (null vs adjusted)
strata <- make_strata(maihda_sim_data, vars = c("gender", "race"))
null_model <- fit_maihda(health_outcome ~ 1 + (1 | stratum), data = strata$data)
adj_model <- fit_maihda(health_outcome ~ age + (1 | stratum), data = strata$data)
# Compare without bootstrap
comparison <- compare_maihda(null_model, adj_model,
model_names = c("Null", "Adjusted"))
# Compare with bootstrap CI
comparison_boot <- compare_maihda(null_model, adj_model,
model_names = c("Null", "Adjusted"),
bootstrap = TRUE, n_boot = 500)
Compare MAIHDA Metrics Across Levels of a Grouping Variable
Description
Fits a separate random-intercept MAIHDA model (intercept-only random
effects; any fixed-effect covariates in formula are still used) within
each level of a higher-level grouping variable (for example country, region, or
survey wave) and reports how the variance partition coefficient (VPC/ICC) and
the between-/within-stratum variance components differ across those groups. When
the strata are defined by at least two dimensions it also fits the adjusted model
(the dimensions' additive main effects) within each group and reports the per-group
pcv – the proportional change in between-stratum variance, i.e. the additive
share of that group's intersectional inequality.
Usage
compare_maihda_groups(
formula,
data,
group,
engine = "lme4",
family = "gaussian",
shared_strata = TRUE,
min_group_n = 30,
bootstrap = FALSE,
n_boot = 1000,
conf_level = 0.95,
autobin = TRUE,
decomposition = c("two-model", "crossed-dimensions"),
sampling_weights = NULL,
...
)
Arguments
formula |
A model formula. Either the shorthand intersectional form
|
data |
A data frame containing the variables in |
group |
Character string naming the grouping variable in |
engine |
Modeling engine, "lme4" (default), "brms", or "wemix" (the
design-weighted fit; requires |
family |
Model family. Default "gaussian". As in |
shared_strata |
Logical. When TRUE (default) intersectional strata are defined once on the full data so that a stratum denotes the same combination in every group; this makes the stratum definitions comparable across groups. Note that a group may still not contain every stratum, so two groups' VPCs can be estimated over different sets of populated strata – they are then not strictly directly comparable, and the function warns when this happens. When FALSE, strata are rebuilt independently within each group (stratum identities are then not comparable across groups at all). |
min_group_n |
Minimum size of the analytic sample a group must have – the rows that survive the model frame (covariate transformations applied, rows with a missing outcome/covariate dropped) – to be modelled. Groups with a smaller usable sample are skipped with a warning, even if they have more raw rows. Default 30. |
bootstrap |
Logical; compute per-group parametric-bootstrap VPC confidence intervals. lme4 engine only. Default FALSE. |
n_boot |
Number of bootstrap samples when |
conf_level |
Confidence level for bootstrap intervals. Default 0.95. |
autobin |
Logical passed to |
decomposition |
Per-group additive-vs-interaction decomposition: the two-model
null -> adjusted PCV ( |
sampling_weights |
Optional name of a sampling-weight column in
|
... |
Additional arguments passed to |
Details
It estimates one VPC per group as a stratified analysis: each group is modelled independently. It is not a cross-classified model and does not adjust the strata for the grouping variable.
The VPC is the share of the unexplained variance that lies between strata,
not the absolute magnitude of intersectional inequality. Because it is a ratio,
a group's VPC can differ from another's because the between-stratum variance
differs, because the within-stratum (residual) variance differs, or both – two
groups with the same between-stratum variance can have very different VPCs. To
compare the absolute amount of between-stratum (intersectional) variation across
groups, read the returned var_between column alongside the VPC rather than
treating a higher VPC as "more inequality".
It is descriptive: it reports each group's VPC (with an interval when available – an lme4 bootstrap CI or a brms credible interval) for side-by-side comparison, but does not test whether the VPCs differ between groups. The per-group intervals describe each group's own uncertainty; whether two intervals overlap is not a valid test of the difference between their VPCs, which would require modelling that difference directly.
Robustness: a group whose analytic sample (rows surviving the model
frame) has fewer than min_group_n observations is always skipped with a
warning. A group with fewer than two populated strata is also skipped
(VPC is undefined with a single stratum) when the stratum membership is known
before fitting – that is, when shared_strata = TRUE or data
already carries a stratum column. Under shared_strata = FALSE
strata are rebuilt inside each group, so a degenerate single-stratum group is
instead reported with a "fit failed" status rather than a pre-fit skip. A
singular fit yields a VPC of 0 rather than an error (unlike
calculate_pvc). A hard fit failure in one group records NA
and a status note without aborting the whole comparison.
Fit-quality diagnostics: for the lme4 engine, groups whose model is
singular or fails to converge keep a status of "ok" (the fit did
complete) but are named in a single aggregated warning, because their VPC/ICC
may be unreliable – a singular fit usually pins the between-stratum variance at
the boundary, giving a VPC of 0.
Value
A data.frame of class maihda_group_comparison with one
row per group and columns group, n, n_strata,
vpc, var_between, var_other, var_residual,
status (and ci_lower/ci_upper when
bootstrap = TRUE). When the strata are defined by at least two
dimensions, two further columns report the per-group null -> adjusted
decomposition: pcv (proportional change in between-stratum variance when
the dimensions' additive main effects are added; computed on the
maximum-likelihood scale – see calculate_pvc – because REML
variances are not comparable across the null vs. adjusted fixed effects) and
var_between_adjusted (the adjusted model's between-stratum variance,
reported as var_between * (1 - pcv) so it shares the scale of the
REML var_between/vpc and the table stays internally coherent);
both are NA for a group whose adjusted fit failed, and the columns are
omitted entirely when the strata have a single dimension. n is the analytic sample size used by the
model (after dropping rows with a missing outcome/covariate) for both fitted
and skipped groups, falling back to the raw row count only when the model
frame cannot be built. var_other is the variance of any additional
random effects and is 0 for the canonical single-stratum model. Groups that
were skipped or failed have NA metrics and an explanatory
status.
See Also
compare_maihda for comparing different models on the
same data; plot.maihda_group_comparison for visualising the result.
Examples
data(maihda_country_data)
# How does gender x SES inequality in PISA math scores differ across countries?
cmp <- compare_maihda_groups(
math ~ 1 + (1 | gender:ses),
data = maihda_country_data,
group = "country"
)
print(cmp)
plot(cmp, type = "vpc")
Compute Ternary Data for MAIHDA Models
Description
Compute Ternary Data for MAIHDA Models
Usage
compute_maihda_ternary_data(
model,
scale = c("link", "response"),
reference_values = NULL,
uncertainty_method = c("auto", "se", "ci_width", "posterior_sd"),
include_na_strata = FALSE,
verbose = TRUE
)
Arguments
model |
A fitted MAIHDA model object from 'fit_maihda()'. |
scale |
Character, either "link" or "response". |
reference_values |
List or data.frame of reference values for covariates. |
uncertainty_method |
Character indicating how to extract uncertainty. "auto" uses conditional standard errors for lme4 models and posterior standard deviations for brms models. "ci_width" uses the 95% interval width. |
include_na_strata |
Logical, whether to include strata with missing data. |
verbose |
Logical, whether to print messages. |
Value
A tidy tibble with ternary coordinates.
Extract Between-Stratum Variance
Description
Internal function to extract between-stratum variance from a MAIHDA model.
Usage
extract_between_variance(model)
Arguments
model |
A maihda_model object |
Value
Numeric value of between-stratum variance
Fit MAIHDA Model
Description
Fits a multilevel model for MAIHDA (Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy) using lme4, brms, or – for design-weighted (survey) data – WeMix.
Usage
fit_maihda(
formula,
data,
engine = "lme4",
family = "gaussian",
autobin = TRUE,
context = NULL,
sampling_weights = NULL,
id = NULL,
time = NULL,
time_degree = 1,
interactions = FALSE,
...
)
Arguments
formula |
A formula specifying the model. Can include a random effect
for stratum (e.g., |
data |
A data frame containing the variables in the formula. |
engine |
Character string specifying which engine to use: "lme4"
(default), "brms", "wemix" (design-weighted pseudo-maximum-likelihood via
|
family |
Character string, family object, or family function specifying
the model family. Common options: "gaussian", "binomial", "poisson",
"negbinomial". Default is "gaussian".
|
autobin |
Logical indicating whether numeric variables used only for
automatic strata creation should be binned by |
context |
Optional character vector naming one or more higher-level
context columns in |
sampling_weights |
Optional single character string naming a numeric
column of
Rows with a missing or non-positive sampling weight are dropped with a
warning. Default |
id |
Optional single character string naming a person/unit identifier
column for a longitudinal (growth-curve) MAIHDA on long-format data
(one row per measurement occasion). Supplied together with |
time |
Optional single character string naming a numeric measurement-time
column (e.g. wave 0, 1, 2, ... or age), required for a longitudinal MAIHDA;
see |
time_degree |
Polynomial degree of the growth curve when |
interactions |
Opt-in per-stratum interaction diagnostic
( |
... |
Additional arguments passed to |
Value
A maihda_model object containing:
model |
The fitted model object (lme4, brms, or WeMix) |
engine |
The engine used ("lme4", "brms", or "wemix") |
sampling_weights |
The sampling-weight column name when supplied, NULL otherwise |
formula |
The model formula |
data |
The data used for fitting |
family |
The family used |
strata_info |
The strata information from make_strata() if available, NULL otherwise |
context_vars |
The context variable name(s) when |
interactions |
The |
response_recoding |
For a recoded two-level outcome, a data frame mapping each original level to its 0/1 value and role (reference/event); NULL when no recoding occurred |
diagnostics |
Fit-quality diagnostics (singular fit / convergence) for lme4 models, surfaced by the print and summary methods |
Examples
# Standard approach: manually create strata first
strata_result <- make_strata(maihda_sim_data, vars = c("gender", "race", "education"))
model <- fit_maihda(health_outcome ~ age + (1 | stratum),
data = strata_result$data,
engine = "lme4")
# Simplified approach: specify stratifying variables directly in the grouping structure
# The function internally calls make_strata() to create intersectionals
model2 <- fit_maihda(health_outcome ~ age + (1 | gender:race:education),
data = maihda_sim_data,
engine = "lme4")
# Contextual cross-classified MAIHDA: strata crossed with a higher-level
# context (here country) -- the literature's cross-classified MAIHDA.
data(maihda_country_data)
model3 <- fit_maihda(math ~ 1 + (1 | gender:ses),
data = maihda_country_data,
context = "country")
summary(model3) # between-stratum vs. between-country vs. residual
Run a Complete MAIHDA Analysis
Description
A single high-level entry point that runs the standard two-model MAIHDA workflow and returns one bundled object. It fits the null model (covariates plus the intersectional random intercept, excluding the stratum dimensions' main effects) and the adjusted model (the null plus the additive main effects of the stratum-defining dimensions), summarises the variance partition (VPC/ICC) of the null model, and reports the PCV – the proportional change in between-stratum variance from the null to the adjusted model, i.e. the additive share of the intersectional inequality. When a higher-level grouping variable is supplied it also compares this decomposition across that variable's levels.
Usage
maihda(
formula,
data,
group = NULL,
context = NULL,
engine = "lme4",
family = "gaussian",
decomposition = c("two-model", "crossed-dimensions", "longitudinal"),
autobin = TRUE,
shared_strata = TRUE,
min_group_n = 30,
bootstrap = FALSE,
n_boot = 1000,
conf_level = 0.95,
response_vpc = FALSE,
seed = NULL,
sampling_weights = NULL,
id = NULL,
time = NULL,
time_degree = 1,
interactions = TRUE,
...
)
Arguments
formula |
A model formula, using either the intersectional shorthand
|
data |
A data frame with the model variables (and the |
group |
Optional character string naming a higher-level grouping variable
(e.g. |
context |
Optional character vector naming higher-level context
column(s) in |
engine |
Modeling engine, "lme4" (default), "brms", or "wemix" (the
design-weighted pseudo-maximum-likelihood fit; requires
|
family |
Model family. Default "gaussian". As in |
decomposition |
How to decompose the intersectional inequality into additive
and interaction parts. |
autobin |
Logical passed to |
shared_strata |
Logical, forwarded to |
min_group_n |
Minimum group size for the per-group comparison, forwarded
to |
bootstrap |
Logical; compute parametric-bootstrap VPC confidence intervals (lme4 only) for both the overall summary and the per-group comparison. Default FALSE. |
n_boot |
Number of bootstrap samples when |
conf_level |
Confidence level for bootstrap intervals. Default 0.95. |
response_vpc |
Logical; for a binomial (lme4) outcome, also attach the
response-scale VPC ( |
seed |
Optional integer seed for the response-scale VPC simulation. |
sampling_weights |
Optional name of a sampling-weight column for a
design-weighted MAIHDA on complex-survey data; see
|
id, time, time_degree |
For a longitudinal MAIHDA: the person/unit
identifier column, the numeric measurement-time column, and the growth-curve
polynomial degree (1 = linear). Supplying |
interactions |
Whether to compute the per-stratum interaction diagnostic
( |
... |
Additional arguments passed to |
Details
Binomial companions. For a binary outcome the model summaries also carry
the discriminatory accuracy (AUC / C-statistic and Median Odds Ratio) – the "DA"
in MAIHDA – automatically, so the null model's strata-only AUC sits alongside its
VPC; set response_vpc = TRUE to add the (simulation-based) response-scale
VPC as well. These are read from summary(x) and the attached
summary_adjusted, and the headline print() shows the null-model AUC.
This is a convenience wrapper around fit_maihda,
calculate_pvc, summary.maihda_model and
compare_maihda_groups. It is intrinsically a two-model
decomposition and has no single-model mode – for a single fit (e.g. just the
null-model VPC / discriminatory accuracy), call fit_maihda directly.
The dimensions' additive main effects. You may write them in the formula –
the fully-specified, lme4-native adjusted model
outcome ~ covars + var1 + var2 + (1 | var1:var2) – or omit them. Either way
the null excludes the dimension main effects and the adjusted includes
them: when the formula already lists them it is taken as the adjusted model and the
null is derived by dropping them; when they are missing maihda() adds them to
the adjusted model and emits a message() so the decomposition stays explicit.
The dimensions themselves are read from the random term: the shorthand
(1 | var1:var2) and make_strata both record them, and a numeric
dimension that make_strata() auto-binned enters the adjusted model as its
reconstructed tertile factor (matching the strata), not as a linear term. Because
maihda() is intrinsically a decomposition, it errors (rather than
returning a null-only result) when it cannot build the adjusted model – when the
dimensions cannot be recovered (a hand-built stratum column records none) or
there is only one dimension (no intersection to decompose). Use fit_maihda
for those single-model fits.
Value
An object of class maihda_analysis: a list with
model |
the fitted |
summary |
the model's |
model_adjusted |
the fitted adjusted |
summary_adjusted |
the adjusted model's |
pcv |
the |
decomposition |
the additive/interaction partition (additive and interaction
variances and shares, per-dimension variances; |
groups |
a |
interactions |
the |
mode |
|
context_vars |
the context variable name(s) when |
formula, adjusted_formula, group_var, call |
bookkeeping for printing |
See Also
fit_maihda for the single-model fitter,
compare_maihda_groups for the group comparison, and
summary.maihda_model for the variance summary.
Examples
data(maihda_health_data)
# One call: null + adjusted fit, VPC summary, and PCV decomposition. Writing the
# dimensions' additive main effects (Gender + Race) gives the fully-specified
# adjusted model; maihda() derives the null by dropping them.
a <- maihda(BMI ~ Age + Gender + Race + (1 | Gender:Race), data = maihda_health_data)
a # VPC (null) and PCV (null -> adjusted)
a$pcv # proportional change in between-stratum variance
a$formula # null: BMI ~ Age + (1 | stratum)
a$adjusted_formula # adjusted: null + Gender + Race main effects
# Omitting them is equivalent -- maihda() adds them to the adjusted model and
# emits a message; the null and PCV are identical to the explicit form above.
a0 <- maihda(BMI ~ Age + (1 | Gender:Race), data = maihda_health_data)
plot(a, type = "vpc") # null model
plot(a, type = "effect_decomp")# adjusted model (additive vs intersectional)
# Crossed-dimensions decomposition: one model, the dimensions' main effects entered
# as RANDOM intercepts. The additive and interaction shares of the between-strata
# variance are read directly from the single fit (no null/adjusted pair).
cc <- maihda(BMI ~ Age + (1 | Gender:Race), data = maihda_health_data,
decomposition = "crossed-dimensions")
cc # VPC and additive/interaction shares
cc$decomposition$additive_share # crossed-dimensions analogue of the PCV
cc$formula # BMI ~ Age + (1|Gender) + (1|Race) + (1|stratum)
# Add a higher-level grouping variable to also compare across its levels.
# maihda_country_data has a real country grouping (PISA achievement data):
data(maihda_country_data)
a2 <- maihda(math ~ 1 + (1 | gender:ses), data = maihda_country_data,
group = "country")
a2
plot(a2, type = "group_vpc")
plot(a2, type = "group_pcv")
# Contextual cross-classified MAIHDA: instead of one model per country (group=),
# model the strata CROSSED with country in a single fit. The summary partitions
# the unexplained variance into between-stratum vs. between-country vs. residual,
# and the PCV is computed with country partialled out.
a3 <- maihda(math ~ 1 + (1 | gender:ses), data = maihda_country_data,
context = "country")
a3
a3$summary$context$vpc_context_total # the country (general contextual) share
plot(a3, type = "context_vpc")
Build the adjusted-model formula and data for a MAIHDA decomposition
Description
Given a fitted null model's formula (in (1 | stratum) form) and its stored
strata metadata, returns the adjusted formula – the null formula plus the additive
main effects of the stratum dimensions – and the data carrying any reconstructed
binned factors. Returns NULL when fewer than two dimensions are available,
because there is no intersection to decompose and the single dimension's main effect
would render the stratum random intercept redundant (singular).
Usage
maihda_adjusted_formula(null_formula, strata_vars, autobin_info, data)
Arguments
null_formula |
The null model formula using |
strata_vars |
Character vector of stratum-defining variables. |
autobin_info |
Auto-binning recipe ( |
data |
The null model's data ( |
Value
A list with formula and data, or NULL if fewer than two
dimensions are available.
Reconstruct the adjusted-model main-effect terms for a MAIHDA decomposition
Description
For each stratum-defining variable, returns the model term to add as an additive
fixed main effect in the adjusted model, plus the data augmented with any
reconstructed binned factors. A categorical dimension is used directly; a numeric
dimension that make_strata tertile-binned is reconstructed as the
same binned factor (using the stored breaks/labels), because
make_strata() bins only a temporary copy and leaves the original numeric
column intact – adding the raw numeric column would wrongly enter a linear term
instead of the factor that defines the strata.
Usage
maihda_adjusted_terms(strata_vars, autobin_info, data)
Arguments
strata_vars |
Character vector of stratum-defining variable names. |
autobin_info |
Named list of |
data |
Data frame containing the original stratum-defining columns. |
Value
A list with terms (character vector of RHS term names) and
data (the input augmented with any .maihda_dim_* binned columns).
Area under the ROC curve (C-statistic), rank-based
Description
Computes the AUC / C-statistic as the Mann-Whitney U statistic: the
probability that a randomly chosen case (y == 1) is assigned a higher
predicted value than a randomly chosen non-case (y == 0), with ties
counting as one half. This needs no external package. An AUC of 0.5 is chance;
1 is perfect separation.
Usage
maihda_auc(prob, y)
Arguments
prob |
Numeric vector of predicted probabilities (or any score where larger means more case-like). |
y |
Observed binary outcome as 0/1 numeric or logical, the same length as
|
Value
A single number in [0, 1], or NA_real_ if either class is
absent.
References
Merlo, J., Wagner, P., Ghith, N., & Leckie, G. (2016). An original stepwise multilevel logistic regression analysis of discriminatory accuracy: the case of neighbourhoods and health. PLOS ONE, 11(4), e0153778.
Examples
maihda_auc(c(0.1, 0.4, 0.35, 0.8), c(0, 0, 1, 1))
Inject sampling weights into a brms formula
Description
Rewrites y ~ ... as y | weights(w) ~ .... An existing addition
term (e.g. an aggregated-binomial y | trials(n)) is extended with
+ weights(w); a formula that already carries a weights() addition
term is rejected (the two weight specifications would conflict).
Usage
maihda_brms_weights_formula(formula, wcol)
Arguments
formula |
The model formula. |
wcol |
Name of the (normalized) weight column. |
Value
The rewritten formula (same environment).
Crossed-dimensions variance summary (brms)
Description
brms counterpart of maihda_cc_summary_lme4: computes the additive /
interaction partition per posterior draw and returns posterior point estimates with
credible intervals for the VPC and the shares (no bootstrap – the posterior already
supplies the interval). A contextual random intercept
(object$context_info set) enters the per-draw VPC denominator and is
reported as its own component row and context element.
Usage
maihda_cc_summary_brms(object, cc, conf_level, point = c("median", "mean"))
Arguments
object |
A |
cc |
The |
conf_level |
Credible-interval level. |
point |
Posterior point estimate, "median" (default) or "mean". |
Value
A list with variance_components, vpc_result,
decomposition and context (NULL without a context).
Crossed-dimensions variance summary (lme4)
Description
Internal helper for summary.maihda_model when the model is a
crossed-dimensions MAIHDA fit (object$cc_info set). Partitions the crossed
random-effect variances into the additive (sum of the dimension REs) and
interaction (intersection RE) components, builds the variance-components table and
the VPC, and – when bootstrap = TRUE – adds parametric-bootstrap intervals
for the VPC and the additive/interaction shares. When the fit also carries a
contextual random intercept (object$context_info set), the context
variance enters the VPC denominator and is reported as its own component row
and context element.
Usage
maihda_cc_summary_lme4(object, cc, vc, bootstrap, n_boot, conf_level)
Arguments
object |
A |
cc |
The |
vc |
The model's |
bootstrap, n_boot, conf_level |
Bootstrap controls. |
Value
A list with variance_components, vpc_result,
decomposition and context (NULL without a context).
Location linear predictor of a cumulative (clmm) fit
Description
predict.clmm does not exist, so the location part
\eta = x'\beta (+ u) is built directly: the fixed design matrix is
constructed with the training data's factor levels and multiplied by the
location coefficients beta (a clmm has no intercept column –
it is absorbed by the thresholds – so beta's names select the right
columns), and include_re adds each row's stratum conditional mode
(an unseen stratum contributes 0). Everything is on the latent (link) scale;
map through maihda_ordinal_eta_to_score for the response-scale
expected category score.
Usage
maihda_clmm_linpred(object, newdata = NULL, include_re = TRUE)
Arguments
object |
A |
newdata |
Data to predict for; defaults to the analytic data. |
include_re |
Add the stratum random effect (conditional mode)? |
Value
A numeric vector of latent-scale location predictions.
Stratum random-effect table for a cumulative (clmm) fit
Description
Mirrors maihda_stratum_ranef_lme4(): one row per stratum with the
conditional mode, its conditional standard error (from
ordinal::condVar(), which returns conditional variances), and a
95% interval. At a boundary fit (zero between-stratum variance) the
conditional distribution collapses on 0, so the SE is 0.
Usage
maihda_clmm_stratum_ranef(object)
Arguments
object |
A |
Value
A data frame with stratum, stratum_id,
random_effect, se, lower_95, upper_95.
Threshold (cut-point) estimates of a cumulative (clmm) MAIHDA fit
Description
The thresholds \alpha_k take the place of the intercept in a cumulative
model: P(Y \le k) = g^{-1}(\alpha_k - \eta). Standard errors come from
the Hessian-based vcov() (hence Hess = TRUE at fit time) and
degrade to NA when unavailable.
Usage
maihda_clmm_thresholds(object)
Arguments
object |
A |
Value
A data frame with term, estimate, se.
Variance components of a cumulative (clmm) MAIHDA fit
Description
Reads the between-stratum variance from ordinal::VarCorr() and pairs
it with the latent-scale level-1 variance (\pi^2/3 for logit, 1 for
probit), matching the latent treatment of binomial models in the other
engines.
Usage
maihda_clmm_variances(object)
Arguments
object |
A |
Value
A list with stratum and residual variances.
Join non-empty caption lines for a plot
Description
Pastes its arguments into a single newline-separated caption, dropping any that are NULL, NA, or empty. Returns NULL when nothing remains, so a plot with no caption content keeps a clean (caption-free) look.
Usage
maihda_compose_caption(...)
Arguments
... |
Character scalars (or NULL). |
Value
A single newline-joined string, or NULL.
Contextual cross-classified variance summary (brms)
Description
brms counterpart of maihda_context_summary_lme4: computes the
stratum / context / residual partition per posterior draw and returns posterior
point estimates with credible intervals for the between-stratum VPC and the
contexts' total share (no bootstrap – the posterior supplies the interval).
Usage
maihda_context_summary_brms(
object,
ctx,
conf_level,
point = c("median", "mean")
)
Arguments
object |
A |
ctx |
The |
conf_level |
Credible-interval level. |
point |
Posterior point estimate, "median" (default) or "mean". |
Value
A list with variance_components, vpc_result, context.
Contextual cross-classified variance summary (lme4)
Description
Internal helper for summary.maihda_model when the model carries a
contextual random intercept (object$context_info set,
fit_maihda(context = )) without the crossed-dimensions decomposition.
Partitions the unexplained variance into between-stratum vs. between-context
(one share per context variable) vs. residual. The headline VPC stays the
between-stratum share of all unexplained variance – numerically identical to
the generic single-stratum summary, which folds the context into "Other random
effects" – but the context is now named, given its own component row(s), and
returned as a context element.
Usage
maihda_context_summary_lme4(object, ctx, vc, bootstrap, n_boot, conf_level)
Arguments
object |
A |
ctx |
The |
vc |
The model's |
bootstrap, n_boot, conf_level |
Bootstrap controls. |
Value
A list with variance_components, vpc_result, context.
Cross-National Educational Achievement Data for MAIHDA
Description
A cross-national dataset for demonstrating how Multilevel Analysis of
Individual Heterogeneity and Discriminatory Accuracy (MAIHDA) can be used to
compare intersectional inequality across a higher-level grouping
variable (here, country) with compare_maihda_groups and
maihda. Each row is a 15-year-old student; the intersectional
strata are formed by gender and socioeconomic status (ses), and
the outcome is the PISA mathematics score.
Usage
maihda_country_data
Format
A data frame with 3,600 rows (600 students in each of 6 countries) and 7 variables:
- country
Factor; one of Finland, Germany, United Kingdom, Italy, Japan, Mexico. The higher-level grouping variable.
- gender
Factor; student gender (female/male). A stratum dimension.
- ses
Factor; socioeconomic status as global tertiles (Low/Medium/High) of
escs, computed on the pooled sample so a band means the same in every country. A stratum dimension.- escs
Numeric; the PISA index of economic, social and cultural status (the continuous measure underlying
ses).- math
Numeric; PISA mathematics score (first plausible value). The primary outcome.
- reading
Numeric; PISA reading score (first plausible value).
- low_math
Factor; "Yes" if
mathis below 420 (PISA proficiency Level 2 baseline), else "No". A binary outcome for logistic examples.
Details
Intersectional inequality (the between-stratum share of variance, VPC/ICC) in mathematics achievement differs across the six countries, which is what makes the dataset a useful showcase for the group-comparison workflow.
The intersectional strata are gender:ses (2 x 3 = 6 strata). A canonical
MAIHDA "null" model is math ~ 1 + (1 | gender:ses); comparing its VPC
across countries quantifies how much joint gender-by-class inequality in
achievement varies between countries.
Note
This is a teaching/illustration dataset only. It uses a single PISA plausible
value for each score and does not carry the PISA survey weights or
complex sampling design, so results are not survey-representative and
should not be used for substantive cross-national inference. (For your own
survey data, the package supports design-weighted MAIHDA via the
sampling_weights argument of fit_maihda /
maihda.)
Source
Derived from the OECD Programme for International Student Assessment (PISA)
2018 student questionnaire data (OECD (2019), PISA 2018 Database),
accessed and cleaned via the learningtower R package (MIT licensed),
https://CRAN.R-project.org/package=learningtower. A balanced random
subsample of 600 complete-case students per country was taken (seed 2026). The
data preparation script is in data-raw/maihda_country_data.R.
Examples
data(maihda_country_data)
# Compare intersectional (gender x SES) inequality across countries
analysis <- maihda(
math ~ 1 + (1 | gender:ses),
data = maihda_country_data,
group = "country"
)
analysis
plot(analysis, type = "group_vpc")
Build the crossed-dimensions-model formula and data for a MAIHDA decomposition
Description
The crossed-dimensions alternative to the two-model (fixed-effects PCV)
decomposition (the function name keeps the historical "cross_classified"
spelling). Given a null model's formula (in (1 | stratum) form, carrying
only the covariates) and the stratum metadata, returns the single crossed formula –
the covariates plus an additive random intercept for each stratum dimension
plus the intersection (stratum) random intercept – together with the data
carrying any reconstructed binned factors. In the fitted model each dimension's RE
variance is that dimension's additive main-effect variance and the stratum RE
variance is the interaction beyond additive; see maihda.
Usage
maihda_cross_classified_formula(
null_formula,
strata_vars,
autobin_info,
data,
interaction_group = "stratum"
)
Arguments
null_formula |
The null model formula using |
strata_vars |
Character vector of stratum-defining variables. |
autobin_info |
Auto-binning recipe ( |
data |
The null model's data ( |
interaction_group |
Name of the intersection grouping factor (the column whose
random intercept captures the interaction). Default |
Details
Returns NULL when fewer than two dimensions are available (there is no
intersection to decompose). The dimension grouping factor reuses the dimension's own
column for a categorical dimension and the reconstructed .maihda_dim_* tertile
factor for an auto-binned numeric dimension (via maihda_adjusted_terms),
so the additive REs are crossed on exactly the levels that define the strata.
Value
A list with formula, data, dim_groups (a named character
vector mapping each strata_var to its random-effect grouping-factor name) and
interaction_group ("stratum"); or NULL if fewer than two
dimensions are available.
Cumulative (ordinal) family marker for MAIHDA models
Description
Specifies a cumulative (proportional-odds) model for an ordinal outcome in
fit_maihda / maihda, with a choice of link:
maihda_cumulative("logit") (the default, equivalent to
family = "ordinal") or maihda_cumulative("probit"). It plays the
role a stats family object plays for the other families – there is no
cumulative family constructor in stats, and using
brms::cumulative() would require brms for a frequentist fit.
Usage
maihda_cumulative(link = c("logit", "probit"))
Arguments
link |
The cumulative link: |
Value
A family marker list with elements family = "cumulative" and
link.
See Also
Examples
maihda_cumulative()
maihda_cumulative("probit")
Discriminatory accuracy of a binary MAIHDA model
Description
Bundles the individual-level discriminatory-accuracy summaries for a binomial
MAIHDA model: the AUC / C-statistic (how well the model's predicted
probabilities separate cases from non-cases) and the Median Odds Ratio. Applied
to a strata-only (null) model, the AUC is the discriminatory accuracy of the
intersectional strata themselves – Merlo's central quantity; comparing it with
an adjusted model shows whether individual covariates beyond stratum membership
sharpen classification. The AUC is computed for any binomial link; the Median
Odds Ratio is reported only for the logit link and is NA otherwise (e.g.
for a probit fit), since the MOR is an odds-ratio-scale quantity.
Aggregated-binomial fits (an lme4 cbind(success, failure) response) are
supported: the AUC is the count-weighted C-statistic over the implied
individual-level 0/1 data, and n_case / n_control are the total
successes / failures.
Usage
maihda_discriminatory_accuracy(model)
Arguments
model |
A |
Value
An object of class maihda_da: a list with auc, mor,
n_case, n_control, family, link and engine.
mor is NA for a non-logit binomial link, where the AUC is still
reported. For an aggregated-binomial fit n_case / n_control are
the total successes / failures.
References
Merlo, J. (2018). Multilevel analysis of individual heterogeneity and discriminatory accuracy (MAIHDA) within an intersectional framework. Social Science & Medicine, 203, 74-80.
See Also
Examples
## Not run:
# Obese (Yes/No) by intersectional strata of Gender x Race
strata <- make_strata(maihda_health_data, vars = c("Gender", "Race"))
d <- maihda_health_data
d$stratum <- strata$data$stratum
m <- fit_maihda(Obese ~ (1 | stratum), data = d, family = "binomial")
maihda_discriminatory_accuracy(m)
## End(Not run)
Fit a cumulative MAIHDA model via ordinal::clmm
Description
Internal engine call for fit_maihda(engine = "ordinal"). Builds the
analytic sample (complete cases on the model variables) so the stored
data matches the rows clmm fits, then calls ordinal::clmm()
with Hess = TRUE (needed for the threshold standard errors).
Usage
maihda_fit_clmm(formula, data, family, dot_vals)
Arguments
formula |
The resolved model formula (with |
data |
The data (after strata creation / response preparation). |
family |
The cumulative family marker (link "logit" or "probit"). |
dot_vals |
Named list of evaluated |
Value
A list with model (the clmm fit) and data (the
analytic data frame actually fitted).
Fit a design-weighted MAIHDA model via WeMix
Description
Internal engine call for fit_maihda(engine = "wemix"). Builds the
analytic sample (complete cases on the model variables and the weight column,
positive weights only) so the stored data matches the rows WeMix fits,
attaches the constant level-2 weight column (strata are exhaustive population
cells, sampled with certainty), and calls WeMix::mix() with the
unconditional weights c(level1, level2).
Usage
maihda_fit_wemix(formula, data, family, sampling_weights, dot_vals)
Arguments
formula |
The resolved model formula (with |
data |
The data (after strata creation / response recoding). |
family |
The resolved family object (gaussian-identity or binomial-logit). |
sampling_weights |
Name of the level-1 sampling-weight column. |
dot_vals |
Named list of evaluated |
Value
A list with model (the WeMixResults) and data
(the analytic data frame actually fitted, including the weight columns).
Glance at a MAIHDA model or analysis
Description
glance methods that return the MAIHDA headline as a
one-row tibble: the variance partition coefficient (VPC/ICC), and – for a
maihda_analysis – the proportional change in variance (PCV), plus the
additive/interaction shares and the discriminatory accuracy (AUC, MOR) for a
binomial outcome. The layout is uniform across the lme4, brms, WeMix and ordinal
engines. No other package emits this row from the underlying fit, because PCV
needs the null+adjusted pair that only a maihda_analysis holds.
Usage
## S3 method for class 'maihda_summary'
glance(x, ...)
## S3 method for class 'maihda_model'
glance(x, ...)
## S3 method for class 'maihda_analysis'
glance(x, ...)
Arguments
x |
A |
... |
Unused, for S3 consistency. |
Value
A one-row tibble. glance.maihda_analysis adds pcv
(with pcv.conf.low/pcv.conf.high when bootstrapped or from a brms
posterior), the adjusted-model auc.adjusted, nobs, family
and mode to the columns produced for a single summary.
See Also
maihda_tidiers for the per-estimate tidy() methods.
Examples
data("maihda_health_data")
a <- maihda(BMI ~ Age + Gender + Race + Education + (1 | Gender:Race:Education),
data = maihda_health_data)
glance(a)
NHANES Health Data Subset for MAIHDA Use
Description
A pedagogical subset of the National Health and Nutrition Examination Survey (NHANES) dataset, serving as a real-world example for Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy (MAIHDA). Contains selected records demonstrating intersectional demographic health inequalities.
Usage
maihda_health_data
Format
A data frame with 3,000 rows and 7 variables:
- BMI
Body Mass Index (kg/m^2), a continuous outcome variable.
- Obese
Factor indicating obesity status (No/Yes).
- Age
Age in years at screening, a continuous covariate.
- Gender
Gender of the participant (male/female).
- Race
Self-reported race/ethnicity.
- Education
Educational attainment level.
- Poverty
Poverty to income ratio, a continuous covariate. Some values may be missing.
Note
This is a teaching/illustration dataset only. It is a non-random subsample and
does not carry the NHANES survey weights or complex sampling design,
so results are not survey-representative and should not be used for
substantive population inference. (For your own survey data, the package
supports design-weighted MAIHDA via the sampling_weights argument of
fit_maihda / maihda.)
Source
Derived from the NHANES R package. Original data collected by the
Centers for Disease Control and Prevention (CDC).
Examples
data(maihda_health_data)
# Example usage:
# strata_result <- make_strata(maihda_health_data, vars = c("Gender", "Race", "Education"))
# model <- fit_maihda(BMI ~ Age + (1 | stratum), data = strata_result$data)
Information criteria for MAIHDA models
Description
Reports the relative-fit information criteria for one or more MAIHDA models, to
help choose between model structures (different covariate sets, strata
definitions, or families) – a question the VPC/ICC and PCV do not address. The
criteria reported depend on the engine: AIC and BIC for the
likelihood engines (lme4, and ordinal::clmm), and the Bayesian
WAIC and LOOIC (leave-one-out information criterion) for
brms. Lower is better for all four.
Usage
maihda_ic(..., model_names = NULL)
Arguments
... |
One or more |
model_names |
Optional character vector of names, one per |
Details
REML vs ML. lmer fits Gaussian models by REML by default, and a
REML log-likelihood (hence its AIC/BIC) is not comparable across models
with different fixed effects – exactly the canonical MAIHDA null-vs-adjusted
comparison. When more than one model is supplied, maihda_ic() therefore
refits any REML lmer model with maximum likelihood
(refitML) before computing AIC/BIC, matching the behaviour of
anova() on lme4 models; the estimator column records when
this happened. For a single model the criterion is reported as fitted (the
estimator column then reads "REML").
Comparability. Like the VPC, information criteria are only comparable
across models fitted to the same analytic sample (same rows and outcome).
AIC/BIC additionally require the same response distribution – they are not
comparable across families (e.g. a Gaussian vs a Poisson fit), nor between the
likelihood engines and brms (AIC/BIC vs WAIC/LOOIC are different scales).
maihda_ic() does not enforce this; compare_maihda warns when
the supplied models differ in outcome, sample, or family.
Design-weighted fits. For the wemix (design-weighted) engine the
criteria are reported as NA: a pseudo-likelihood with sampling weights does
not define a standard AIC/BIC.
Value
A data.frame of class maihda_ic with one row per model and
the columns that apply: model, n (analytic sample size),
estimator, df (number of parameters; likelihood engines),
logLik, AIC, BIC (likelihood engines), WAIC,
LOOIC (brms), and – when more than one model is supplied – delta
(the difference from the best model on the primary criterion: AIC for the
likelihood engines, LOOIC for brms). Columns that are entirely NA across
the supplied models are dropped.
See Also
compare_maihda, which reports these criteria alongside the
VPC/ICC, and calculate_pvc for the variance decomposition.
Examples
strata <- make_strata(maihda_sim_data, vars = c("gender", "race"))
null_model <- fit_maihda(health_outcome ~ 1 + (1 | stratum), data = strata$data)
adj_model <- fit_maihda(health_outcome ~ age + (1 | stratum), data = strata$data)
# AIC/BIC for two nested structures (REML lmer fits are ML-refitted first)
maihda_ic(null_model, adj_model, model_names = c("Null", "Adjusted"))
# Or straight from a one-call maihda() analysis (null + adjusted rows)
a <- maihda(health_outcome ~ age + gender + race + (1 | gender:race),
data = maihda_sim_data)
maihda_ic(a)
Information criteria for a single MAIHDA model
Description
Internal worker for maihda_ic: returns a one-row data frame of the
fit criteria for one maihda_model, dispatched on the fitted object's class
(mirroring maihda_fit_diagnostics).
Usage
maihda_ic_one(model, ml = FALSE)
Arguments
model |
A |
ml |
Logical; for a REML |
Value
A one-row data frame with n, estimator, df,
logLik, AIC, BIC, WAIC, LOOIC (NA where not
applicable to the engine).
Flag strata with credibly non-zero intersectional interaction
Description
Reports, for each intersectional stratum, the interaction component of its outcome – the stratum random effect (BLUP) of an adjusted MAIHDA model, i.e. how far the stratum departs from the additive main-effects prediction of its defining dimensions – and flags the strata whose interaction is credibly different from zero. This is the heart of "where is there genuine intersectionality": a flagged stratum is one whose joint identity produces an outcome the additive parts do not.
Usage
maihda_interactions(object, conf_level = 0.95, adjust = "none", ...)
Arguments
object |
A |
conf_level |
Confidence / credible level for the interval and the flag. Default 0.95. |
adjust |
Multiple-comparison adjustment for the per-stratum p-values
(frequentist engines only): |
... |
Currently unused. |
Details
It must be read off the adjusted model. Only when the dimensions'
additive main effects are in the model (the adjusted model of the
two-model decomposition, or the crossed-dimensions model) does the stratum
random effect isolate the pure interaction. On a null model the stratum
random effect is the total between-stratum deviation (additive + interaction),
so passing one is flagged with a warning. Passing a maihda result
uses the right model automatically.
Frequentist vs. Bayesian evidence. For the frequentist engines
(lme4, wemix, ordinal) the flag comes from the BLUP's
conditional standard error: a Wald interval at conf_level and a two-sided
p-value, with an optional multiplicity correction (adjust). For
brms the full posterior is already available, so the exact
posterior tail is used – a credible interval at conf_level and the
probability of direction pd = P(BLUP > 0) – and adjust is not
applied (the Bayesian answer is multiplicity-free).
Multiplicity: partial pooling and a correction are different things, and the experts disagree.
-
Shrinkage (magnitude/sign). The stratum BLUP is partially pooled, so extreme values are regularised toward the grand mean, attenuating exaggerated-magnitude and wrong-sign (Type M/S) error (Gelman & Carlin 2014). Gelman, Hill & Yajima (2012) argue this shrinkage usually substitutes for a classical multiple-comparisons correction (the problem can "disappear entirely" in the hierarchical model); on that view the flag/no-flag step itself is what to avoid – the null of an exactly zero interaction is rarely the question (McShane, Gelman et al. 2019) – so report the estimate and its interval.
-
Whether to correct. If you do want an error-rate screen, whether a correction is warranted depends on the inferential structure of the claim – the joint hypothesis, not the number of strata (Rubin 2021). Each stratum as its own pre-specified hypothesis ("does this stratum interact?") is individual testing and needs none – only if you do not also read the flags collectively. Once the question is "is there an interaction somewhere?" – which an automated all-strata scan effectively is – it is disjunction testing and a correction applies.
adjust = "none" is the default because the table is formally a set of
individual hypotheses; for the common exploratory scan of all strata,
prefer adjust = "BH". Choosing FDR over family-wise
("bonferroni"/"holm") matches a screening goal (the expected
proportion of false discoveries) – this is the package's choice, not a
recommendation of Rubin (2021), who raises FDR only to distinguish it from the
family-wise rate. The flag itself is a Wald test on a shrunken BLUP whose
conditional SE treats the variance components as known, so it (and any
adjust on it) is an explicit, approximate screen, not a procedure
inheriting an exact guarantee from the model. Lead with the interval (and, for
brms, the probability of direction); the substantive question is often not
whether an interaction differs from zero but whether it exceeds a smallest
interaction of interest (an equivalence/SESOI reading; Lakens, Scheel & Isager
2018), read from the interval.
The interaction is reported on the model's link (latent) scale – a log-odds deviation for a logistic model, etc. – because the additive/interaction split is only exact there.
Value
An object of class maihda_interactions (a data frame), one row
per stratum, sorted flagged-first then by abs(interaction). Columns
common to every engine: stratum, label, n (stratum size),
interaction (the BLUP), lower/upper (the interval),
flagged (logical), and direction ("above"/"below"
the additive expectation). Frequentist fits add se and p_value
(and p_adjusted when adjust != "none"); brms adds
pd (probability of direction). Attributes record conf_level,
adjust, engine, model_type, n_strata,
n_flagged, scale and singular.
References
Evans, C. R., Williams, D. R., Onnela, J. P., & Subramanian, S. V. (2018). A multilevel approach to modeling health inequalities at the intersection of multiple social identities. Social Science & Medicine, 203, 64-73.
Merlo, J. (2018). Multilevel analysis of individual heterogeneity and discriminatory accuracy (MAIHDA) within an intersectional framework. Social Science & Medicine, 203, 74-80.
Gelman, A., Hill, J., & Yajima, M. (2012). Why we (usually) don't have to worry about multiple comparisons. Journal of Research on Educational Effectiveness, 5(2), 189-211.
Gelman, A., & Carlin, J. (2014). Beyond power calculations: assessing Type S (sign) and Type M (magnitude) errors. Perspectives on Psychological Science, 9(6), 641-651.
Rubin, M. (2021). When to adjust alpha during multiple testing: a consideration of disjunction, conjunction, and individual testing. Synthese, 199(3-4), 10969-11000. doi:10.1007/s11229-021-03276-4
McShane, B. B., Gal, D., Gelman, A., Robert, C., & Tackett, J. L. (2019). Abandon statistical significance. The American Statistician, 73(sup1), 235-245.
Lakens, D., Scheel, A. M., & Isager, P. M. (2018). Equivalence testing for psychological research: a tutorial. Advances in Methods and Practices in Psychological Science, 1(2), 259-269.
See Also
maihda, calculate_pvc,
summary.maihda_model; and plot(...,
highlight_interactions = TRUE) to mark the flagged strata on the
effect-decomposition / predicted / shrinkage plots.
Examples
data(maihda_health_data)
a <- maihda(BMI ~ Age + Gender + Race + (1 | Gender:Race),
data = maihda_health_data)
maihda_interactions(a) # which strata interact (95%, no correction)
maihda_interactions(a, adjust = "BH") # FDR-controlled screen
Simulated Longitudinal Data for MAIHDA
Description
A simulated long-format (one row per person-occasion) panel for demonstrating
longitudinal / growth-curve MAIHDA (fit_maihda(id =, time =) and
maihda(decomposition = "longitudinal")). 600 individuals are each
measured over five waves on a continuous wellbeing score, within 12
intersectional strata defined by gender x ethnicity x education. The
between-stratum trajectory differences are constructed to be mostly
additive (each dimension's main effect on the baseline level and on the rate of
change) with one genuine multiplicative interaction, so the longitudinal PCV –
a high but sub-1 PCV_slope – is demonstrable.
Usage
maihda_long_data
Format
A data frame with 3000 rows (600 persons x 5 waves) and 8 variables:
- id
Person identifier (level 2); repeated across waves.
- wave
Measurement occasion, 0 to 4 (the numeric time variable).
- gender
Gender (
Women/Men); a stratum dimension.- ethnicity
Ethnicity (
EthA/EthB/EthC); a stratum dimension.- education
Education (
Low/High); a stratum dimension.- age
Baseline age in years, a time-invariant covariate.
- wellbeing
Continuous wellbeing outcome (the growth-curve response).
- low_wellbeing
Binary companion outcome (1 = bottom 40% of wellbeing), for the logistic longitudinal path.
Source
Simulated for the purpose of the MAIHDA package. The growth structure follows the longitudinal MAIHDA of Bell, Evans, Holman & Leckie (2024) doi:10.1016/j.socscimed.2024.116955.
Examples
data(maihda_long_data)
# Time-varying VPC from a 3-level growth model:
m <- fit_maihda(wellbeing ~ wave + (1 | gender:ethnicity:education),
data = maihda_long_data, id = "id", time = "wave")
summary(m)
# Additive-vs-multiplicative PCV (null vs adjusted growth model):
a <- maihda(wellbeing ~ wave + (1 | gender:ethnicity:education),
data = maihda_long_data, id = "id", time = "wave",
decomposition = "longitudinal")
a$pcv
Build the adjusted-model formula for a longitudinal MAIHDA decomposition
Description
The longitudinal analogue of maihda_adjusted_formula: the null
growth model plus the stratum dimensions' additive main effects AND their
interactions with the time polynomial (dim:time), so the remaining
stratum-level intercept/slope variance is the intersectional interaction beyond
additive. Auto-binned numeric dimensions reuse their reconstructed tertile
factor (.maihda_dim_*, via maihda_adjusted_terms).
Usage
maihda_longitudinal_adjusted_formula(
null_formula,
strata_vars,
autobin_info,
data,
time,
time_degree
)
Arguments
null_formula |
The fitted null growth formula. |
strata_vars, autobin_info, data |
Stratum metadata (as for
|
time, time_degree |
The longitudinal specification. |
Value
A list with formula and data, or NULL if fewer
than two dimensions are available.
Build the 3-level growth formula for a longitudinal MAIHDA
Description
Given a base formula already carrying the covariates and the resolved stratum
grouping (y ~ covars + (1 | stratum)), returns the growth formula
y ~ covars + time(+ I(time^2)...) + (time... | id) + (time... | stratum):
the time polynomial enters the fixed part (if absent) and a random
intercept+slope block is placed at both the individual and stratum levels. Any
random effects in the base formula are replaced by this canonical structure.
Usage
maihda_longitudinal_formula(base_formula, id, time, time_degree)
Arguments
base_formula |
The resolved formula (fixed part + stratum grouping). |
id, time, time_degree |
The longitudinal specification. |
Value
The growth formula (same environment as base_formula).
Longitudinal MAIHDA proportional change in variance (PCV)
Description
Compares the stratum-level random-effect covariance block of the null growth
model with that of the adjusted model (null + dimension main effects + their
dim:time interactions). Reports the PCV in the baseline (intercept)
variance and in the slope variance – the additive-vs-multiplicative split of
the intersectional trajectory inequality (Bell, Evans, Holman & Leckie 2024) –
and the time-specific PCV over the supplied times.
Usage
maihda_longitudinal_pcv(null_model, adjusted_model, times = NULL)
Arguments
null_model, adjusted_model |
Longitudinal |
times |
Optional numeric times for the time-specific PCV; defaults to the null model's reporting grid. |
Value
An object of class maihda_long_pcv.
Per-stratum trajectory parameters for a longitudinal MAIHDA
Description
The stratum-level random-effect estimates as a wide table, one row per stratum:
the stratum's deviation at the baseline time (baseline, the longitudinal
analogue of a cross-sectional stratum BLUP), the raw random intercept at
time 0 (intercept) and the random slope(s) on time (slope, ...).
This is the longitudinal shape of predict_maihda(type = "strata") – a
stratum is now a trajectory, not a single value.
Usage
maihda_longitudinal_strata_predictions(object)
Arguments
object |
A longitudinal |
Details
baseline is a(t_0)' coef with a(t) = (1, t, t^2, ...) and
t_0 = the reference (baseline) time ref_time = min(time); the
package defines the baseline at ref_time, so it equals the raw
intercept (deviation at time 0) only when time is zero-referenced.
Value
A data frame: stratum, stratum_id, optional label,
baseline, intercept, slope(, slope2, ...).
Time-varying VPC summary for a longitudinal MAIHDA (brms, linear growth)
Description
Time-varying VPC summary for a longitudinal MAIHDA (brms, linear growth)
Usage
maihda_longitudinal_summary_brms(object, conf_level = 0.95)
Arguments
object |
A longitudinal |
conf_level |
Credible-interval level. |
Value
As maihda_longitudinal_summary_lme4, with posterior bands.
Time-varying VPC summary for a longitudinal MAIHDA (lme4)
Description
Time-varying VPC summary for a longitudinal MAIHDA (lme4)
Usage
maihda_longitudinal_summary_lme4(
object,
bootstrap = FALSE,
n_boot = 1000,
conf_level = 0.95
)
Arguments
object |
A longitudinal |
bootstrap, n_boot, conf_level |
Parametric-bootstrap controls for the VPC(t) band. |
Value
A list with vpc_result (the reference-time VPC, for the headline
print), variance_components, and longitudinal (the trajectory).
Median Odds Ratio (MOR) for a logistic MAIHDA model
Description
The Median Odds Ratio translates the between-stratum variance of a logistic
MAIHDA model onto the odds-ratio scale: the median relative change in the odds
of the outcome when comparing two individuals from randomly chosen strata
(higher- vs lower-risk). MOR = exp(sqrt(2 * V_A) * qnorm(0.75)), where
V_A is the between-stratum (latent, logit-scale) variance. An MOR of 1
indicates no between-stratum heterogeneity. The MOR is defined only for the
logit link (it is the median odds ratio); a non-logit binomial
fit such as probit is rejected, because its latent variance is on a
different scale and the exp(...) above would not be an odds ratio.
For a cumulative-logit (ordinal) MAIHDA model the same formula applies to the latent logit-scale between-stratum variance and is the median cumulative odds ratio: the median relative change in the odds of being at or below any given outcome category between two randomly chosen strata (under the model's proportional-odds assumption it is the same for every category split).
Usage
maihda_mor(model)
Arguments
model |
A |
Value
A single number (the MOR, \ge 1), or NA_real_ if the
between-stratum variance is unavailable.
References
Larsen, K., & Merlo, J. (2005). Appropriate assessment of neighborhood effects on individual health: integrating random and fixed effects in multilevel logistic regression. American Journal of Epidemiology, 161(1), 81-88.
See Also
maihda_discriminatory_accuracy
Category probabilities of a cumulative model
Description
Pure function (no fit object): given latent locations eta, ordered
thresholds alpha and the link, returns the category probability matrix
via P(Y \le k) = g^{-1}(\alpha_k - \eta) and differencing. Rows are
observations, columns categories 1..K (K = length(alpha) + 1).
Usage
maihda_ordinal_category_probs(eta, thresholds, link = "logit")
Arguments
eta |
Numeric vector of latent locations. |
thresholds |
Numeric vector of increasing thresholds |
link |
|
Value
A numeric matrix with length(eta) rows that sum to 1.
Latent location to expected category score
Description
Convenience composition of maihda_ordinal_category_probs and
maihda_ordinal_expected_score.
Usage
maihda_ordinal_eta_to_score(eta, thresholds, link = "logit")
Arguments
eta |
Numeric vector of latent locations. |
thresholds |
Numeric vector of increasing thresholds. |
link |
|
Value
A numeric vector of expected category scores.
Expected category score from a probability matrix
Description
The response-scale summary of a cumulative model used throughout the package
(the plot layer's "Average Expected Category Score"): \sum_k k\, p_k,
with categories scored 1..K in order.
Usage
maihda_ordinal_expected_score(probs)
Arguments
probs |
A category-probability matrix (rows = observations). |
Value
A numeric vector of expected scores in [1, K].
Prepare data and formula for a sampling-weighted brms fit
Description
Drops rows with missing or non-positive sampling weights (with a warning),
normalizes the remaining weights to mean 1 – likelihood weights scale the
effective sample size, so unnormalized expansion weights (summing to the
population) would massively overstate the information in the data – and
rewrites the formula with a weights() addition term.
Usage
maihda_prepare_brms_sampling_weights(data, formula, sampling_weights)
Arguments
data |
The model data. |
formula |
The model formula. |
sampling_weights |
Name of the sampling-weight column. |
Value
A list with data (weights column .maihda_sw added) and
formula (rewritten).
Resolve shared strata and the fitting formula for group comparison
Description
Resolve shared strata and the fitting formula for group comparison
Usage
maihda_prepare_group_strata(formula, data, shared_strata, autobin = TRUE)
Arguments
formula |
User formula. |
data |
Full data frame. |
shared_strata |
Logical; build strata once on the full data. |
autobin |
Logical passed to make_strata. |
Value
list(data, formula, strata_vars).
Print the additive vs. intersectional decomposition of a crossed-dimensions summary
Description
Print the additive vs. intersectional decomposition of a crossed-dimensions summary
Usage
maihda_print_cc_decomposition(d)
Arguments
d |
The |
Value
No return value, called for side effects.
Print the stratum vs. context partition of a contextual cross-classified summary
Description
Print the stratum vs. context partition of a contextual cross-classified summary
Usage
maihda_print_context_partition(ctx)
Arguments
ctx |
The |
Value
No return value, called for side effects.
Simulated Health Data for MAIHDA Use
Description
A simulated dataset for demonstrating Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy (MAIHDA).
Usage
maihda_sim_data
Format
A data frame with 500 rows and 7 variables:
- id
Unique participant identifier.
- gender
Gender of the participant.
- race
Simulated race/ethnicity category.
- education
Educational attainment level.
- age
Age in years, a continuous covariate.
- health_outcome
A continuous simulated health outcome.
- binary_outcome
A binary version of the health outcome.
Source
Simulated for the purpose of the MAIHDA package.
Examples
data(maihda_sim_data)
strata_result <- make_strata(maihda_sim_data, vars = c("gender", "race", "education"))
Sparse Intersectional Data for Bayesian MAIHDA
Description
A simulated cross-sectional dataset built to showcase **Bayesian (brms) MAIHDA for sparse intersections** – the regime where many intersectional strata each hold only a handful of individuals. There the maximum-likelihood (lme4) estimate of the *interaction* between-stratum variance collapses to a singular fit with no uncertainty, so the additive-vs-interaction split is both unstable and falsely precise; weakly-informative priors ('engine = "brms"') regularise the variance off the boundary and return a calibrated credible interval.
Usage
maihda_sparse_data
Format
A data frame with 240 rows and 6 variables:
- gender
Strata dimension (Women/Men).
- ethnicity
Strata dimension (White/Black/Asian).
- education
Strata dimension (Low/High).
- age_group
Strata dimension (Young/Mid/Older).
- y
A continuous (Gaussian) outcome. True between-stratum VPC 0.26, of which 40% is the intersectional interaction.
- event
A binary outcome (No/Yes), ~46% "Yes". Its latent-scale between-stratum VPC is 0.31, again 40% interaction.
The exact generative truth is also attached as
attr(maihda_sparse_data, "truth") (additive/interaction variances, shares,
and VPCs for each outcome).
Details
The data carry a **known, non-trivial interaction** so the vignette can claim *recovery* rather than merely report numbers: 4 dimensions form 36 intersectional strata with deliberately skewed sizes (median 6 individuals, 12 of 36 cells below 5, two singletons), and the true interaction accounts for **40 between-stratum variance** on both outcomes. On the binary outcome a genuine 40 interaction is read by lme4 as roughly 3 that is purely a small-cell artifact.
Note
A purely illustrative dataset. The dimension labels are arbitrary and the interaction is constructed, not estimated from any real population – its only purpose is to make the sparse-cell behaviour of the ML and Bayesian estimators visible against a known answer.
Source
Simulated; see data-raw/maihda_sparse_data.R.
Examples
data(maihda_sparse_data)
attr(maihda_sparse_data, "truth")$gaussian$interaction_share # 0.40
# ML over-shrinks the interaction under sparse cells (a singular fit):
# m_lme4 <- maihda(y ~ 1 + (1 | gender:ethnicity:education:age_group),
# data = maihda_sparse_data, decomposition = "crossed-dimensions")
#
# Weakly-informative priors regularise it and report honest uncertainty:
# m_brms <- maihda(y ~ 1 + (1 | gender:ethnicity:education:age_group),
# data = maihda_sparse_data, decomposition = "crossed-dimensions",
# engine = "brms",
# prior = brms::set_prior("normal(0, 0.5)", class = "sd"))
# See vignette("bayesian_sparse_maihda").
Per-stratum predictions for a cumulative (clmm) fit
Description
Ordinal counterpart of maihda_stratum_predictions_wemix(): per-stratum
aggregates of the location prediction plus the stratum effect, on the latent
(link) scale or as the expected category score (response scale).
Usage
maihda_stratum_predictions_ordinal(
object,
summary_obj,
scale = c("response", "link")
)
Arguments
object |
A |
summary_obj |
Its |
scale |
"response" (expected category score) or "link" (latent). |
Value
A data frame as from maihda_weighted_stratum_aggregate().
Per-stratum predictions for a wemix fit
Description
wemix counterpart of maihda_stratum_predictions_lme4(): per-stratum
means of the fixed-part prediction plus the stratum effect, aggregated with
the SAMPLING weights so the stratum-level summaries are design-weighted
(population-representative under the weights), unlike the lme4 prior-weight
aggregation.
Usage
maihda_stratum_predictions_wemix(
object,
summary_obj,
scale = c("response", "link")
)
Arguments
object |
A |
summary_obj |
Its |
scale |
"response" or "link". |
Value
A data frame as from maihda_weighted_stratum_aggregate().
Canonical MAIHDA results table and ranked-strata table
Description
Assembles the two standard MAIHDA write-up deliverables from a fitted analysis
in one call: (a) a model-results table contrasting the null and adjusted
models (intercept, between-stratum variance and SD, VPC/ICC, the PCV, and – for a
binary outcome – the AUC and Median Odds Ratio), and (b) a ranked-strata
table ordering the intersectional strata by their predicted outcome, so the
best- and worst-off strata can be read directly. It computes nothing new: every
quantity is read from the summaries already attached to the analysis, so the table
agrees exactly with summary() and plot().
Usage
maihda_table(
x,
n_strata = 10L,
scale = c("response", "link"),
which = c("null", "adjusted"),
digits = 3,
...
)
Arguments
x |
A |
n_strata |
Number of strata to show at each end (top and bottom) in the
printed ranked-strata table. The returned |
scale |
Scale for the predicted stratum values: |
which |
For a two-model analysis, which model's predictions to rank the
strata by: |
digits |
Number of decimal places for the |
... |
For a |
Details
The model-results table is mostly numeric and export-ready (e.g.
write.csv(maihda_table(a)$models, ...) or pass it to knitr::kable()):
statistics are rows, models are columns, and each estimate has accompanying
*_lower/*_upper columns that hold the confidence/credible interval
when one is available (the VPC bootstrap or posterior interval, and the bootstrap
PCV interval) and NA otherwise. The intercept and the variance/SD rows are
point estimates. The print() method renders the same table in the
familiar “estimate [low, high]” layout.
For a "crossed-dimensions" analysis (one model, no null/adjusted pair) the
results table has a single estimate column and gains “Additive share” /
“Interaction share” rows instead of the PCV. For a contextual
cross-classified analysis (maihda(context = )) it gains a
“Context share (VPC)” row. A bare fit_maihda model is also
accepted and yields a single-model table (no PCV).
The ranked-strata table ranks every stratum by its model-predicted outcome (on the
scale requested), using the same stratum predictions as
plot(type = "predicted"): the predicted value carries the conditional
(random-effect) interval, and the stratum random effect (BLUP) is reported
alongside it. By default the ranking uses the null model – the headline
intersectional inequality (which strata fare best/worst overall); set
which = "adjusted" to rank by the adjusted model instead. The full ranked
table is returned in $strata; print() shows the top and bottom
n_strata.
Value
An object of class maihda_table: a list with
models |
a data frame of the model-results table (statistics in rows; one
estimate column per model, each with |
strata |
a data frame of all strata ranked by predicted outcome, with
|
model_keys, model_labels |
the estimate-column keys and their display labels |
family, engine, mode, scale, ranked_by, n_obs, n_strata_total, context_vars |
metadata used by |
See Also
maihda, summary.maihda_model,
calculate_pvc, maihda_discriminatory_accuracy.
Examples
data(maihda_health_data)
a <- maihda(BMI ~ Age + Gender + Race + (1 | Gender:Race), data = maihda_health_data)
tab <- maihda_table(a)
tab # printed: model-results table + top/bottom strata
tab$models # the numeric, export-ready results table
tab$strata # all strata ranked by predicted BMI
# write.csv(tab$models, "results.csv", row.names = FALSE)
Generate Ternary Plot from MAIHDA Model
Description
Generate Ternary Plot from MAIHDA Model
Usage
maihda_ternary_plot(model, ...)
Arguments
model |
A fitted MAIHDA model. |
... |
Additional arguments passed to |
Value
A list containing data and plot.
Tidy a MAIHDA summary, model, or analysis
Description
tidy methods that return the MAIHDA estimates as a tidy
tibble, ready for downstream tables (gt, flextable) and
ggplot2. They read the slots that summary.maihda_model
already computes and add no new statistics.
Usage
## S3 method for class 'maihda_summary'
tidy(x, component = c("strata", "variance", "fixed"), ...)
## S3 method for class 'maihda_model'
tidy(x, component = c("strata", "variance", "fixed"), ...)
## S3 method for class 'maihda_analysis'
tidy(
x,
component = c("strata", "variance", "fixed"),
which = c("null", "adjusted"),
...
)
Arguments
x |
A |
component |
Which estimates to return:
|
... |
Unused, for S3 consistency. |
which |
For a |
Value
A tibble. For component = "strata": columns
stratum, label, estimate, std.error,
conf.low, conf.high. For "variance": component,
variance, sd, proportion. For "fixed": term,
estimate, std.error, conf.low, conf.high.
See Also
glance.maihda_analysis for the one-row model summary.
Examples
data("maihda_health_data")
m <- fit_maihda(BMI ~ Age + (1 | Gender:Race:Education), data = maihda_health_data)
tidy(m) # stratum estimates
tidy(m, component = "variance")
tidy(m, component = "fixed")
Validate a longitudinal (id / time) specification
Description
Validate a longitudinal (id / time) specification
Usage
maihda_validate_longitudinal(
id,
time,
time_degree,
data,
engine = "lme4",
sampling_weights = NULL,
context = NULL
)
Arguments
id |
Single column name: the person/unit identifier (level 2). |
time |
Single column name: a numeric measurement-time variable (level 1). |
time_degree |
Integer >= 1: polynomial degree of the growth curve (1 = linear). brms supports degree 1 only. |
data |
The model data. |
engine |
The fitting engine; only "lme4"/"brms" support the 3-level growth structure. |
sampling_weights, context |
Must be NULL – design-weighted and contextual longitudinal models are out of scope. |
Value
A list list(id, time, time_degree).
Validate a sampling-weights specification
Description
Validate a sampling-weights specification
Usage
maihda_validate_sampling_weights(sampling_weights, data)
Arguments
sampling_weights |
A single character string naming a numeric column of
|
data |
The data frame the weights must live in. |
Value
The validated column name.
Response-scale VPC for a binomial MAIHDA model
Description
Computes the variance partition coefficient on the response (probability) scale
for a binomial MAIHDA model, using the simulation method of Goldstein, Browne &
Rasbash (2002). Stratum random effects u \sim N(0, \sigma^2_u) are
simulated and converted to predicted probabilities p = g^{-1}(\eta + u)
(with \eta the fixed-part linear predictor); the VPC is then the
between-stratum variance of p as a share of the total
(between + the binomial within-stratum variance \overline{p(1-p)}).
Unlike the latent-scale VPC (fixed level-1 variance \pi^2/3 for the logit),
the response-scale VPC depends on the overall outcome prevalence, so report it as
a complement to – not a replacement for – the latent-scale value.
Usage
maihda_vpc_response(model, n_sim = 10000, seed = NULL)
Arguments
model |
A binomial |
n_sim |
Number of Monte-Carlo draws of the stratum random effect (>= 100). Default 10000. |
seed |
Optional integer seed for reproducibility. |
Details
The fixed part \eta is collapsed to a single value – the mean linear
predictor \bar\eta over the analytic sample – before the random effect is
simulated around it. The result is therefore a VPC evaluated at the mean
covariate profile (a conditional-at-mean estimate), not one marginalised over the
empirical covariate distribution. For the canonical strata-only (null) model
\eta is constant (the intercept), so the two coincide and the value is
exact. For an adjusted model (one with covariates) they can differ, because
the inverse link is nonlinear and g^{-1}(\bar\eta) \neq \overline{g^{-1}(\eta)}:
read the response-scale VPC from the null model, or interpret an adjusted value as
conditional on the average covariate profile rather than as a covariate-averaged
(marginal) VPC.
The method is binomial-link agnostic: it maps the simulated stratum effects through whichever inverse link the model uses (logit, probit, cloglog, ...), so a non-logit binomial fit is computed on its own scale rather than rejected. Only the family is required to be binomial.
Value
An object of class maihda_vpc_response: a list with
estimate, scale = "response", method = "simulation",
n_sim, var_between (the latent-scale between-stratum variance) and
lp_fixed (the mean fixed-part linear predictor).
References
Goldstein, H., Browne, W., & Rasbash, J. (2002). Partitioning variation in multilevel models. Understanding Statistics, 1(4), 223-231.
See Also
maihda_discriminatory_accuracy, summary.maihda_model
Examples
## Not run:
strata <- make_strata(maihda_health_data, vars = c("Gender", "Race"))
d <- maihda_health_data
d$stratum <- strata$data$stratum
m <- fit_maihda(Obese ~ (1 | stratum), data = d, family = "binomial")
maihda_vpc_response(m, seed = 1)
## End(Not run)
Fixed-part (and optionally full) linear predictor of a wemix fit
Description
WeMix's own predict() method needs the grouping structure re-resolved
and offers no fixed-only form, so predictions are built directly from the
coefficient vector and the stored stratum effects: the fixed design matrix is
constructed with the training data's factor levels and multiplied by
coef, and include_re adds each row's stratum effect (conditional
mode). Everything is on the link scale.
Usage
maihda_wemix_linpred(object, newdata = NULL, include_re = TRUE)
Arguments
object |
A |
newdata |
Data to predict for; defaults to the analytic data. |
include_re |
Add the stratum random effect (conditional mode)? |
Value
A numeric vector of link-scale predictions.
Stratum random-effect table for a wemix fit
Description
Mirrors maihda_stratum_ranef_lme4(): one row per stratum with the
random-effect estimate (conditional mode), a conditional standard error, and a
95% interval. WeMix reports no conditional variances, so the SE is computed
analytically from the weighted pseudo-likelihood: for a Gaussian model the
conditional precision of u_j is 1/\tau^2 + \sum_j w_{ij}/\sigma^2
(the design-weighted analogue of lme4's condVar, to which it reduces at
unit weights), and for a binomial-logit model the Laplace curvature at the
conditional mode, 1/\tau^2 + \sum_j w_{ij}\,\hat p_{ij}(1-\hat p_{ij}).
These are model-based approximations, not design-based (replicate-weight)
uncertainty.
Usage
maihda_wemix_stratum_ranef(object)
Arguments
object |
A |
Value
A data frame with stratum, stratum_id,
random_effect, se, lower_95, upper_95.
Variance components of a wemix MAIHDA fit
Description
Reads the between-stratum variance (and, for a linear model, the residual
variance) from the WeMixResults variance table. For a binomial-logit
model the level-1 variance is the usual latent-scale \pi^2/3, matching
the lme4/brms summaries.
Usage
maihda_wemix_variances(object)
Arguments
object |
A |
Value
A list with stratum and residual variances.
Create Strata from Multiple Variables
Description
This function creates strata (intersectional categories) from multiple categorical variables in a dataset.
Usage
make_strata(data, vars, sep = " × ", min_n = 1, autobin = TRUE)
Arguments
data |
A data frame containing the variables to create strata from. |
vars |
Character vector of variable names to use for creating strata. |
sep |
Separator to use between variable values when creating stratum labels. Default is " \u00d7 " (a mathematical multiplication sign). |
min_n |
Minimum number of observations required for a stratum to be included. Strata with fewer observations will be coded as NA. Default is 1. |
autobin |
Logical indicating whether to automatically bin numeric grouping
variables with more than 10 unique values into 3 categories (tertiles).
Default is TRUE. When this happens a |
Details
If any of the specified variables has a missing value (NA) for a given observation, that observation will be assigned to the NA stratum (stratum = NA), rather than creating a stratum that includes the missing value.
The strata_info data frame is also attached as an attribute to the data, which allows fit_maihda() to automatically capture stratum labels for use in plots and summaries.
Value
A list with two elements:
data |
The original data frame with an added 'stratum' column. The strata_info is also attached as an attribute for use by fit_maihda() |
strata_info |
A data frame with information about each stratum including counts and the combination of variable values |
Examples
# Create strata from gender and race variables
result <- make_strata(maihda_sim_data, vars = c("gender", "race"))
print(result$strata_info)
Plot a MAIHDA Analysis
Description
Dispatches each type to the model it is valid on. The VPC and shrinkage
views ("vpc", "obs_vs_shrunken", "predicted") use the
null model. The additive-vs-intersectional views ("risk_vs_effect",
"effect_decomp", "ternary", "prediction_deviation") use the
adjusted model, whose fixed effects carry the dimensions' additive part so
the stratum random effect is the pure interaction; with fewer than two dimensions
(no adjusted model) they fall back to the null model. Group types
("group_vpc", "group_components", "group_between_variance",
"group_pcv") use the group comparison when maihda was called
with a group.
Usage
## S3 method for class 'maihda_analysis'
plot(x, type = "all", highlight_interactions = FALSE, ...)
Arguments
x |
A |
type |
One of the model types ("all", "vpc", "obs_vs_shrunken", "predicted",
"risk_vs_effect", "effect_decomp", "ternary", "prediction_deviation"), the
contextual type ("context_vpc", a stratum-vs-context variance bar; requires
|
highlight_interactions |
Highlight strata with a credibly non-zero
intersectional interaction on the BLUP-based views (see
|
... |
Additional arguments passed to the underlying plot method. |
Value
A ggplot2 object, or (for type = "all") an invisible list of them.
Plot a MAIHDA Model Comparison
Description
Plots VPC/ICC across the models compared by compare_maihda.
Dispatched via plot() on the classed result.
Usage
## S3 method for class 'maihda_comparison'
plot(x, ...)
Arguments
x |
A |
... |
Additional arguments (not used). |
Value
A ggplot2 object.
Examples
strata <- make_strata(maihda_sim_data, vars = c("gender", "race"))
null_model <- fit_maihda(health_outcome ~ 1 + (1 | stratum), data = strata$data)
adj_model <- fit_maihda(health_outcome ~ age + (1 | stratum), data = strata$data)
comparison <- compare_maihda(null_model, adj_model, bootstrap = TRUE)
plot(comparison)
Plot a MAIHDA Group Comparison
Description
Visualises the output of compare_maihda_groups as a point/forest
plot of the VPC/ICC by group, as stacked variance-composition bars (between- vs
within-stratum share) by group, as bars of the absolute between-stratum
(intersectional) variance by group, or as bars of the additive share (PCV) by
group. Dispatched via plot() on the classed result.
Usage
## S3 method for class 'maihda_group_comparison'
plot(
x,
type = c("vpc", "components", "between_variance", "pcv", "additive_share"),
...
)
Arguments
x |
A |
type |
One of "vpc" (default) for VPC by group with optional bootstrap confidence intervals, "components" for stacked variance proportions (additive / interaction / residual for a crossed-dimensions comparison, between / other / residual otherwise), "between_variance" for the absolute between-stratum variance by group, "pcv" for the two-model additive share (null -> adjusted proportional change in between-stratum variance) by group, or "additive_share" for the crossed-dimensions additive share by group. The VPC is a share of the unexplained variance; "between_variance" shows the magnitude the ratio cannot convey (two groups with very different VPCs can share a between-stratum variance, and vice versa); "pcv" requires strata defined by at least two dimensions. |
... |
Additional arguments (not used). |
Value
A ggplot2 object.
Examples
data(maihda_health_data)
cmp <- compare_maihda_groups(BMI ~ Age + (1 | Gender:Race),
data = maihda_health_data, group = "Education")
plot(cmp, type = "vpc")
plot(cmp, type = "components")
plot(cmp, type = "between_variance")
Plot MAIHDA Model Results
Description
Creates various plots for visualizing MAIHDA model results including variance partition coefficient comparisons, observed vs. shrunken estimates, and predicted subgroup values with confidence intervals.
Usage
## S3 method for class 'maihda_model'
plot(
x,
type = c("all", "vpc", "obs_vs_shrunken", "predicted", "risk_vs_effect",
"effect_decomp", "ternary", "prediction_deviation", "context_vpc", "vpc_trajectory",
"trajectories"),
summary_obj = NULL,
n_strata = 50,
highlight_interactions = FALSE,
...
)
Arguments
x |
A maihda_model object from |
type |
Character string specifying plot type:
|
summary_obj |
Optional maihda_summary object from |
n_strata |
Maximum number of strata to display in the predicted plot.
When there are more strata than this, the first |
highlight_interactions |
Highlight the strata that carry a credibly
non-zero intersectional interaction (from |
... |
Additional arguments (not currently used). |
Value
For a single type, a ggplot2 object that you can extend
with the usual + grammar (themes, labs(),
added layers, or a replacement fill/colour scale). Two types return a richer
object: "prediction_deviation" returns a patchwork of two
panels (theme every panel at once with & theme_*()), and
"ternary" returns a ggtern object (use the
ggtern::theme_*() family rather than the standard ggplot2 themes).
type = "all" returns a named list of ggplot2 objects.
Examples
strata_result <- make_strata(maihda_sim_data, vars = c("gender", "race"))
model <- fit_maihda(health_outcome ~ age + (1 | stratum), data = strata_result$data)
# VPC plot
plot(model, type = "vpc")
# Single-type plots are ggplot objects -- restyle them with ggplot2:
plot(model, type = "vpc") +
ggplot2::theme_classic() +
ggplot2::labs(title = "Variance partition, restyled")
# Generate all plots (a named list); pick one out to restyle it:
plots <- plot(model)
plots$predicted + ggplot2::theme_bw()
Plot MAIHDA Ternary Diagram
Description
Renders the ternary decomposition produced by
compute_maihda_ternary_data. Dispatched via plot() on the
classed result.
Usage
## S3 method for class 'maihda_ternary'
plot(
x,
size_var = "n",
color_var = "label",
label_top_n = 5,
label_by = c("interaction_signal", "uncertainty", "n"),
alpha = 0.7,
...
)
Arguments
x |
A |
size_var |
Column name for point sizing. |
color_var |
Column name for point colors. |
label_top_n |
Number of top strata to label. |
label_by |
Variable used to determine top strata. |
alpha |
Point transparency. |
... |
Additional arguments (not used). |
Value
A plot object.
Note
This method attaches the ggtern package to the search
path (as if by library(ggtern)) if it is not already attached. This
is a deliberate, unavoidable side effect: ggtern replaces several
ggplot2 build/print internals at attach time, and without it the
ternary coordinate system and themes do not render (you get a blank or
distorted plot). The attachment persists after the call so the returned
object can still be printed later in the session; it is not detached on
exit. If you need a pristine search path, attach ggtern yourself
before plotting and manage its lifecycle, or run plotting in a separate
session.
Plot Model Comparison (deprecated)
Description
Deprecated. Use plot() on the compare_maihda result
instead, e.g. plot(compare_maihda(...)).
Usage
plot_comparison(comparison_df)
Arguments
comparison_df |
A data frame from |
Value
A ggplot2 object.
Stratum vs. Context Variance Plot (contextual cross-classified MAIHDA)
Description
One bar per variance component – the between-stratum (intersectional)
variance, each context's variance, any other random effects, and the residual
– on the variance scale, with each component's share of the total printed
above its bar. Complements plot_vpc()'s stacked proportion bar by
showing the magnitudes the shares are computed from.
Usage
plot_context_vpc(summary_obj)
Arguments
summary_obj |
A |
Value
A ggplot2 object.
Effect Decomposition Plot
Description
Decomposes the total deviation from the overall mean into the additive (fixed) component and the intersectional (random) component for each stratum.
Usage
plot_effect_decomposition(
object,
summary_obj,
top_n_labels = 10,
highlight = NULL
)
Arguments
object |
A maihda_model object |
summary_obj |
A maihda_summary object |
top_n_labels |
Number of most extreme strata to label |
highlight |
Optional character vector of stratum ids to highlight. When supplied, labels are restricted to these strata rather than the most extreme overall deviations. |
Value
A ggplot2 object
Plot a MAIHDA Group Comparison (deprecated)
Description
Deprecated. Use plot() on the compare_maihda_groups
result instead, e.g. plot(cmp, type = "vpc").
Usage
plot_group_comparison(
x,
type = c("vpc", "components", "between_variance", "pcv", "additive_share")
)
Arguments
x |
A |
type |
One of "vpc" (default), "components", "between_variance", or "pcv". |
Value
A ggplot2 object.
Plot MAIHDA Ternary Diagram (deprecated)
Description
Deprecated. Use plot() on the compute_maihda_ternary_data
result instead, e.g. plot(compute_maihda_ternary_data(model)).
Usage
plot_maihda_ternary(ternary_data, ...)
Arguments
ternary_data |
Data output from |
... |
Further arguments passed to |
Value
A plot object.
Observed vs. Shrunken Estimates Plot
Description
Observed vs. Shrunken Estimates Plot
Usage
plot_obs_vs_shrunken(object, summary_obj, highlight = NULL)
Arguments
object |
A maihda_model object |
summary_obj |
A maihda_summary object |
Details
The x-axis is each stratum's raw observed mean; the y-axis is the model-based stratum estimate, which includes the fixed-effect contribution. For an intercept-only (null) model the vertical distance from the diagonal is pure shrinkage toward the grand mean. For a covariate-adjusted model the model estimate also moves with the stratum's covariate profile, so distance from the diagonal reflects both shrinkage and covariate adjustment and should not be read as shrinkage alone. The caption notes which case applies.
Value
A ggplot2 object
Time-specific PCV plot (longitudinal MAIHDA)
Description
The additive share – the proportional change in the between-stratum (trajectory) variance from the null to the adjusted model – as a function of time. A high, flat curve means intersectional trajectory inequalities are "mostly additive".
Usage
plot_pcv_trajectory(pcv)
Arguments
pcv |
A |
Value
A ggplot2 object.
Plot Predicted Stratum Values with Confidence Intervals
Description
Plot Predicted Stratum Values with Confidence Intervals
Usage
plot_predicted_strata(
object,
summary_obj,
n_strata,
scale = c("response", "link"),
highlight = NULL
)
Arguments
object |
A maihda_model object |
summary_obj |
A maihda_summary object |
n_strata |
Maximum number of strata to display (the first n_strata, in stratum order) |
scale |
Prediction scale: "response" (default) or "link" |
Value
A ggplot2 object
Plot Prediction Deviation Panels
Description
Creates an advanced, publication-ready two-panel dashboard for visualizing predicted values and highlighting the most notable cases or strata. What "notable" means depends on the model type, and the labelled points are not statistical outliers in the regression-diagnostic sense:
Gaussian and Poisson (and the ordinal
"expected_score"mode): the cases/strata whose prediction sits furthest from the mean prediction (largest deviation), ranked by absolute deviation.Binomial: the cases/strata with the largest absolute deviance residual, i.e. where the observed 0/1 outcome is least consistent with the fitted probability (worst-fit points), ranked by
|deviance residual|.Ordinal
"surprise"mode: the cases/strata with the highest surprise-\log P(\text{observed category}), i.e. the least probable observations under the model.
Usage
plot_prediction_deviation_panels(
model,
data = NULL,
type = c("auto", "gaussian", "poisson", "binomial", "ordinal"),
ordinal_mode = c("surprise", "expected_score"),
top_n_labels = 5,
strata_info = NULL
)
Arguments
model |
A fitted model object (e.g., from 'lm()', 'glm()', 'MASS::polr()', or 'lme4::glmer()'). |
data |
The original data frame used to fit the model. If 'NULL', attempts to extract from the model. |
type |
Model type: "auto" (default), "gaussian", "poisson", "binomial", or "ordinal". |
ordinal_mode |
For ordinal models: "surprise" (default, based on observation probability) or "expected_score". |
top_n_labels |
Number of points to label on the plot. The ranking metric depends on the model type (see Description): deviation from the mean prediction for Gaussian/Poisson and the ordinal expected-score mode, absolute deviance residual for binomial, and surprise for the ordinal surprise mode. Default is 5. |
strata_info |
Optional data frame of strata labels, generally extracted from 'maihda_model' objects. |
Value
A 'patchwork' object containing two 'ggplot2' panels.
Mean Prediction vs. Stratum Random Effect Plot
Description
Creates a quadrant scatterplot comparing each stratum's mean predicted outcome against its stratum random effect (shrunken between-stratum deviation). Points represent strata. Whether a higher predicted value is "worse" or "better" depends on the outcome, so the axes are not framed as risk. The random effect equals the pure intersectional (interaction) component only when the additive main effects of the strata variables are included in the model; otherwise it also absorbs those omitted main effects.
Usage
plot_risk_vs_effect(object, summary_obj, top_n_labels = 10)
Arguments
object |
A maihda_model object |
summary_obj |
A maihda_summary object |
top_n_labels |
Number of most extreme strata to label (by absolute effect size) |
Value
A ggplot2 object
Stratum mean-trajectory plot (longitudinal MAIHDA)
Description
One predicted line per stratum over time – the fixed-part trajectory plus each stratum's random intercept and slope (BLUPs) – the longitudinal analogue of the predicted-strata caterpillar. Shows how the intersectional groups fan out (or converge) over time.
Usage
plot_stratum_trajectories(object, summary_obj, n_strata = 50)
Arguments
object |
A longitudinal |
summary_obj |
Its |
n_strata |
Maximum number of strata to draw (by stratum order); the rest are noted in the caption. |
Value
A ggplot2 object.
VPC Visualization Plot
Description
VPC Visualization Plot
Usage
plot_vpc(summary_obj)
Arguments
summary_obj |
A maihda_summary object |
Value
A ggplot2 object
Time-varying VPC trajectory plot (longitudinal MAIHDA)
Description
The between-stratum share of variance (VPC/ICC) as a function of time, with a confidence/credible ribbon when available. The headline reference-time VPC is marked. For a longitudinal MAIHDA the VPC is not a single number – the between-stratum variance is a random intercept + slope on time – so this curve replaces the cross-sectional VPC bar.
Usage
plot_vpc_trajectory(summary_obj)
Arguments
summary_obj |
A |
Value
A ggplot2 object.
Predict from MAIHDA Model
Description
Makes predictions from a fitted MAIHDA model, either at the stratum level or individual level.
Usage
predict_maihda(
object,
newdata = NULL,
type = c("individual", "strata", "response", "link"),
scale = c("response", "link"),
...
)
Arguments
object |
A maihda_model object from |
newdata |
Optional data frame for making predictions. If NULL, uses the original data from model fitting. |
type |
Character string specifying prediction type:
For backward compatibility, "link" or "response" may also be passed here and will be interpreted as individual-level predictions on that scale. |
scale |
Character string specifying the prediction scale for
individual-level predictions: "response" (default) or "link". For a
cumulative (ordinal) model the "link" scale is the latent location
|
... |
Additional arguments passed to predict method of underlying model. |
Value
Depending on type:
For "individual": A numeric vector of predicted values on the requested scale
For "strata": A data frame with stratum ID and predicted random effect. When
newdatais supplied, the result is restricted to the strata present innewdata(and a stratum the model never saw is an error, as for "individual"); whennewdataisNULL, every training stratum is returned.
Examples
strata_result <- make_strata(maihda_sim_data, vars = c("gender", "race"))
model <- fit_maihda(health_outcome ~ age + (1 | stratum), data = strata_result$data)
# Individual predictions
pred_ind <- predict_maihda(model, type = "individual")
# Stratum predictions
pred_strata <- predict_maihda(model, type = "strata")
Print a MAIHDA Analysis
Description
Print a MAIHDA Analysis
Usage
## S3 method for class 'maihda_analysis'
print(x, ...)
Arguments
x |
A |
... |
Additional arguments (not used). |
Value
No return value, called for side effects.
Print method for MAIHDA group comparisons
Description
Print method for MAIHDA group comparisons
Usage
## S3 method for class 'maihda_group_comparison'
print(x, ...)
Arguments
x |
A maihda_group_comparison object. |
... |
Additional arguments (not used). |
Value
No return value, called for side effects.
Print MAIHDA information criteria
Description
Print MAIHDA information criteria
Usage
## S3 method for class 'maihda_ic'
print(x, ...)
Arguments
x |
A |
... |
Additional arguments (not used). |
Value
No return value, called for side effects.
Print a MAIHDA interaction diagnostic
Description
Print a MAIHDA interaction diagnostic
Usage
## S3 method for class 'maihda_interactions'
print(x, ...)
Arguments
x |
A |
... |
Additional arguments (not used). |
Value
No return value, called for side effects.
Print a longitudinal MAIHDA PCV
Description
Print a longitudinal MAIHDA PCV
Usage
## S3 method for class 'maihda_long_pcv'
print(x, ...)
Arguments
x |
A |
... |
Unused. |
Value
The object, invisibly.
Print method for maihda_model
Description
Print method for maihda_model
Usage
## S3 method for class 'maihda_model'
print(x, ...)
Arguments
x |
A maihda_model object |
... |
Additional arguments |
Value
No return value, called for side effects.
Print a stepwise MAIHDA table
Description
Print a stepwise MAIHDA table
Usage
## S3 method for class 'maihda_stepwise'
print(x, ...)
Arguments
x |
A |
... |
Additional arguments (not used). |
Value
Invisibly, x.
Print method for maihda_strata objects
Description
Print method for maihda_strata objects
Usage
## S3 method for class 'maihda_strata'
print(x, ...)
Arguments
x |
A maihda_strata object |
... |
Additional arguments (not used) |
Value
No return value, called for side effects.
Print method for maihda_summary objects
Description
Print method for maihda_summary objects
Usage
## S3 method for class 'maihda_summary'
print(x, ...)
Arguments
x |
A maihda_summary object |
... |
Additional arguments (not used) |
Value
No return value, called for side effects.
Print a MAIHDA results table
Description
Print a MAIHDA results table
Usage
## S3 method for class 'maihda_table'
print(x, digits = x$digits, ...)
Arguments
x |
A |
digits |
Decimal places (defaults to the value stored on |
... |
Additional arguments (not used). |
Value
Invisibly, x.
Print method for PVC results
Description
Print method for PVC results
Usage
## S3 method for class 'pvc_result'
print(x, ...)
Arguments
x |
A pvc_result object |
... |
Additional arguments |
Value
No return value, called for side effects.
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
Run MAIHDA Shiny Application
Description
Launches a Shiny graphical user interface that exposes core functions of the MAIHDA package, allowing for visual data exploration, model fitting, and performance visualization.
Usage
run_maihda_app()
Value
No return value, called to launch the shiny app.
Examples
## Not run:
run_maihda_app()
## End(Not run)
Stepwise Proportional Change in Variance (PCV)
Description
Estimates the proportional change in variance (PCV) sequentially by fitting intermediate (partially-adjusted) models, adding each predictor one-by-one. The step-specific PCV is the change in between-stratum variance contributed by a predictor given the variables already in the model. Because the steps are sequential it is order-dependent: it reflects each variable's marginal, model-dependent change, not an order-invariant “unique” contribution.
Usage
stepwise_pcv(
data,
outcome,
vars,
engine = "lme4",
family = "gaussian",
sampling_weights = NULL
)
Arguments
data |
Data frame with observations. Ensure 'make_strata()' was run first so the 'stratum' variable exists. |
outcome |
Character string; the dependent variable. |
vars |
Character vector; predictors (strata groupings & covariates) to add sequentially to the model. |
engine |
Modeling engine ("lme4", "brms", or "wemix"). Default is "lme4";
switches to "wemix" automatically when |
family |
Error distribution and link function. Default is "gaussian". |
sampling_weights |
Optional name of a sampling-weight column for
design-weighted stepwise fits; see |
Details
All models are fit on the complete cases for 'outcome', 'stratum', and all variables in 'vars' so that each sequential variance comparison uses the same analytic sample.
For a binary outcome the table additionally tracks discriminatory accuracy
(Merlo et al. 2016): AUC is each model's C-statistic and Step_AUC /
Total_AUC are its absolute change (delta-AUC), in contrast to the
proportional Step_PCV / Total_PCV. The MOR is reported
for the logit link (NA otherwise) and is a monotone transform of the
between-stratum variance already in Variance. For a design-weighted fit
(sampling_weights) the AUC is the design-weighted (population) C-statistic.
Reuses maihda_discriminatory_accuracy on each step's fitted model, so
no additional models are fit. Note that adding a stratum-defining dimension
(one already encoded by the strata) typically leaves the AUC essentially unchanged:
it re-partitions the between-stratum variance (so the PCV and MOR move) but not the
per-stratum predicted ranking the rank-based AUC depends on. The AUC trajectory is
therefore most informative for individual-level covariates that vary within
strata.
Value
A data.frame (class maihda_stepwise) showing the sequential
models, the between-stratum variance at each step, and both the step-specific
and total PCV. For a binary (binomial/Bernoulli) outcome it also carries
the discriminatory-accuracy trajectory: AUC (the C-statistic of each
step's model – step 0 is the strata-only discriminatory accuracy),
Step_AUC and Total_AUC (the absolute change in AUC,
delta-AUC, versus the previous step and versus the null), and MOR (the
Median Odds Ratio, logit link only). These columns are absent for non-binary
outcomes.
References
Merlo, J., Wagner, P., Ghith, N., & Leckie, G. (2016). An original stepwise multilevel logistic regression analysis of discriminatory accuracy: the case of neighbourhoods and health. PLOS ONE, 11(4), e0153778.
Examples
strata_result <- make_strata(maihda_sim_data, c("gender", "race"))
stepwise_pcv(strata_result$data, "health_outcome", c("gender", "race", "age"))
Summarize a MAIHDA Analysis
Description
Returns the variance summary (VPC/ICC, variance components, stratum estimates)
of the fitted model. The per-group comparison, when present, is attached as the
"groups" attribute.
Usage
## S3 method for class 'maihda_analysis'
summary(object, ...)
Arguments
object |
A |
... |
Additional arguments (not used). |
Value
The maihda_summary for the fitted model.
Summarize MAIHDA Model
Description
Provides a summary of a MAIHDA model including variance partition coefficients (VPC/ICC) and stratum-specific estimates.
Usage
## S3 method for class 'maihda_model'
summary(
object,
bootstrap = FALSE,
n_boot = 1000,
conf_level = 0.95,
response_vpc = FALSE,
seed = NULL,
...
)
Arguments
object |
A maihda_model object from |
bootstrap |
Logical indicating whether to compute parametric bootstrap
confidence intervals for VPC/ICC. Default is FALSE. Supported for lme4
models only; |
n_boot |
Number of bootstrap samples if bootstrap = TRUE. Default is 1000. |
conf_level |
Confidence level for the VPC/ICC interval – the lme4 bootstrap CI or the brms posterior credible interval. Default is 0.95. |
response_vpc |
Logical; for a binomial (lme4) model, also compute the
response-scale VPC ( |
seed |
Optional integer seed for the response-scale VPC simulation when
|
... |
Additional arguments (not currently used). |
Value
A maihda_summary object containing:
vpc |
Variance Partition Coefficient (ICC); for lme4 with
|
variance_components |
Data frame of variance components. For a
contextual cross-classified fit ( |
context |
For a contextual cross-classified fit, the stratum vs.
context partition: per-context variances and shares, the contexts' total
share ( |
discriminatory_accuracy |
For a binomial/Bernoulli outcome, the
|
vpc_response |
The response-scale VPC ( |
stratum_estimates |
Data frame of stratum-specific random effects with labels if available |
fixed_effects |
Fixed effects estimates |
thresholds |
For a cumulative (ordinal) clmm fit, the threshold (cut point) estimates with standard errors – the cumulative model's "intercepts"; NULL otherwise |
model_summary |
Original model summary |
diagnostics |
Fit-quality diagnostics (singular fit / convergence) carried over from the fitted model and reported by the print method |
Interpreting the VPC/ICC
The VPC is the between-stratum variance
divided by the total unexplained variance. For the canonical
single-stratum model that denominator is between-stratum + residual, but if the
model includes additional random effects (e.g. (1 | site)) their
variance is included in the denominator too (between-stratum + other random
effects + residual), so the VPC is the between-stratum share of all
unexplained variance. It is a conditional/residual ICC that excludes variance
captured by the fixed effects, so for models with covariates it is conditional
on them. It is most commonly read from the null model
outcome ~ 1 + (1 | stratum), where it is the total between-stratum
share. For non-Gaussian families the level-1 (residual) variance uses a
latent/distributional approximation (\pi^2/3 for logistic,
\log(1 + 1/\mu) for Poisson per Stryhn et al. 2006, and
\log(1 + 1/\mu + 1/\theta) for the negative binomial per Nakagawa,
Johnson & Schielzeth 2017), so the
VPC is on that latent scale; for a weighted Gaussian model the level-1
variance is the mean conditional residual variance,
\bar{\sigma^2 / w_i}, since the per-observation residual variance is
\sigma^2 / w_i. The stratum random effects represent the total
between-stratum deviation; they equal the pure intersectional
(interaction) component only when the additive main effects of the strata
variables are included in the model.
Note
For lme4 models a VPC/ICC interval is obtained from a parametric
bootstrap (bootstrap = TRUE). For brms models the VPC/ICC is
summarised directly from the posterior draws: the reported estimate is the
posterior median of the per-draw VPC (E[\sigma^2]-based, not the biased
E[\sigma]^2) and the interval is a central credible interval at
conf_level (default 95%), so no bootstrap argument is needed.
The variance-components table reports the posterior-mean variance components,
so the stratum proportion shown there may differ slightly from the headline
VPC because the median of a ratio is not the ratio of means. For non-Gaussian
brms families the level-1 (residual) variance uses the usual
latent-scale approximation; for poisson(log) it is evaluated at the
posterior-mean fitted values rather than per draw to avoid an expensive
ndraws \times nobs computation.
Examples
strata_result <- make_strata(maihda_sim_data, vars = c("gender", "race"))
model <- fit_maihda(health_outcome ~ age + (1 | stratum), data = strata_result$data)
summary_result <- summary(model)
# With bootstrap CI
# summary_boot <- summary(model, bootstrap = TRUE, n_boot = 50)