MAIHDA 0.1.11
New Features
maihda() now reports the intersectional
interactions by default. “Which strata genuinely interact” is
the scientific payoff of MAIHDA, so it no longer needs a separate call:
maihda() computes maihda_interactions() on the
adjusted (or crossed-dimensions) model, stores it as the
interactions slot, and surfaces a one-line headline in
print() (flagged count, the strongest stratum, and –
crucially – the multiplicity stance actually used, so an uncorrected
scan is never silently read as corrected). The computation is
essentially free (it reads the stratum estimates the summary already
holds; no refit), and it is skipped for a longitudinal decomposition
(whose interaction is a trajectory, not a scalar). Control it with the
new interactions argument: TRUE (default) uses
adjust = "none"; FALSE skips it; or pass a
p.adjust method (e.g. interactions = "BH") to
flag with that correction. fit_maihda() gains the same
argument as an opt-in
(interactions = FALSE by default, since a single fit is
often a null model where the diagnostic does not apply).
plot(..., highlight_interactions = TRUE) now reuses the
stored diagnostic, so the plot highlights and the printed headline can
no longer disagree.
- Added
broom-style tidy() and
glance() methods for maihda_model,
maihda_summary, and maihda_analysis, so MAIHDA
results drop straight into tidy data for ggplot2,
gt/flextable tables, and other downstream
tooling. tidy() returns the stratum (intersection)
random-effect estimates by default (with the human-readable
intersectional label), or the variance-components table
(component = "variance") or fixed effects
(component = "fixed"), all in broom’s
estimate/std.error/conf.low/conf.high
shape; tidy() on a maihda() analysis takes
which = c("null", "adjusted"). glance()
returns the MAIHDA headline as a one-row tibble – the
VPC/ICC (with its bootstrap/posterior interval), and for a
maihda() analysis the PCV, the
adjusted-model AUC, the additive/interaction shares, AUC and MOR for a
binary outcome, plus
n_strata/nobs/engine/family
– a row no generic mixed-model tool emits, since the PCV needs the
null+adjusted pair. The layout is uniform across the lme4,
brms, wemix, and ordinal engines.
The generics come from the lightweight
generics package (the same ones
broom/broom.mixed re-export), so
tidy()/glance() dispatch whether the user has
broom, generics, or just MAIHDA
loaded, with no hard broom dependency; the raw
fixed-effect/per-row tidying broom.mixed already does on
the underlying fit is intentionally not duplicated.
- Added longitudinal (growth-curve) MAIHDA – the
life-course extension of Bell, Evans, Holman & Leckie (2024, Soc
Sci Med 351:116955) – via new
id, time,
and time_degree arguments on fit_maihda() and
maihda(). Supplying id (the person/unit
measured repeatedly) and time (a numeric measurement-time
column) fits a 3-level growth curve – occasions within
individuals within intersectional strata – with a random intercept
and slope on time at both the individual and stratum
levels (outcome ~ time + (time | id) + (time | stratum);
the growth slopes are added automatically, so you write only the strata
shorthand (1 | var1:var2)). The between-stratum variance,
and therefore the VPC, is then a function of time
(VarS(t) = a(t)' Sigma_s a(t)): summary()
reports the baseline (reference-time) VPC, the full time-varying VPC
trajectory, and the stratum/individual intercept-slope-covariance
blocks, and a new plot type "vpc_trajectory" draws the
VPC-over-time curve (with "trajectories" for the predicted
per-stratum mean trajectories;
plot(type = "vpc")/"all" route there for a
longitudinal fit). maihda(decomposition = "longitudinal")
(selected automatically when id/time are
supplied) fits a null and an adjusted growth model – the adjusted model
adding the dimensions’ main effects and their
dim:time interactions – and reports the
PCV separately for the baseline (intercept) and the slope
variance: the additive-vs-multiplicative split of the
intersectional trajectory inequality (the paper’s “partly
multiplicative but mostly additive” finding), returned as a
maihda_long_pcv object with pcv_intercept,
pcv_slope, and a time-specific pcv_t (and a
"pcv_trajectory" plot). All families with a defined level-1
variance are supported
(gaussian/binomial/poisson/negbinomial,
latent scale for non-Gaussian) on engine = "lme4" (any
degree) and engine = "brms" (linear growth; posterior
credible bands on the VPC trajectory).
predict_maihda(type = "strata") returns each stratum’s
deviation at the baseline (reference) time plus its raw intercept and
slope (a stratum is now a trajectory; the baseline deviation,
evaluated at ref_time = min(time), is the longitudinal
analogue of a cross-sectional stratum BLUP and differs from the raw
time-0 intercept whenever time is not zero-referenced). The
intercept-only guards elsewhere are untouched – a longitudinal fit is
tagged and routed to the time-varying path, while every other model
still rejects random slopes – and
extract_between_variance()/calculate_pvc()
reject a longitudinal model with a pointer to the time-varying
decomposition. Design-weighted, contextual,
wemix/ordinal, and group-comparison
longitudinal models are out of scope (each errors). A new bundled
dataset maihda_long_data (600 persons x 5
waves, gender x ethnicity x education strata, with constructed
mostly-additive trajectory differences) demonstrates it.
- Added ordinal (cumulative) MAIHDA for
ordered-factor outcomes – the multicategorical extension the MAIHDA
tutorials flag as priority future work.
family = "ordinal"
(alias "cumulative"; probit via the new exported
maihda_cumulative("probit") or
brms::cumulative("probit")) fits a cumulative
(proportional-odds) model: a new
engine = "ordinal" wraps
ordinal::clmm() for the frequentist path (selected
automatically for an ordinal family under the default engine, with a
message) and engine = "brms" uses
brms::cumulative(). An ordered-factor outcome with 3+
levels under the all-default family/engine selects the model
automatically (with a warning; a 2-level ordered factor still takes the
binomial auto-detect path), and an unordered factor response is coerced
to ordered in its declared level order with a message. The VPC/ICC lives
on the latent scale – level-1 variance pi^2/3 (logit) or 1 (probit), the
same latent treatment as binomial models, so cumulative VPCs are
comparable to logistic ones – and summary() gains a
thresholds slot (the cut points that take the intercept’s
place, with Hessian SEs) printed alongside the location fixed effects.
Because predict.clmm does not exist, predictions are built
from the stored coefficients: the link scale is the latent location eta
= x’beta + u, and the response scale is the expected category
score sum_k k*P(Y = k) (the quantity the plots label “Average
Expected Category Score”), with P(Y <= k) = g^-1(alpha_k - eta)
differenced by pure, unit-tested helpers shared with the brms path
(whose fitted() probability array
predict_maihda() now collapses to the same score). The
two-model maihda() decomposition and PCV,
stepwise_pcv(), compare_maihda_groups()
(engine handshakes mirror the sampling_weights precedent),
the stratum plots, obs_vs_shrunken (observed mean category
score), risk_vs_effect, effect_decomp (exact
on the latent scale), and the deviation panels all work;
maihda_mor() now also accepts cumulative-logit fits,
returning the median cumulative odds ratio. Explicit
limits with targeted errors: logit/probit links only, the canonical
single (1 | stratum) structure (no
context/crossed-dimensions – use brms), no
sampling_weights on the clmm path, no parametric bootstrap
(clmm has no simulate/refit; brms provides credible intervals),
AUC/maihda_vpc_response() stay binomial-only, and the
ternary diagnostic is not yet supported. ordinal joins
Suggests.
- Added the negative-binomial family for
overdispersed count outcomes:
family = "negbinomial" on
fit_maihda() (and therefore maihda(),
stepwise_pcv(), and compare_maihda_groups()).
The dispersion parameter theta is estimated from the data –
lme4::glmer.nb() under the default engine, the
shape parameter under engine = "brms" – and a
fixed-theta MASS::negative.binomial(theta) family object is
also accepted with lme4, honouring the supplied theta. The VPC/ICC uses
the lognormal latent-scale level-1 variance
log(1 + 1/mu + 1/theta) (Nakagawa, Johnson & Schielzeth
2017, J R Soc Interface), the negative-binomial analogue of the
Stryhn et al. (2006) Poisson approximation already used by the package
(it reduces to it as theta grows); for brms the shape draws
are propagated into the VPC credible interval. The two-model
maihda() decomposition, calculate_pvc(),
parametric-bootstrap intervals (via lme4::refit(), which
holds theta fixed at its estimate – the interval is conditional on
theta, as documented), prediction, and the stratum plots (routed to the
count branch) all work; the log link is required, and the wemix engine,
maihda_discriminatory_accuracy(), and
maihda_vpc_response() reject the family with targeted
messages. Internally, engine-specific family labels are now
canonicalised so the family/link comparability checks in
calculate_pvc() and compare_maihda() no longer
depend on raw label strings: two fits whose theta is estimated
(lme4’s theta-embedding "Negative Binomial(<theta>)",
brms’s shape) compare equal, since each label otherwise
carried its own theta estimate and could never match. A fixed,
user-specified theta (MASS::negative.binomial(theta)) is
treated differently – it is part of the model specification, so it stays
in the comparability key and two fits with different fixed thetas are
(correctly) rejected as incomparable.
- Added design-weighted MAIHDA (survey/sampling
weights) via a new
sampling_weights argument on
fit_maihda(), maihda(),
stepwise_pcv(), and compare_maihda_groups().
Sampling weights are not the same thing as lme4’s
weights= (precision weights, which rescale the residual
variance), so feeding survey weights to lmer/glmer maximises the wrong
objective and gives invalid population estimates; supplying
sampling_weights with engine = "lme4" is
therefore an error, and supplying it with the default
engine switches (with a message) to the new
engine = "wemix": weighted
pseudo-maximum-likelihood via WeMix::mix() (Rabe-Hesketh
& Skrondal 2006), the estimator built for NAEP/PISA-style survey
analysis. The individual weights enter at level 1 unchanged and the
level-2 (stratum) weights are 1, because intersectional strata are
exhaustive population cells sampled with certainty. The wemix engine
supports the canonical MAIHDA structure –
gaussian(identity) or binomial(logit) with the
single (1 | stratum) intercept – and flows through the
whole toolkit: summary() reports the design-weighted
VPC/ICC (latent-scale pi^2/3 level-1 variance for logistic fits,
matching the other engines) and design-consistent (sandwich)
fixed-effect standard errors; stratum estimates carry analytic
conditional SEs (the design-weighted analogue of lme4’s
condVar, reducing to it at unit weights);
predict_maihda(), the stratum plots (aggregated with the
sampling weights, so stratum summaries are population-representative),
calculate_pvc() (which now also refuses to compare fits
with different sampling weights), the maihda()
two-model decomposition, stepwise_pcv(), and
compare_maihda_groups() all work. For a binomial fit,
maihda_discriminatory_accuracy() computes the
design-weighted AUC (each observation contributes its
weight as case/control mass) and flags it in the print method.
Alternatively engine = "brms" accepts
sampling_weights as likelihood weights
(y | weights(w), normalized to mean 1), giving a
pseudo-posterior: point estimates are design-consistent but
credible intervals are not design-based (a message says so). Limitations
are explicit rather than silent: no parametric bootstrap for wemix (a
design-based interval would need replicate weights – a possible future
extension), and no crossed random effects, so context = and
decomposition = "crossed-dimensions" require lme4/brms.
Rows with missing or non-positive weights are dropped with a warning. A
unit-weight wemix fit reproduces the lme4 ML fit to numerical precision.
WeMix joins Suggests.
- Added the contextual cross-classified MAIHDA – the
“cross-classified MAIHDA” of the literature (e.g. patients
cross-classified by intersectional stratum and hospital, or
students by stratum and school) – via a new
context argument on fit_maihda() and
maihda(). context = "school" (one or more
column names) enters each context as a crossed random intercept
alongside the intersectional stratum effect,
outcome ~ covars + (1 | stratum) + (1 | school), and the
summaries then partition the unexplained variance into
between-stratum vs. between-context vs. residual: the
headline VPC/ICC becomes the between-stratum share net of the
context, each context gets its own Context: <name>
row in the variance-components table, and a new $context
summary element carries the per-context variances and shares (with
parametric-bootstrap intervals for lme4 via
bootstrap = TRUE, and posterior credible intervals for
brms). In maihda() the context random intercept is carried
by both the null and the adjusted model, so the PCV is computed
with the context partialled out; context also composes with
decomposition = "crossed-dimensions" (the context variance
then enters the single fit and its VPC denominator). A new plot type
"context_vpc" (on both maihda_model and
maihda_analysis) bars the stratum vs. context variances
with their shares, and plot(type = "vpc") renders the
contextual split automatically. context cannot be combined
with group (a stratified per-level comparison
vs. a joint crossed model are different designs; supplying both
errors), a context variable may not be a stratum dimension or appear as
a fixed effect, and a context with few levels weakly identifies its
variance (consider engine = "brms"). A manually written
... + (1 | school) still fits and is summarised generically
as “Other random effects”; only context = activates the
labelled partition.
- Renamed the
decomposition value
"cross-classified" to
"crossed-dimensions" in
maihda() and compare_maihda_groups() (and in
the Shiny app’s decomposition toggle), because that mode crosses the
stratum dimensions’ main effects as random intercepts – whereas
“cross-classified MAIHDA” in the literature means the contextual
stratum-by-place model now fitted via context (see above).
The old value is accepted as a deprecated alias that
warns and maps to "crossed-dimensions". Note for code that
inspects results: a maihda_analysis from this mode now has
mode = "crossed-dimensions" (was
"cross-classified"), and
compare_maihda_groups() results carry
attr(, "decomposition") == "crossed-dimensions"; printed
output and plot titles say “crossed-dimensions” accordingly.
- Added
maihda(), a single high-level entry point that
runs the standard two-model MAIHDA workflow in one call. It fits the
null model (the formula you supply) and the
adjusted model (the same model plus the additive main
effects of the stratum dimensions), summarises the null model’s VPC/ICC,
and reports the PCV – the proportional change in
between-stratum variance from null to adjusted (the additive share of
the intersectional inequality). When a group is supplied it
also runs this decomposition within each group (the
compare_maihda_groups() result gains a pcv
column). It returns one consistent maihda_analysis object
with new
model_adjusted/summary_adjusted/pcv
slots (alongside the unchanged null-model
model/summary), and print,
summary, and plot methods; plot()
routes the VPC/shrinkage views to the null model and the
additive-vs-intersectional views (effect_decomp,
risk_vs_effect, ternary) to the adjusted
model, and gains type = "group_pcv". maihda()
is intrinsically a decomposition and has no single-model mode – use
fit_maihda() for a single fit. A numeric dimension
auto-binned for the strata enters the adjusted model as its
reconstructed tertile factor (not a linear term). Because it cannot
decompose without an intersection, maihda() errors (rather
than returning a half-result) when the stratum-defining variables are
unidentifiable (e.g. a hand-built stratum column) or define
only one dimension; the shorthand (1 | var1:var2) and
make_strata() paths both record the dimensions and
decompose normally.
- Added the
maihda_country_data dataset (OECD PISA 2018,
accessed via the learningtower package): 3,600 students
across six countries with gender x socioeconomic-status strata and
mathematics-achievement outcomes. It showcases
compare_maihda_groups() /
maihda(group = "country"), since intersectional inequality
(VPC/ICC) genuinely differs across the countries.
- Added the
maihda_sparse_data dataset and a new
vignette, “Bayesian MAIHDA for sparse intersections”,
showing why engine = "brms" is the safer estimator when
intersectional cells are small. The (simulated) data carry a
known interaction – 40% of the between-stratum variance, on a
Gaussian outcome y and a binary outcome event
– across 36 strata with deliberately skewed sizes (median 6, half the
cells below 5). Under that sparsity the maximum-likelihood (lme4)
crossed-dimensions fit is singular and over-shrinks the interaction (to
~23% Gaussian, ~3% binary – i.e. a real interaction read as “fully
additive”), with no uncertainty attached; a weakly-informative prior on
the random-effect SDs regularises the variance off the boundary and
returns a calibrated credible interval that covers the truth. The
vignette also documents the precompute-and-cache pattern for shipping
brms results in a build (Stan cannot run on CRAN’s / pkgdown’s
builders).
- Added
maihda_table(), a one-call
publication-ready results export that assembles the two
canonical MAIHDA write-up deliverables (cf. Evans et al. 2024’s
tutorial) from a fitted maihda() analysis (or a single
fit_maihda() model): (a) a model-results
table contrasting the null (Model 1) and adjusted (Model 2)
fits – intercept, between-stratum variance and SD, VPC/ICC, the PCV,
and, for a binary outcome, the AUC and Median Odds Ratio – and (b) a
ranked-strata table ordering every intersectional
stratum by its model-predicted outcome (their Table 4), with the
predicted value’s conditional interval, the stratum size, and the
stratum random effect. It computes nothing new – every quantity is read
from the summaries already attached to the analysis, so the table agrees
exactly with summary()/plot(). The
$models data frame is numeric and export-ready (statistics
in rows, an estimate + *_lower/*_upper
interval columns per model: the VPC bootstrap/posterior interval and the
bootstrap PCV interval are carried, other rows are point estimates), and
the print() method renders the familiar “estimate [low,
high]” layout plus the top/bottom strata (n_strata). It
adapts to every fit type: a crossed-dimensions analysis gets “Additive
share”/“Interaction share” rows instead of the PCV, a contextual
cross-classified analysis (context =) gets a “Context share
(VPC)” row, an ordinal fit’s thresholds stand in for the intercept, and
which = "adjusted" ranks the strata by the adjusted rather
than the null model. Works across the lme4, brms, wemix, and ordinal
engines.
summary() of a binomial/Bernoulli MAIHDA model – and
therefore maihda(), whose summaries it builds – now reports
the discriminatory accuracy automatically: the AUC /
C-statistic and Median Odds Ratio (the “DA” in MAIHDA), as a new
discriminatory_accuracy slot shown by the
print() methods, so the strata’s discriminatory accuracy
sits alongside the VPC without a separate call. For
maihda(), the headline print() shows the null
model’s AUC/MOR with the adjusted model’s AUC for comparison. The
response-scale (probability-scale) VPC is attached on request via
summary(model, response_vpc = TRUE, seed = ) or
maihda(..., response_vpc = TRUE, seed = ) (it is
simulation-based, hence opt-in and seeded). Both are computed from the
already-fitted model with no refit, and are skipped for non-binomial
outcomes and for the cross-classified fit (whose single-stratum
between-variance the MOR/response-VPC require is not defined across
crossed random effects). The standalone
maihda_discriminatory_accuracy(),
maihda_auc(), maihda_mor(), and
maihda_vpc_response() are unchanged.
- Added
maihda_ic(), which surfaces relative-fit
information criteria for choosing between model
structures (different covariate sets, strata definitions, or
families) – the question the VPC/PCV do not answer. It reports
AIC/BIC for the likelihood engines (lme4,
ordinal clmm) and the Bayesian
WAIC/LOOIC for brms, takes one or more
maihda_models (or a maihda_analysis, which
expands into its null and adjusted rows), and adds a delta
column (gap from the best model on the primary criterion). Crucially it
handles the REML pitfall: lmer fits
Gaussian models by REML, whose AIC/BIC are not comparable
across models with different fixed effects (the canonical
null-vs-adjusted MAIHDA case), so when more than one model is supplied
any REML fit is refitted with maximum likelihood via
lme4::refitML() before the criteria are read – matching
what anova() does on lme4 models – and the
estimator column records it. Design-weighted
(wemix) pseudo-likelihood fits report NA (no
standard AIC/BIC is defined). compare_maihda() now appends
these criteria alongside the VPC/ICC by default
(ic = TRUE); set ic = FALSE for the lean
VPC-only table.
stepwise_pcv() now also reports the
discriminatory-accuracy trajectory for a binary
(binomial/Bernoulli) outcome, alongside the between-stratum-variance /
PCV trajectory it already produced – the stepwise
discriminatory-accuracy analysis of Merlo et al. (2016). Each step gains
an AUC (that model’s C-statistic; step 0 is the strata-only
discriminatory accuracy), Step_AUC and
Total_AUC (the absolute change in AUC –
delta-AUC – versus the previous step and versus the null, in contrast to
the proportional
Step_PCV/Total_PCV), and MOR (the
Median Odds Ratio, logit link only) column. No extra models are fit: the
columns are read off each step’s already-fitted model via
maihda_discriminatory_accuracy(), so a design-weighted
stepwise (sampling_weights) reports the design-weighted
AUC, and the columns are simply absent for non-binary outcomes (the
gaussian/poisson/ordinal table is unchanged). A new print()
method for the maihda_stepwise table notes the
proportional-vs-absolute distinction.
- Added
maihda_interactions(), a diagnostic that flags
which strata carry a credibly non-zero intersectional
interaction – the heart of “where is there genuine
intersectionality”. It reads each stratum’s interaction BLUP (the
stratum random effect of the adjusted /
crossed-dimensions model, i.e. the departure from the additive
main-effects prediction) and flags the strata whose effect is credibly
different from zero, returning a ranked maihda_interactions
table (flagged strata first, by interaction magnitude). For the
frequentist engines
(lme4/wemix/ordinal) it uses the
BLUP’s conditional standard error with an optional multiplicity
correction (adjust, default "none"; the docs
steer to FDR "BH" for screening many strata, with the full
set of p.adjust methods available for a reviewer who needs
a specific one); for brms it uses the exact
posterior tail – a credible interval and the probability of
direction pd = P(BLUP > 0) – rather than a normal
approximation, and adjust is inert (the Bayesian answer is
multiplicity-free). Passing a maihda() analysis uses the
right (adjusted) model automatically; passing a bare null model is
caught with a warning, since its stratum effects mix the additive and
interaction parts. The help explains why a correction is optional on
already-shrunken BLUP estimates (Gelman, Hill & Yajima 2012; Gelman
& Carlin 2014).
plot() gains a highlight_interactions
argument (on both a maihda_model and a
maihda() analysis) that focuses and stars the
maihda_interactions()-flagged strata on the BLUP-based
views (effect_decomp, predicted,
obs_vs_shrunken). Pass TRUE (flags computed
with defaults), a multiple-testing method such as "BH", or
a precomputed maihda_interactions object to reuse a
specific conf_level/adjust; for an analysis
the flags are computed once from the adjusted model and reused across
views. In effect_decomp, labels follow the selected flagged
set, so a BH screen labels only strata that survive the BH adjustment.
FALSE (default) leaves every plot unchanged.
Improvements
maihda_mor() now requires the logit
link, not merely the binomial family. The Median Odds Ratio is an
odds-ratio-scale quantity derived from the logistic latent variance, so
applying its exp(sqrt(2 * V_A) * qnorm(0.75)) formula to a
binomial(link = "probit") fit returned a number that was
not on the model’s scale. maihda_mor() now errors for a
non-logit binomial link, and
maihda_discriminatory_accuracy() reports the
(link-agnostic) AUC with mor = NA for such fits, recording
the link and explaining the NA in its print()
method.
- Clarified that
maihda_vpc_response() collapses the
fixed part to its mean linear predictor before simulating, so for an
adjusted (covariate) model the response-scale VPC is a
conditional-at-mean estimate (evaluated at the average covariate
profile), not one marginalised over the covariate distribution. It is
exact for the canonical null/strata-only model; the documentation now
states this and recommends reading it from the null model.
- Updated the “MAIHDA for binary outcomes” vignette, which still
claimed the package shipped no AUC/MOR helper and defined a local one.
It now uses the exported
maihda_discriminatory_accuracy(),
maihda_auc(), and maihda_mor(), and points to
maihda_vpc_response() for the response-scale VPC.
- Added the missing
shinycssloaders dependency to the
README’s interactive-dashboard list (it is in DESCRIPTION
Suggests and used by run_maihda_app()).
- Plotting is now unified under the base
plot() generic.
compare_maihda() and
compute_maihda_ternary_data() now return classed objects,
so plot() dispatches automatically:
plot() on a compare_maihda() result (was
plot_comparison())
plot() on a compare_maihda_groups()
result, with type = "vpc"/“components” (was
plot_group_comparison())
plot() on a compute_maihda_ternary_data()
result (was plot_maihda_ternary())
- The old
plot_comparison(),
plot_group_comparison(), and
plot_maihda_ternary() functions still work but are
deprecated and emit a one-time warning pointing to
plot().
fit_maihda() now records lme4 fit-quality diagnostics
(singular fit and convergence warnings) on the model object;
print() on the model and summary() surface a
“Fit diagnostics” note so a boundary/non-converged fit – which makes the
VPC/PCV unreliable – is no longer silent.
compare_maihda_groups() now raises a single aggregated
warning naming any group whose lme4 fit was singular or failed to
converge (per-group fits on small strata are where this is most likely),
so an unreliable per-group VPC – often pinned at 0 by a boundary fit –
is no longer silent.
- The
"risk_vs_effect" plot no longer frames the outcome
axis as “risk”. A higher predicted value is not universally “bad” (it
depends on the outcome), so the axis and title now read as a neutral
“mean predicted value/probability”, with a note that the direction
depends on the outcome.
- Clarified the documentation of
plot_prediction_deviation_panels() to match the
implementation: the labelled points use a per-type metric – deviation
from the mean prediction for Gaussian/Poisson (and the ordinal
expected-score mode), the absolute deviance residual for binomial, and
surprise for the ordinal surprise mode – and are not statistical
outliers or model-misfit “deviants”.
- Clarified that the per-group VPC/ICC in
compare_maihda_groups() is the between-stratum
share of variance, which can differ across groups because of
the residual variance as well as the between-stratum variance. The
documentation now points to the var_between column for
comparing the absolute amount of intersectional variation, and notes
that overlap of separate per-group intervals is not a valid test of
whether two groups’ VPCs differ.
plot() on a compare_maihda_groups() result
gains type = "between_variance" (and plot() on
a maihda(group = ) analysis gains
type = "group_between_variance"), which bars the absolute
between-stratum variance by group – the magnitude of
intersectional variation that the VPC share cannot convey. The
VPC plot now also carries an interpretive caption (it is descriptive,
and overlapping intervals are not a difference test), and all group
plots now name any groups omitted because their VPC was not estimable
instead of dropping them silently.
- Clarified the PCV documentation (
calculate_pvc(),
stepwise_pcv(), and the print method): the PCV is a
model-dependent change in between-stratum variance and equals variance
“explained” only when the second model nests the first; the stepwise PCV
is order-dependent and not a variable’s unique contribution. The
vignette and Shiny app no longer describe PCV as variance causally
“explained” by main effects or treat a negative PCV as evidence of
hidden structural inequality.
- Corrected the
summary() VPC/ICC documentation: the
denominator is the total unexplained variance, which includes the
variance of any additional random effects (not just between-stratum +
residual), and a note on the weighted-Gaussian level-1 variance was
added.
- Documented which families the MAIHDA variance summaries support
(
gaussian("identity"), binomial/Bernoulli with
logit/probit, poisson("log")); other families such as
Gamma(link = "log") will fit but
summary()/VPC/PCV will stop with a clear “not implemented”
error rather than returning an invalid number.
- README clarifications: a note that the CRAN release can lag this
repository (so the newest features may require the GitHub development
version), and the interactive-dashboard dependency list now includes
future, promises, and haven (for
SPSS/Stata uploads).
- Completed the PCV wording cleanup in the remaining vignette and
Shiny-app text (the prior pass missed several spots), and softened
over-strong app labels: the quadrant plot is “Mean Prediction
vs. Stratum Effect” (not “Risk vs. Intersectional Effect”), the
cumulative-PCV chart is “Change in Between-Stratum Variance” (not
“Variance Explained”), and “deviant strata” is now “most extreme
strata”.
- Removed the stale checked-in rendered vignette HTML
(
vignettes/*.html); these are build artifacts generated
from the .Rmd sources and had drifted out of sync with the
corrected text. They are now git-ignored and added to
.Rbuildignore, so a locally rendered HTML is never shipped
in the tarball and R CMD build regenerates inst/doc from
the .Rmd.
- Fixed an invalid documented URL flagged by
R CMD check --as-cran: the maihda_country_data
@source linked to
https://www.oecd.org/pisa/data/, which returns HTTP 403. It
now links to the CRAN page of the learningtower package
(the reproducible access route used to build the dataset), keeping the
OECD PISA 2018 attribution in the text.
- The
data-raw/maihda_health_data.R regeneration script
no longer calls install.packages("NHANES") as a side
effect; like the country-data script it now stops with a clear message
asking the developer to install the dependency.
make_strata() (and the prediction-time stratum lookup)
now matches rows to strata with a single vectorised, collision-safe key
match instead of an O(rows x strata x variables) row-by-row scan, so it
scales to large survey datasets. Behaviour is unchanged, including the
correct handling of values that contain the stratum-label
separator.
Bug Fixes
calculate_pvc() and compare_maihda() now
apply their analytic-sample identity checks to design-weighted
(wemix) fits. A WeMixResults object
exposes no nobs()/model.frame(), so the
row-count, row-identity and outcome-fingerprint guards previously fell
through to a silent pass: two wemix fits sharing a formula,
n and strata but fit to completely different
outcome values compared as if they were the same analytic
sample (calculate_pvc() returned a PCV,
compare_maihda() reported VPCs, neither warned). The guards
now fall back to the wrapper’s stored analytic $data (the
rows the engine actually fit), so a mismatched outcome is caught –
calculate_pvc() errors and compare_maihda()
warns – exactly as for the lme4 path. Genuinely matched wemix fits are
unaffected.
- The longitudinal PCV baseline is now the
between-stratum variance at the observed baseline time
(
ref_time = min(time)), not the raw time-0 random-intercept
variance. maihda_longitudinal_pcv() (and the
maihda_long_pcv print method) evaluated
Sigma[1, 1] directly, which is the variance at
time = 0 – correct only when time is centred so the
baseline is 0. For a model whose time does not start at zero (e.g. waves
10:12) this reported the PCV of an extrapolated time-0 variance, which
is meaningless and can even be negative; the baseline PCV is now
a(t)'Sigma a(t) evaluated at ref_time,
matching how summary() reports the baseline VPC. The
slope-variance PCV is unchanged (it is invariant to where time is
zeroed).
- Cross-sectional, single-value-per-stratum summaries are now
refused for longitudinal (growth-curve) models rather
than silently returning a misleading scalar. A growth model’s stratum
estimand is a trajectory (random intercept + slope), so the
scalar BLUP that
maihda_table()’s ranked-strata table and
plot(type = "predicted" / "obs_vs_shrunken" / "risk_vs_effect" / "effect_decomp" / "prediction_deviation" / "ternary")
build (which adds only the random intercept and drops the slope)
misrepresents it. These paths now error – or, for
maihda_table(), omit the ranking with an explanatory note –
and point to the trajectory tools
(predict(type = "strata"),
plot(type = "trajectories"),
plot(type = "vpc_trajectory")). summary()’s
stratum-estimates print is relabelled “baseline (intercept) deviations”
for a longitudinal fit, with the same pointer.
maihda_ic() now reports the analytic sample size
n for a design-weighted (wemix)
fit instead of NA. WeMixResults has
no nobs() method, so the IC table fell back to
NA; it now uses nrow(model$data) (the rows the
fit used), matching the nobs that glance()
already reports for the same model.
- Corrected the PCV for Gaussian
lmer models (the
REML pitfall). The proportional change in between-stratum
variance compares the stratum variance of a null and an
adjusted model that differ in their fixed effects (the adjusted
model adds the dimensions’ additive main effects). lmer
fits Gaussian models by REML, and a REML variance estimate is
not comparable across models with different fixed
effects, so the PCV was biased – it overstated the adjusted model’s
residual between-stratum variance and therefore understated the
additive share (the PCV). calculate_pvc() – and
hence maihda()’s PCV, stepwise_pcv(), and the
per-group PCV in compare_maihda_groups() – now refits any
REML lmer model with maximum likelihood
(lme4::refitML()) before reading the variances (and before
the parametric bootstrap, so the interval matches), exactly as
maihda_ic() already does for AIC/BIC and as
anova() does on lme4 models. In simulations
with a known 60% additive share (40% interaction), the reported PCV
moved from ~50% (biased, REML) to ~58–60% (ML), recovering the truth.
glmer (GLMM) fits, the brms/wemix/ordinal engines,
single-model VPC/ICC summaries (which correctly stay REML, being
comparison-free), and singular/boundary fits are unaffected. In
compare_maihda_groups(), var_between_adjusted
is now reported as var_between * (1 - pcv) so it shares the
REML scale of the VPC’s var_between and the table stays
internally coherent.
- VPC/ICC is no longer reported for a Gaussian model fit with a
non-identity link (e.g.
gaussian(link = "log")). The
residual variance is on the response scale while the between-stratum
variance is on the link scale, so summary(),
compare_maihda(), and calculate_pvc() now
raise a clear error instead of silently returning an invalid variance
partition.
- Binary-outcome auto-detection now keys off the analytic model frame
– after applying covariate transformations (e.g.
log(x)),
dropping rows with missing values, dropping rows with a missing prior
weight, and applying any subset= (including
negative/character row indices) – instead of the raw outcome column. An
outcome that is only 0/1 once excluded rows are removed is now correctly
fit with family = "binomial".
- Data-masked engine arguments forwarded through
...
(e.g. weights=, subset=, offset=)
now work at any nesting depth, including through the
maihda() and compare_maihda_groups() wrappers.
Arguments are captured as quosures and resolved with the data mask
(rlang::eval_tidy), fixing the previous “object not found”
/ “..1 used in an incorrect context” failures (a direct
fit_maihda() call worked, but the wrappers did not).
- A binary outcome is now recoded to 0/1 in a way that no longer
breaks a
subset= expression referencing the original
response labels (e.g. subset = y %in% c("no", "yes")): the
subset is evaluated against the original response before recoding.
compare_maihda_groups() now slices forwarded
weights=/subset=/offset= to each
group’s rows before fitting, not just for the row-count guard. An
external (non-column) weights vector previously failed
every group with a length mismatch, and an external subset
vector could be recycled onto the wrong rows of later groups; both now
align correctly per group.
- The Gaussian VPC/ICC now accounts for prior
weights.
With weights the per-observation residual variance is
sigma^2 / w_i, so the level-1 variance reported is the mean
conditional residual variance mean(sigma^2 / w_i) rather
than a single sigma^2; unweighted models are
unchanged.
plot_effect_decomposition() now uses the stratum random
effect (BLUP) itself as the intersectional component instead of (full
prediction - fixed prediction). With additional random effects such as
(1 | site) the latter wrongly absorbed those effects into
the stratum component.
- The stratum-level “surprise” in
plot_prediction_deviation_panels() (ordinal, surprise mode)
is now the average per-observation surprise, mean(-log(p))
(log loss), instead of -log(mean(p)), which could change
stratum rankings.
- The Shiny app (
run_maihda_app()) now also auto-detects
a numeric 0/1 outcome and fits it as binomial under the default family,
matching the core API, instead of silently fitting a linear probability
model.
compare_maihda_groups() now warns when groups end up
with different populated strata even under
shared_strata = TRUE, since their VPCs are then estimated
over different stratum support and are not strictly directly
comparable.
plot_prediction_deviation_panels() now plots
Poisson/count models on the response (expected-count) scale with count
labels, rather than routing them through the Gaussian link-scale
branch.
compare_maihda_groups(min_group_n = ...) now guards the
analytic sample size (the rows the model actually fits) rather than the
raw group row count, so a group with enough raw rows but a tiny usable
sample is skipped instead of being fit on a handful of
observations.
predict_maihda(type = "strata") now respects
newdata: it returns only the strata present in
newdata (and errors on a stratum the model never saw, as
type = "individual" does) instead of always returning every
training stratum. With newdata = NULL it still returns all
strata.
compare_maihda_groups() now counts populated strata for
the pre-fit guard on the analytic model frame, not the raw subgroup. A
group with two raw strata but only one stratum left after missing-row
removal is now cleanly skipped as VPC-undefined instead of failing
during fitting with “grouping factors must have > 1 sampled
level”.
n_boot for bootstrap intervals must now be at least 10
(the minimum number of successful refits an interval requires); an
unusably small n_boot fails immediately with a clear
message instead of only erroring after the bootstrap runs.
calculate_pvc() and compare_maihda() now
detect differing prior weights: previously two models fit
to the same rows/outcome/strata but with different weights compared as
if equivalent (PCV returned silently, no warning), even though weights
change the variance estimates. calculate_pvc() now errors
and compare_maihda() warns; an unweighted fit and an
explicit unit-weight fit are still treated as equal.
- When a two-level non-0/1 outcome is recoded to 0/1, the chosen
mapping (which level becomes the modeled event = 1) is now reported via
a
message() and stored on the model as
$response_recoding. Previously the mapping followed
alphabetical (character) or declared (factor) level order silently, with
no signal at all when family = "binomial" was passed
explicitly, so the modeled event could be inverted unnoticed. The 0/1
assignment rule is unchanged (it matches base glm).
- Stratum-level plot aggregations
(
plot(type = "predicted"), "risk_vs_effect",
"obs_vs_shrunken", "effect_decomp") now honour
prior weights, collapsing per-stratum predictions with a
prior-weight-weighted mean (and weighting the reference lines by the
summed weights). This makes the plots consistent with the weighted
Gaussian VPC for survey/weighted fits; unweighted models are unaffected,
and aggregated-binomial (cbind) fits, whose
weights(type = "prior") are the trials, are left unweighted
to avoid double-counting.
- The Shiny app’s “Compute Bootstrap CIs” control now actually
produces the VPC/ICC bootstrap intervals it (and the vignette)
advertise. The bootstrap was previously applied only to the PCV, so the
headline VPC/ICC was shown as a point estimate with no interval. The
interval is now computed in the background worker (keeping the UI
responsive) and displayed alongside the VPC/ICC in the Model Summary and
Interactive Explorer tabs; the PCV interval in the PCV Results tab is
unchanged.
- The Shiny app (
run_maihda_app()) no longer aborts the
whole analysis when the baseline (null) model has zero or negative
between-stratum variance (a singular / no-between-variation fit), which
makes calculate_pvc() error by design. The fitted model,
VPC/ICC, summaries, and plots are now shown as usual and the PCV is
reported as unavailable (with the underlying reason) instead of failing
the entire workflow.
compare_maihda_groups() now accepts a bare family
function (e.g. family = stats::gaussian) – one of
the documented forms (“as in fit_maihda”), alongside a
family name ("gaussian") and a family object
(gaussian()). The per-group fits already handled it; only
the family metadata recorded on the result did not, so the call fit
every group and then failed with “object of type ‘closure’ is not
subsettable”. The family function is now resolved to a family object up
front, exactly as fit_maihda() does.
Diagnostics
brms fits now record MCMC convergence diagnostics
(maximum Rhat and the number of divergent transitions) alongside the
engine, surfaced in the “Fit diagnostics” block of
print()/summary() so a non-converged or
divergent Bayesian fit is no longer silent.
- Bootstrap VPC/PCV intervals now report Monte Carlo error:
summary() and calculate_pvc() print the number
of successful bootstrap draws and the Monte Carlo standard error so the
precision of an interval can be judged.
MAIHDA 0.1.10
New Features
- Added
compare_maihda_groups() to compare intersectional
inequality (VPC/ICC and between-/within-stratum variance) across levels
of a higher-level grouping variable such as country, region, or survey
wave. It fits a stratified MAIHDA model per group, by default using
shared/global strata so VPCs are directly comparable, with optional
per-group bootstrap confidence intervals.
- Added
plot_group_comparison() to visualize the result
either as a VPC-by-group forest plot or as stacked variance-composition
bars.
Bug Fixes
- Fixed parametric-bootstrap confidence intervals for VPC
(
summary()) and PVC (calculate_pvc()): failed
refit() iterations were silently recorded as 0
instead of being dropped, biasing intervals toward zero and suppressing
the high-failure-rate warning. Failed iterations are now excluded
correctly.
- Corrected the Poisson VPC residual-variance approximation to
log(1 + 1/mu) (Stryhn et al. 2006); the previous
1/mu linearization biased the VPC downward for low-count
outcomes.
plot_prediction_deviation_panels() no longer draws
zero-width “95% CI” bars when the underlying model does not supply
standard errors (e.g. lme4::merMod); intervals are omitted
instead of collapsed.
MAIHDA 0.1.9
Bug Fixes
- Clarified the Shiny dashboard PVC/HUD interpretation so negative PVC
values are shown as variance unmasking rather than as unexplained
interaction variance.
- Fixed the coverage workflow failure-artifact upload
configuration.
MAIHDA 0.1.8
General Updates & New
Features
- Added
plot_prediction_deviation_panels() function for
visualizing predicted values and identifying deviant cases.
- Added
plot_risk_vs_effect() function to create a
quadrant scatterplot comparing overall marginal predicted risk against
pure intersectional effects.
- Added
plot_effect_decomposition() function to visually
decompose the total deviation from the overall mean into additive and
intersectional components.
- Replaced the redundant “caterpillar” plot with the “predicted” plot
in
plot() and the interactive dashboard.
- Added automatic tertile binning (via an
autobin
parameter) for numeric grouping variables with more than 10 unique
values in make_strata().
- Updated the interactive Shiny Dashboard
(
run_maihda_app()) to include the new visualizations and a
toggle for auto-binning continuous strata variables.
- Added detection for binomial data.
fit_maihda() will
now automatically detect binomial outcomes and switch to the appropriate
family.
Bug Fixes
- VPC/ICC Calculation Fix: Corrected the residual
variance estimation for binomial and ordinal models. The package now
accurately applies the theoretical level-1 variance (\(\pi^2 / 3\) for
"logit" links
and \(1\) for "probit"
links) internally when summarizing models or bootstrapping the variance
partition coefficient, avoiding deflated VPC/ICC metrics.
MAIHDA 0.1.7
General Updates & New
Features
- Added
stepwise_pcv() function to sequentially estimate
proportional change in variance (PCV) by adding predictors
one-by-one.
- Added a fully-featured interactive Shiny Dashboard (via
run_maihda_app()) for visual data exploration, model
fitting, and performance visualization.
- Improved bootstrap methods for more efficient confidence interval
estimation.
- Added missing documentation block for the
maihda_sim_data dataset to resolve R CMD check
warnings.
- Updated test suite setup:
tests/testthat.R was modified
to correctly use test_check("MAIHDA") instead of
shinytest2.
- Added
importFrom(stats, as.formula) for the
stepwise_pcv function to prevent undefined warnings.
- Updated
introduction.Rmd vignette: added standard CRAN
installation instructions, and improved text clarity.
MAIHDA 0.1.0
Initial Release
- Initial CRAN submission
- Added
make_strata() function for creating
intersectional strata
- Added
fit_maihda() function for fitting multilevel
models with lme4 (default) or brms engines
- Added
summary() function for variance partition and
stratum estimates
- Added
predict_maihda() function for individual and
stratum-level predictions
- Added
plot() function with three plot types:
- Caterpillar plots of stratum random effects
- Variance partition coefficient visualization
- Observed vs. shrunken estimates comparison
- Added
compare_maihda() function for comparing models
with bootstrap confidence intervals
- Added comprehensive documentation and vignettes
- Added unit tests for core functionality
Bug Fixes and Improvements
- Enhanced
make_strata() to properly handle missing
values (NA) in input variables:
- Observations with missing values in any stratum variable are now
assigned NA stratum
- Missing values are no longer included as valid stratum
categories
- Added comprehensive tests for missing value handling