Initial release: a comprehensive framework for measuring disclosure
risk and data utility of anonymized and synthetic data. All measures
share a consistent S3 API (print(), summary(),
plot()) and feed a multivariate Risk-Utility (R-U) map.
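As an illustration of what a shared S3 interface looks like in practice, here is a minimal base-R sketch. The class name `toy_measure` and the constructor `new_measure()` are hypothetical and are not part of the package; the sketch only shows how print(), summary(), and plot() can dispatch on one result class.

```r
# Hypothetical constructor (not a package function): wraps one score
# in a classed list so the S3 generics below can dispatch on it.
new_measure <- function(name, value) {
  structure(list(name = name, value = value), class = "toy_measure")
}

# print() method: one-line display of the measure.
print.toy_measure <- function(x, ...) {
  cat(sprintf("<%s> %.3f\n", x$name, x$value))
  invisible(x)
}

# summary() method: returns the score as a one-row data frame.
summary.toy_measure <- function(object, ...) {
  data.frame(measure = object$name, value = object$value)
}

# plot() method: a single bar on a 0-1 scale (assumes a normalized score).
plot.toy_measure <- function(x, ...) {
  graphics::barplot(x$value, names.arg = x$name, ylim = c(0, 1), ...)
  invisible(x)
}

m <- new_measure("toy risk", 0.25)
print(m)
s <- summary(m)
```

Because every result object answers to the same three generics, downstream code can treat all measures uniformly without knowing which one produced a given object.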
dcap()
(reports both the raw mean CAP and the differential CAP = mean CAP minus
baseline), tcap(), weap(),
disco().

rapid() (Risk of Attribute
Prediction-Induced Disclosure; random-forest default, also
lm/cart/gbm/logit)
with confint(), permutation test, threshold selection,
synthesizer cross-validation, and six plot types.

dcr(),
nndr(), ims(), repu(), including
the DCR-Delusion caveat and null-distribution diagnostics.

domias(),
nnaa(), mia_classifier().

kanonymity(), ldiversity()
(distinct/entropy/recursive), tcloseness() (EMD),
suda(), individual_risk(),
population_uniqueness() (Pitman/Zayatz/SNB),
epsilon_identifiability(), delta_presence(),
hitting_rate(), singling_out(),
linkability(), attacker_risk()
(prosecutor/journalist/marketer), drisk().

recordLinkage() with
deterministic, probabilistic (Fellegi-Sunter), PRAM, predictive,
random-forest, RBRL, robust-Mahalanobis, and embedding (autoencoder)
methods; independent, bijective (Hungarian / GDBRL), and
optimal-transport (Sinkhorn) matching; blocking and per-record
accessors. All eight methods share a single re-identification-risk
definition — the probability of identifying the true match
within the attacker’s candidate set. For the random-forest and embedding
methods, the nearest-neighbour similarity (their former
risk value) is now retained in an
nn_similarity diagnostic column. na_anon
(ignore/match/mismatch) is
honored consistently across all methods (PRAM no longer reports an
artificial zero risk for records with a missing key). New options:
compute_baseline = TRUE reports the no-perturbation
reference risk (with risk_reduction), and
expected_risk = TRUE reports a perturbation-aware expected
PRAM risk over the transition distribution. User-supplied
m_probs/u_probs are validated and clamped to
the open interval (0,1).

disclosure_report()
produces a comprehensive multi-metric report.

propscore(),
pMSE(), specks().

gower(), mqs(),
ci_overlap(), ci_proximity().

compare_wasserstein(),
compare_ks_test(), compare_chisq_gof(),
compare_pca(), compare_embedding(),
compare_correlation_matrices(), hellinger(),
energy_distance(), mmd(),
copula_fidelity(), tail_fidelity(),
contingency_fidelity().

tstr() (train on synthetic,
test on real), compare_feature_importance(),
compare_model_performance(),
regression_fidelity(),
subgroup_utility().

KLDiv(), JSDiv(),
CrossEntropy(), entropy and mutual-information helpers,
privacy_score().

rumap(): normalized multivariate R-U evaluation with
Pareto-frontier identification, internal-consistency metrics, and seven
visualizations (scatter, heatmap, dot plot, parallel coordinates,
radial, PCA biplot, blockwise PCA).

synth_pair() container plus
from_synthpop() and from_simPop() converters;
most measures dispatch on synth_pair objects as well as
plain data frames.
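
The Pareto-frontier identification mentioned for rumap() can be illustrated with a small base-R sketch. This is not the package's implementation: the helper `pareto_front()` and the toy data are hypothetical, and the sketch assumes a single risk score (lower is better) and a single utility score (higher is better), both already normalized to [0, 1]. A candidate is on the frontier when no other candidate is at least as good on both axes and strictly better on one.

```r
# Hypothetical helper (not a package function): flags non-dominated points.
# risk: lower is better; utility: higher is better.
pareto_front <- function(risk, utility) {
  stopifnot(length(risk) == length(utility))
  vapply(seq_along(risk), function(i) {
    # Points that are no worse than i on both axes and strictly better on one.
    dominates_i <- risk <= risk[i] & utility >= utility[i] &
      (risk < risk[i] | utility > utility[i])
    !any(dominates_i)
  }, logical(1))
}

# Toy R-U scores for four anonymization candidates (illustrative values only).
ru <- data.frame(
  method  = c("A", "B", "C", "D"),
  risk    = c(0.10, 0.30, 0.20, 0.40),
  utility = c(0.50, 0.90, 0.40, 0.80)
)
ru$on_front <- pareto_front(ru$risk, ru$utility)
print(ru)
```

In this toy data, A and B are on the frontier, C is dominated by A (lower risk, higher utility), and D is dominated by B. The pairwise comparison is quadratic in the number of candidates, which is acceptable for the handful of methods typically compared on an R-U map.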