---
title: "Worked examples"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Worked examples}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This example gives a complete, runnable `bpgmm` analysis. It is small enough to
build quickly on CRAN and pkgdown, but it uses the same functions as a larger
analysis. The focus is the structure of one fitted object; model-selection
interpretation is handled separately.

## Simulate a small clustering problem

`pgmm_rjmcmc()` expects a numeric matrix with variables in rows and observations
in columns. The code below creates two well-separated clusters in two
dimensions.

```{r}
library(bpgmm)

set.seed(2026)

X <- cbind(
  matrix(rnorm(8, mean = -2, sd = 0.2), nrow = 2),
  matrix(rnorm(8, mean = 2, sd = 0.2), nrow = 2)
)
known_labels <- rep(1:2, each = 4)

dim(X)
known_labels
```

```{r, fig.width = 5.5, fig.height = 4, fig.alt = "Scatter plot of simulated observations colored by known cluster."}
cluster_cols <- c("#0072B2", "#D55E00", "#009E73")
plot(
  X[1, ], X[2, ],
  col = cluster_cols[known_labels],
  pch = 19,
  xlab = "Variable 1",
  ylab = "Variable 2",
  main = "Simulated data",
  asp = 1
)
legend(
  "topleft",
  legend = paste("Known", sort(unique(known_labels))),
  col = cluster_cols[sort(unique(known_labels))],
  pch = 19,
  bty = "n"
)
```

## Fit a fixed-model chain

This fit keeps \(m\) and the covariance model fixed, so the output structure is
explicit. The fitted likelihood contribution is

\[
p(x_i \mid z_i = k, \Theta) =
N_p(x_i \mid \mu_k, \Lambda_k\Lambda_k^\top + \Psi_k).
\]

For model selection across \(m\) and covariance labels, use the separate
model-selection vignette.

```{r}
fit_log <- capture.output({
  fit <- pgmm_rjmcmc(
    X = X,
    m_init = 2,
    m_range = c(1, 3),
    q_new = 1,
    burn = 1,
    niter = 3,
    constraint = "UUU",
    m_step = 0,
    v_step = 0,
    verbose = FALSE
  )
})
tail(fit_log, 1)
```

The returned object stores posterior samples for allocations, covariance
constraints, component parameters, and hyperparameters.

```{r}
names(fit)
length(fit$allocation_samples)
```

## Summarize posterior samples

`summarize_pgmm_rjmcmc()` returns the modal allocation, posterior counts for the
number of clusters, posterior counts for covariance models, and optionally an
adjusted Rand index.

```{r}
fit_summary <- summarize_pgmm_rjmcmc(fit, true_cluster = known_labels)

fit_summary$allocation
fit_summary$n_clusters
fit_summary$n_constraints
fit_summary$ari
```

The modal allocation is computed coordinate-wise from the saved posterior
allocations:

\[
\hat{z}_i = \operatorname{mode}\{z_i^{(1)},\ldots,z_i^{(S)}\},
\]

where \(S\) is the number of saved posterior samples.

```{r, fig.width = 7, fig.height = 4, fig.alt = "Two-panel plot comparing known labels and posterior allocation."}
old_par <- par(mfrow = c(1, 2), mar = c(4, 4, 3, 1))
plot(
  X[1, ], X[2, ],
  col = cluster_cols[known_labels],
  pch = 19,
  xlab = "Variable 1",
  ylab = "Variable 2",
  main = "Known labels",
  asp = 1
)
plot(
  X[1, ], X[2, ],
  col = cluster_cols[fit_summary$allocation],
  pch = 19,
  xlab = "Variable 1",
  ylab = "Variable 2",
  main = "Posterior allocation",
  asp = 1
)
par(old_par)
```

## Inspect component parameters

The sampler stores draws of \(\tau_k\), \(\mu_k\), \(\Lambda_k\), and
\(\Psi_k\). For this fixed-model example, the final saved draw is enough to
show the object shape.

```{r}
last <- length(fit$tau_samples)

fit$tau_samples[[last]]
fit$mean_samples[[last]]
lapply(fit$lambda_samples[[last]], dim)
lapply(fit$psi_samples[[last]], dim)
```

## Scale the example to real data

For an applied analysis, keep the same sequence of steps but use larger sampler
settings. The exact values depend on data size and convergence diagnostics.

```{r, eval = FALSE}
fit <- pgmm_rjmcmc(
  X = your_data_matrix,
  m_init = 2,
  m_range = c(1, 10),
  q_new = 4,
  burn = 5000,
  niter = 15000,
  constraint = model_to_constraint("UUU"),
  m_step = 1,
  v_step = 1,
  split_combine = 1,
  verbose = FALSE
)

summarize_pgmm_rjmcmc(fit, true_cluster = known_labels)
```

## What to report

For a concise analysis report, include:

- the data orientation and preprocessing choices;
- `m_range`, `q_new`, starting covariance model, and whether `m_step`,
  `v_step`, and `split_combine` were enabled;
- the number of burn-in and posterior iterations;
- the posterior modal allocation or a co-clustering matrix;
- ARI only when a reference partition is scientifically meaningful.

The diagnostics vignette gives examples of trace plots and co-clustering
summaries for multiple chains.
