---
title: "Model and sampler details"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Model and sampler details}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This section connects the notation in Lu, Li, and Love (2021) to the `bpgmm`
interface. It describes the fitted model, the covariance labels, and the
RJMCMC switches used by the sampler. Full fitted examples are given in the
getting-started and worked-example vignettes.

## Observation model

Let \(X = (x_1, \ldots, x_n)\) be a matrix of continuous observations with
\(p\) variables and \(n\) observations. `bpgmm` expects this same orientation:
variables in rows and observations in columns.

For cluster \(k\), the mixture of factor analyzers model writes

\[
x_i = \mu_k + \Lambda_k y_{ki} + \epsilon_{ki},
\]

where

\[
y_{ki} \sim N(0, I_{q_k}), \qquad
\epsilon_{ki} \sim N(0, \Psi_k).
\]

After integrating out the latent factor \(y_{ki}\), the cluster-specific
covariance is

\[
\Sigma_k = \Lambda_k \Lambda_k^\top + \Psi_k.
\]

The plot below shows the geometric role of this covariance. The loading matrix
\(\Lambda_k\) controls the dominant low-dimensional direction, while
\(\Psi_k\) adds cluster-specific noise around that subspace.

```{r, fig.width = 5.5, fig.height = 4.5, fig.alt = "Two covariance ellipses illustrating cluster-specific covariance matrices."}
ellipse_points <- function(center, sigma, level = 0.90, n = 120) {
  eig <- eigen(sigma, symmetric = TRUE)
  angles <- seq(0, 2 * pi, length.out = n)
  circle <- rbind(cos(angles), sin(angles))
  radius <- sqrt(qchisq(level, df = 2))
  points <- t(center + radius * eig$vectors %*% diag(sqrt(eig$values), 2) %*% circle)
  colnames(points) <- c("x", "y")
  points
}

lambda_1 <- matrix(c(1.1, 0.4), ncol = 1)
lambda_2 <- matrix(c(0.2, 1.0), ncol = 1)
psi_1 <- diag(c(0.20, 0.08))
psi_2 <- diag(c(0.10, 0.25))
sigma_1 <- lambda_1 %*% t(lambda_1) + psi_1
sigma_2 <- lambda_2 %*% t(lambda_2) + psi_2

ell_1 <- ellipse_points(c(-1.2, -0.5), sigma_1)
ell_2 <- ellipse_points(c(1.1, 0.6), sigma_2)

plot(
  ell_1,
  type = "l",
  lwd = 2,
  col = "#0072B2",
  xlim = c(-3.5, 3.5),
  ylim = c(-2.5, 3),
  xlab = "Variable 1",
  ylab = "Variable 2",
  main = "Mixture-of-factor-analyzers covariance"
)
lines(ell_2, lwd = 2, col = "#D55E00")
points(rbind(c(-1.2, -0.5), c(1.1, 0.6)), pch = 19, col = c("#0072B2", "#D55E00"))
arrows(-1.2, -0.5, -1.2 + lambda_1[1], -0.5 + lambda_1[2], col = "#0072B2", lwd = 2, length = 0.08)
arrows(1.1, 0.6, 1.1 + lambda_2[1], 0.6 + lambda_2[2], col = "#D55E00", lwd = 2, length = 0.08)
legend(
  "topleft",
  legend = c("Cluster 1 covariance", "Cluster 2 covariance", "Loading direction"),
  col = c("#0072B2", "#D55E00", "gray30"),
  lwd = c(2, 2, 2),
  bty = "n"
)
```

The mixture density is

\[
f(x_i) = \sum_{k = 1}^m \tau_k
  N(x_i \mid \mu_k, \Lambda_k \Lambda_k^\top + \Psi_k),
\]

where \(\tau_k\) is the mixture weight and \(m\) is the number of clusters.
Equivalently, with allocation indicator \(z_i \in \{1,\ldots,m\}\),

\[
x_i \mid z_i = k, \Theta
  \sim N_p(\mu_k, \Sigma_k),
\qquad
\Pr(z_i = k \mid \tau) = \tau_k .
\]

In `pgmm_rjmcmc()`:

- `X` is the \(p \times n\) data matrix.
- `m_init` is the starting value of \(m\).
- `m_range` is the allowed range of \(m\).
- `q_new` is the factor dimension assigned to a newly proposed cluster.
- `constraint` is the starting covariance model.

## PGMM covariance constraints

The paper uses three letters to describe the covariance structure. Each letter
is either `C` for constrained or `U` for unconstrained.

| Letter | Meaning when `C` | Meaning when `U` |
|---|---|---|
| 1 | all clusters share one loading matrix \(\Lambda\) | each cluster has \(\Lambda_k\) |
| 2 | all clusters share one noise covariance \(\Psi\) | each cluster has \(\Psi_k\) |
| 3 | noise covariance is isotropic, \(\Psi_k = \psi_k I_p\) | noise covariance is diagonal |

The eight PGMM models are:

```{r}
library(bpgmm)

models <- c("CCC", "CCU", "CUC", "CUU", "UCC", "UCU", "UUC", "UUU")
data.frame(
  model = models,
  constraint = vapply(
    models,
    function(x) paste(model_to_constraint(x), collapse = ","),
    character(1)
  )
)
```

The package uses the numeric encoding internally: `1` means constrained and
`0` means unconstrained.

```{r}
model_to_constraint("UUU")
constraint_to_model(c(1, 1, 0))
```

## Priors and posterior updates

The supplement gives the natural conjugate priors used by the MCMC updates.
The main priors are:

\[
\tau \sim \mathrm{Dirichlet}(\gamma, \ldots, \gamma),
\]

\[
y_{ki} \sim N(0, I_{q_k}),
\]

\[
\mu_k \sim N(\bar{x}, \alpha_1^{-1} \Psi_k),
\]

\[
\lambda_{kj} \sim N(0, \alpha_2^{-1} \Psi_k),
\]

and

\[
\psi_{kj} \sim \mathrm{IG}(\delta, \beta).
\]

The complete parameter state can be read as

\[
\Theta =
\{\tau_k, \mu_k, \Lambda_k, \Psi_k, y_{ki}, z_i:
  k = 1,\ldots,m;\ i = 1,\ldots,n\},
\]

with hyperparameters

\[
H = (\alpha_1, \alpha_2, \beta, \delta, \gamma).
\]

The package samples the hyperparameters \(\alpha_1\), \(\alpha_2\), and
\(\beta\), with gamma hyperpriors controlled by:

- `d_vec`: shape parameters.
- `s_vec`: rate parameters.
- `delta`: inverse-gamma shape for the noise covariance.
- `ggamma`: Dirichlet concentration for mixture weights.

The allocation update uses the posterior probability

\[
p_{ki} =
\frac{
  \tau_k N(x_i \mid \mu_k, \Psi_k + \Lambda_k\Lambda_k^\top)
}{
  \sum_{\ell = 1}^{m}
  \tau_\ell N(x_i \mid \mu_\ell, \Psi_\ell + \Lambda_\ell\Lambda_\ell^\top)
}.
\]

Then

\[
(z_{i1}, \ldots, z_{im}) \mid X, \Theta, H
  \sim \mathrm{Multinomial}(1, p_{1i}, \ldots, p_{mi}).
\]

The allocation update uses this probability, and the joint posterior includes
the product of allocated mixture weights:

\[
p(Z \mid \tau) = \prod_{i = 1}^{n} \tau_{z_i}.
\]

## RJMCMC moves

RJMCMC is used because the number of parameters changes when the number of
clusters or the covariance constraint changes.

The cluster-number moves are:

| Move | Purpose |
|---|---|
| stay | update parameters without changing \(m\) |
| birth | add a new empty cluster |
| death | remove an empty cluster |
| split | split one occupied cluster into two clusters |
| combine | merge two occupied clusters |

The covariance-structure moves toggle one constraint at a time:

- toggle whether \(\Lambda_k\) is shared across clusters;
- toggle whether \(\Psi_k\) is shared across clusters;
- toggle whether \(\Psi_k\) is isotropic or diagonal.

In `pgmm_rjmcmc()`:

- `m_step = 1` enables birth/death moves for \(m\).
- `split_combine = 1` adds split/combine moves when `m_step = 1`.
- `v_step = 1` enables covariance-constraint moves.

For a proposed move from model \(M\) to \(M'\), the RJMCMC acceptance
probability has the standard form

\[
a =
\min\left\{
1,
\frac{
  p(X, Z, \Theta', H' \mid M')\,p(M')\,q(M \mid M')
}{
  p(X, Z, \Theta, H \mid M)\,p(M)\,q(M' \mid M)
}
\left|J\right|
\right\},
\]

where \(q(\cdot)\) is the proposal density and \(|J|\) is the Jacobian term
for dimension-changing moves such as split/combine. Birth/death and
split/combine change \(m\); covariance moves change the constraint label
\(v \in \{\mathrm{CCC}, \ldots, \mathrm{UUU}\}\).

## Interface mapping

The formula terms map to package output fields as follows:

| Paper notation | Package argument or output |
|---|---|
| \(X = (x_1,\ldots,x_n)\) | `X`, a \(p \times n\) numeric matrix |
| \(m\) | `m_init`, `m_range`, `active_cluster_samples` |
| \(q_k\) | `q_new` for newly proposed clusters |
| \(z_i\) | `allocation_samples`, `summary$allocation` |
| \(\tau_k\) | `tau_samples` |
| \(\mu_k\) | `mean_samples` |
| \(\Lambda_k\) | `lambda_samples` |
| \(\Psi_k\) | `psi_samples` |
| covariance model \(v\) | `constraint`, `constraint_samples` |

An applied model-selection call usually has the following structure:

```{r, eval = FALSE}
fit <- pgmm_rjmcmc(
  X = your_data_matrix,
  m_init = 5,
  m_range = c(1, 10),
  q_new = 4,
  burn = 5000,
  niter = 15000,
  constraint = model_to_constraint("UUU"),
  m_step = 1,
  v_step = 1,
  split_combine = 1,
  verbose = FALSE
)
```

## Citation

If you use these methods, cite:

Lu, X., Li, Y., & Love, T. (2021). On Bayesian Analysis of Parsimonious
Gaussian Mixture Models. *Journal of Classification*, 38, 576-593.
https://doi.org/10.1007/s00357-021-09391-8

In R:

```{r}
citation("bpgmm")
```