---
title: "Mixed-Subjects 1PL Calibration"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Mixed-Subjects 1PL Calibration}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4
)
```

The 1PL (one-parameter logistic) model estimates a single shared discrimination
$a$ across all items together with per-item intercepts $d_j$:

$$P(x_j = 1 \mid \theta) = \text{logistic}(a\,\theta + d_j)$$

The parameter vector has length $J+1$ rather than $2J$. The package provides
exact analogues of the 2PL mixed-subjects functions for the 1PL case.
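
As a minimal base-R sketch of the formula above, the response probability and the parameter count can be written out directly. The helper `p_1pl()` and the toy values are illustrative assumptions, not part of the package.

```r
# Illustrative helper (not a package function): 1PL response probability
# P(x_j = 1 | theta) = logistic(a * theta + d_j), using stats::plogis().
p_1pl <- function(theta, a, d) {
  # rows = persons, columns = items
  plogis(outer(a * theta, d, `+`))
}

a_toy   <- 1.2                 # one shared discrimination
d_toy   <- c(-1, 0, 1)         # one intercept per item (J = 3)
par_toy <- c(a_toy, d_toy)     # length J + 1 = 4, versus 2J = 6 for a 2PL

p_toy <- p_1pl(theta = c(-1, 0, 1), a = a_toy, d = d_toy)
dim(p_toy)                     # 3 persons x 3 items
p_toy[2, 2]                    # theta = 0 and d = 0 give probability 0.5
```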

**When to prefer 1PL over 2PL:**

- Ability-focused tests where the items are designed to be equally discriminating.
- Tests built from a single item pool with homogeneous item characteristics.
- When the 2PL discrimination estimates are very noisy (small $n$).

> **Note on vcov.** `vcov_mixed_subjects_1pl()` currently uses the EM
> complete-data Hessian rather than Louis' marginal-information correction.
> The complete-data Hessian ignores the information lost by integrating out
> $\theta$, so the reported standard errors are slightly too small. A
> Louis-corrected 1PL bread is planned for a future release.

## Simulate a 1PL test

```{r simulate}
library(mixedsubjectsirt)
library(ggplot2)

set.seed(2026)

n_human     <- 400
n_generated <- 1200
n_items     <- 8

# True 1PL: shared discrimination a = 1.2, varying intercepts d
# (the corresponding difficulties are b = -d / a)
true_1pl <- data.frame(
  item = paste0("Item", seq_len(n_items)),
  a    = 1.2,
  d    = seq(-1.1, 1.1, length.out = n_items)
)
true_1pl$b <- -true_1pl$d / true_1pl$a

theta_human <- rnorm(n_human)
# A 1PL is a 2PL with equal discriminations, so simulate_2pl() is reused here
observed    <- simulate_2pl(theta_human, true_1pl)

# LLM: same 1PL structure, small intercept shift
llm_1pl   <- true_1pl
llm_1pl$d <- true_1pl$d + 0.25
llm_1pl$b <- -llm_1pl$d / llm_1pl$a

predicted <- simulate_2pl(theta_human, llm_1pl)
generated <- simulate_2pl(rnorm(n_generated), llm_1pl)
```

## Step 1: Fit the 1PL baseline

`fit_1pl()` estimates $a$ and $d_1, \ldots, d_J$ by maximizing the IRT marginal
likelihood under a standard-normal ability prior.

```{r fit-1pl}
fit1 <- fit_1pl(observed, n_quad = 15)
cat("Shared a:", round(fit1$pars$a[1], 3), " (true:", true_1pl$a[1], ")\n")
cat("Convergence:", fit1$convergence, "\n\n")
fit1$pars
```

All items in the output have the same `a` value, confirming the 1PL constraint.
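
The marginal likelihood that `fit_1pl()` is described as maximizing can be sketched in a few lines of base R. This is only an illustration: the function name `marginal_loglik_1pl()`, the equally spaced grid on $[-4, 4]$, and the toy data are assumptions, and the package's own quadrature rule may differ.

```r
# Illustrative sketch (not the package's implementation): marginal
# log-likelihood of a 1PL model under a standard-normal ability prior,
# approximated on an equally spaced grid of n_quad nodes.
marginal_loglik_1pl <- function(par, X, n_quad = 15) {
  a <- par[1]                                   # shared discrimination
  d <- par[-1]                                  # per-item intercepts
  nodes   <- seq(-4, 4, length.out = n_quad)
  weights <- dnorm(nodes) / sum(dnorm(nodes))   # normalized prior weights

  # P[q, j] = P(x_j = 1 | theta = nodes[q])
  P <- plogis(outer(a * nodes, d, `+`))

  # log-likelihood of each response pattern at each node (persons x nodes)
  loglik_at_nodes <- X %*% t(log(P)) + (1 - X) %*% t(log1p(-P))

  # integrate theta out for each person, then sum over persons
  sum(log(exp(loglik_at_nodes) %*% weights))
}

# Toy 0/1 response matrix: 4 persons x 3 items (hypothetical data)
X_toy <- rbind(c(1, 1, 0), c(0, 0, 0), c(1, 1, 1), c(0, 1, 0))
marginal_loglik_1pl(c(1.2, -0.5, 0, 0.5), X_toy)
```

The parameter vector is ordered as $(a, d_1, \ldots, d_J)$, so its length is $J+1$ regardless of how many persons are in `X`.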

## Step 2: Fit mixed-subjects MML (1PL)

`fit_mixed_subjects_mml_1pl()` uses the true marginal likelihood with a
1PL-specific gradient: the shared discrimination gradient accumulates
contributions from all $J$ items, while each intercept has its own gradient.

```{r mml-1pl}
fit_mml_1pl <- fit_mixed_subjects_mml_1pl(
  observed     = observed,
  predicted    = predicted,
  generated    = generated,
  lambda       = 0.5,
  initial_pars = fit1$pars,
  n_quad       = 15,
  control      = list(maxit = 300)
)

print(fit_mml_1pl)
fit_mml_1pl$item_pars
```
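
For a single response matrix, the gradient structure described above follows from the standard identity for marginal likelihoods. As a sketch, write $\theta_q$ for the quadrature nodes, $P_{jq} = \text{logistic}(a\,\theta_q + d_j)$, and $w_{iq}$ for the posterior weight of node $q$ given person $i$'s responses (this notation is introduced here for illustration only):

$$\frac{\partial \ell}{\partial d_j} = \sum_i \sum_q w_{iq}\,(x_{ij} - P_{jq}),
\qquad
\frac{\partial \ell}{\partial a} = \sum_j \sum_i \sum_q w_{iq}\,\theta_q\,(x_{ij} - P_{jq})$$

Each intercept gradient involves only item $j$, whereas the gradient for $a$ sums over all $J$ items. How `fit_mixed_subjects_mml_1pl()` combines the observed, predicted, and generated terms through `lambda` is not derived here.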

## Step 3: Correct covariance — $(J+1) \times (J+1)$ sandwich

`vcov()` dispatches to `vcov_mixed_subjects_1pl()` for 1PL fits, returning a
$(J+1) \times (J+1)$ matrix with `a_shared` and per-item `d_j` as rows/columns.

```{r vcov-1pl}
Sigma_1pl <- vcov(fit_mml_1pl)
dim(Sigma_1pl)
rownames(Sigma_1pl)
```
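
One use of a covariance matrix over `a_shared` and the `d_j` is a delta-method standard error for the difficulty $b_j = -d_j / a$. The sketch below uses a hypothetical $2 \times 2$ block with made-up numbers rather than package output, and the row names are assumed to follow the labelling described above.

```r
# Illustrative delta-method sketch (toy numbers, not package output):
# standard error of b_j = -d_j / a from the covariance of (a, d_j).
a_hat <- 1.2
d_hat <- 0.6
Sigma_toy <- matrix(c(0.010, 0.002,
                      0.002, 0.020), nrow = 2,
                    dimnames = list(c("a_shared", "d_1"),
                                    c("a_shared", "d_1")))

b_hat  <- -d_hat / a_hat
# gradient of b with respect to (a, d): (d / a^2, -1 / a)
grad_b <- c(d_hat / a_hat^2, -1 / a_hat)
se_b   <- sqrt(drop(t(grad_b) %*% Sigma_toy %*% grad_b))

c(b = b_hat, se = se_b)
```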

## Step 4: Ability-score risk and lambda tuning

`tune_lambda_ability_risk_1pl()` uses the 1PL-parameterized gradient
$\partial\hat\theta / \partial (a_\text{shared}, d_1, \ldots, d_J)$ for the
ability-score risk. The chain rule gives
$\partial\hat\theta / \partial a_\text{shared} = \sum_j \partial\hat\theta / \partial a_j$,
with each term evaluated at $a_j = a_\text{shared}$.
As in the 2PL version, $\lambda$ is chosen by direct 1-D optimization by default
(pass `method = "grid"` to scan a grid instead).

```{r tune-1pl}
tuned_1pl <- tune_lambda_ability_risk_1pl(
  observed     = observed,
  predicted    = predicted,
  generated    = generated,
  initial_pars = fit1$pars,
  n_quad       = 15,
  control      = list(maxit = 300)
)

tuned_1pl$best_lambda
```
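
The chain-rule identity for the shared discrimination can be checked numerically with finite differences. The sketch below assumes an EAP ability score on a fixed grid purely for illustration; `eap_theta()` is a hypothetical helper, and the package's own ability estimator may differ.

```r
# Illustrative check of the chain rule (EAP scoring is an assumption here).
eap_theta <- function(x, a_vec, d, n_quad = 41) {
  nodes <- seq(-4, 4, length.out = n_quad)
  # P[q, j] = logistic(a_j * nodes[q] + d_j)
  P     <- plogis(sweep(outer(nodes, a_vec), 2, d, "+"))
  lik   <- apply(P, 1, function(p) prod(p^x * (1 - p)^(1 - x)))
  post  <- lik * dnorm(nodes)
  sum(nodes * post) / sum(post)
}

x_chk <- c(1, 0, 1)          # one hypothetical response pattern
d_chk <- c(-0.5, 0, 0.5)
a0    <- 1.2
h     <- 1e-5

# derivative with respect to the shared a (all items move together)
d_shared <- (eap_theta(x_chk, rep(a0 + h, 3), d_chk) -
             eap_theta(x_chk, rep(a0 - h, 3), d_chk)) / (2 * h)

# derivatives with respect to each item's own a_j, evaluated at a_j = a0
d_items <- sapply(1:3, function(j) {
  up <- down <- rep(a0, 3)
  up[j]   <- a0 + h
  down[j] <- a0 - h
  (eap_theta(x_chk, up, d_chk) - eap_theta(x_chk, down, d_chk)) / (2 * h)
})

# the two quantities should agree up to finite-difference error
all.equal(d_shared, sum(d_items), tolerance = 1e-6)
```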

## Step 5: Verify — F = Y gives lambda > 0

With `predicted = observed` (a perfect paired predictor), the ability-risk criterion
should select a positive lambda. The value suggested by theory is $N/(n+N)$,
which is 0.75 for the sample sizes used here.

```{r perfect-pred-1pl}
tuned_fy <- tune_lambda_ability_risk_1pl(
  observed     = observed,
  predicted    = observed,     # F = Y
  generated    = simulate_2pl(rnorm(n_generated), true_1pl),
  initial_pars = fit1$pars,
  n_quad       = 15,
  control      = list(maxit = 300)
)

cat("F=Y best lambda:", tuned_fy$best_lambda,
    " (theory: N/(n+N) =", round(n_generated / (n_human + n_generated), 3), ")\n")
```
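
The $N/(n+N)$ reference value can be illustrated with a scalar-mean analogue. The sketch below assumes an estimator of the form $\bar Y_n + \lambda(\bar F_N - \bar F_n)$ with independent paired and generated samples. This is a simplification for intuition, not the package's IRT objective, and `var_lambda()` is a hypothetical helper.

```r
# Scalar-mean analogue (an assumption for illustration, not the IRT objective):
# variance of mean_n(Y) + lambda * (mean_N(F) - mean_n(F)) when the paired
# sample (size n) and the generated sample (size N) are independent.
var_lambda <- function(lambda, n, N, var_y, var_f, cov_yf) {
  var_y / n + lambda^2 * var_f * (1 / n + 1 / N) - 2 * lambda * cov_yf / n
}

n_lab <- 400
N_gen <- 1200

# Perfect predictor: F = Y, so var_f = cov_yf = var_y
opt <- optimize(var_lambda, interval = c(0, 1),
                n = n_lab, N = N_gen, var_y = 1, var_f = 1, cov_yf = 1)

opt$minimum                 # numerically close to N / (n + N)
N_gen / (n_lab + N_gen)     # 0.75
```

Setting the derivative of this variance to zero gives $\lambda^* = \text{Cov}(Y, F) / \{(1 + n/N)\,\text{Var}(F)\}$, which reduces to $N/(n+N)$ when $F = Y$.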

## Compare 1PL and 2PL

On a well-specified 1PL test, how do the 1PL and 2PL estimators compare?

```{r compare-1pl-2pl}
fit_2pl_mml <- fit_mixed_subjects_mml(
  observed     = observed,
  predicted    = predicted,
  generated    = generated,
  lambda       = tuned_1pl$best_lambda,
  initial_pars = fit_2pl(observed, technical = list(NCYCLES = 500))$pars,
  n_quad       = 15,
  control      = list(maxit = 300)
)

rmse <- function(x, y) sqrt(mean((x - y)^2))
cat("1PL RMSE(a):", round(rmse(tuned_1pl$best_fit$item_pars$a, true_1pl$a), 4), "\n")
cat("2PL RMSE(a):", round(rmse(fit_2pl_mml$item_pars$a, true_1pl$a), 4), "\n")

# Intercept recovery (d is the intercept; difficulty is b = -d / a)
cat("1PL RMSE(d):", round(rmse(tuned_1pl$best_fit$item_pars$d, true_1pl$d), 4), "\n")
cat("2PL RMSE(d):", round(rmse(fit_2pl_mml$item_pars$d, true_1pl$d), 4), "\n")
```

The 1PL uses fewer parameters ($J+1$ vs $2J$), which can give lower RMSE when the
data really do come from a 1PL data-generating process, especially for small $n$.

## Ability-score risk: 1PL vs 2PL parameterization

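The 1PL ability-score risk is expected to be smaller in the $(J+1)$-parameter
space because the shared $a$ concentrates all discrimination information in a
single parameter. The two numbers below are not fully comparable, however: the
2PL covariance is Louis-corrected, while the 1PL covariance is not (see the note
on vcov above), so part of any gap may reflect the uncorrected 1PL bread.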

```{r risk-compare}
Sigma_2pl <- vcov(fit_2pl_mml)  # 2J × 2J Louis-corrected

risk_1pl <- ability_risk_1pl(observed, tuned_1pl$best_fit)
risk_2pl <- ability_risk(observed, fit_2pl_mml, vcov = Sigma_2pl)

cat("1PL mean param_var:", round(risk_1pl$summary$mean_param_var, 5), "\n")
cat("2PL mean param_var:", round(risk_2pl$summary$mean_param_var, 5), "\n")
```