---
title: "Complete Workflow with hbsaems 1.0.0"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Complete Workflow with hbsaems 1.0.0}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
# Vignette evaluation strategy:
#
# By default we run lightweight diagnostic chunks (data inspection,
# convergence summaries, prediction tables) so that the vignette
# carries real output.  The model-fitting chunks themselves use
# `eval = FALSE` for two reasons:
#
#   1. Stan compile + warmup on a single chain takes ~20-60 seconds
#      on a fast machine and well over a minute on CRAN's slower
#      build agents.  With 4 model fits in this workflow the budget
#      is too tight to evaluate them all.
#   2. The vignette is meant as a reference / cookbook, not a
#      validation run -- the displayed code is what users should
#      paste into their own R session.
#
# Mini-fits used for the diagnostic chunks below are cached at the
# start of the vignette so subsequent chunks see a real hbmfit
# object without re-running Stan repeatedly.

knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  eval     = FALSE     # default: display-only
)

# Whether to actually run the mini-fit and diagnostic chunks.
# Set to FALSE in environments without a working Stan toolchain (e.g.
# CI containers that ship rstan but not the BH/Boost headers needed for
# Stan compile, or on CRAN's slow build agents where a few extra
# seconds matter).  The probe tries to compile a trivial Stan program
# in-memory; if that fails we silently fall back to display-only mode.
RUN_DIAGNOSTICS <- (function() {
  if (!requireNamespace("brms",  quietly = TRUE)) return(FALSE)
  if (!requireNamespace("rstan", quietly = TRUE)) return(FALSE)
  # Toolchain probe: try compiling the simplest Stan program possible.
  # If Boost / a C++ compiler is missing we'll get a clear error here
  # rather than later in a more confusing place.
  ok <- tryCatch({
    rstan::stan_model(
      model_code = "parameters { real x; } model { x ~ normal(0, 1); }",
      verbose = FALSE
    )
    TRUE
  }, error = function(e) FALSE,
     warning = function(w) FALSE)
  isTRUE(ok)
})()
```

```{r load-libs, eval = TRUE, message = FALSE, warning = FALSE}
library(hbsaems)
```

A typical Bayesian SAE workflow with `hbsaems` v1.0.0 follows seven
steps, each backed by a single primary function.  This vignette walks
through the canonical pipeline.

## See also

For deeper coverage of topics beyond this introductory workflow, see
the **articles on the package website**:

<https://madsyair.github.io/hbsaems/articles/>

The articles cover:

- Distribution-specific walkthroughs (Beta logit-normal, binomial
  logit-normal, lognormal-lognormal)
- Spatial models (CAR / SAR / BYM2)
- Handling missing data (deletion, multiple imputation, joint
  Bayesian imputation)
- The interactive Shiny dashboard
- Advanced features (custom families, benchmarking, smooth terms)
- AST-based formula manipulation (internals)
- Migration guide for users coming from earlier versions


## 0. Inspect the data

```{r data-inspect, eval = TRUE}
data("data_fhnorm")
str(data_fhnorm, max.level = 1)
head(data_fhnorm[, c("regency", "province", "y", "D", "x1", "x2", "x3")], 4)
```

`data_fhnorm` is a simulated Fay-Herriot dataset: 100 regencies
nested within 5 provinces, with a known sampling variance `D` per
area and three covariates.


## 1. Prior predictive check

```r
# This is the production call.  Replace `chains`, `iter` with your
# usual values; the lighter settings below are a quick demonstration.
model_prior <- hbm(
  formula           = brms::bf(y ~ x1 + x2 + x3),
  data              = data_fhnorm,
  re                = ~ (1 | regency),     # area-level random effect (Fay-Herriot)
  sampling_variance = "D",                 # KNOWN sampling variance D_i
  sample_prior      = "only",
  prior             = c(
    brms::prior(normal(0, 1),  class = "b"),
    brms::prior(normal(10, 5), class = "Intercept"),
    brms::prior(normal(0, 2),  class = "sd")
  ),
  chains = 2, iter = 1000
)
pc <- prior_check(model_prior,
                  data         = data_fhnorm,
                  response_var = "y")
plot(pc)
```

The `sampling_variance = "D"` argument is the **defining feature** of
a Fay-Herriot model: the sampling variance \eqn{D_i} is treated as
**known** from the design (it is not estimated), so brms pins the
observation-level \eqn{\sigma_i} to \eqn{\sqrt{D_i}}.  Omitting this
argument makes the model unidentified because the residual
\eqn{\sigma} and the area random-effect \eqn{\sigma_u} would compete
to explain the same variance, typically producing divergent
transitions and very wide credible intervals.


## 2. Fit the model

```r
model <- hbm(
  formula           = brms::bf(y ~ x1 + x2 + x3),
  data              = data_fhnorm,
  re                = ~ (1 | regency),
  sampling_variance = "D",
  control           = list(adapt_delta = 0.99),
  chains            = 4, iter = 4000, warmup = 2000, cores = 4
)
summary(model)
```

Two settings deserve attention:

* **`sampling_variance = "D"`** -- the column `D` in `data_fhnorm`
  holds the known sampling variance \eqn{D_i} for each area;
  `hbsaems` translates this into an offset on the observation-level
  standard deviation so brms / Stan never tries to estimate it.
* **`adapt_delta = 0.99`** -- Fay-Herriot likelihoods can produce
  mild funnel geometry between \eqn{\sigma_u} and the area random
  effects \eqn{u_i} when many areas have small \eqn{D_i}.  Raising
  `adapt_delta` from the brms default 0.95 to 0.99 buys a more
  conservative leapfrog step and eliminates the occasional
  divergent transition without re-parameterising the model.


### A small fitted model to demonstrate the rest of the workflow

For the remainder of the vignette we use a tiny model (very few
iterations) so that the diagnostic and prediction chunks below have
a real object to operate on.  In your own analysis, **use the full
production settings shown above**, not the toy settings here.

```{r mini-fit, eval = RUN_DIAGNOSTICS, message = FALSE, warning = FALSE, cache = TRUE}
# Mini fit -- iter = 200, chains = 1 -- for vignette demonstration only.
# Do NOT use these settings for inference: the chains have not
# converged at this length and the posterior will be biased.
fit_demo <- suppressWarnings(
  hbm(
    formula           = brms::bf(y ~ x1 + x2 + x3),
    data              = data_fhnorm,
    re                = ~ (1 | regency),
    sampling_variance = "D",
    chains  = 1,
    iter    = 200,
    warmup  = 100,
    refresh = 0,
    seed    = 1
  )
)
```


## 3. Convergence diagnostics

```{r diagnostics, eval = RUN_DIAGNOSTICS}
# Operate on the mini fit_demo above (NOT a substitute for production
# diagnostics on full chains).
diag <- convergence_check(fit_demo)
is_converged(fit_demo)
summary(diag)
hbm_warnings(fit_demo)
```

The full set of diagnostics on your production fit would also
include trace plots, R-hat / ESS tables, and pp-checks.  These all
follow the same calling convention:

```r
plot(diag, type = "trace")
plot(diag, type = "rhat")
plot(diag, type = "ess")
```


## 4. Goodness-of-fit

```r
gof <- model_compare(model)    # single-model: LOO, WAIC, pp_check
summary(gof)
plot(gof, type = "pp_check")
```


## 5. Compare alternatives

```r
model2 <- hbm(brms::bf(y ~ x1 + x2), data = data_fhnorm,
              re = ~ (1 | regency),
              sampling_variance = "D",
              control = list(adapt_delta = 0.99),
              chains = 4, iter = 4000)
model3 <- hbm(brms::bf(y ~ x1),      data = data_fhnorm,
              re = ~ (1 | regency),
              sampling_variance = "D",
              control = list(adapt_delta = 0.99),
              chains = 4, iter = 4000)

model_compare(model, model2)
model_compare_all(full = model, medium = model2, simple = model3)
```


## 6. Small-area estimates

The simplest call uses the data the model was fit on -- no
`newdata` argument needed:

```{r predictions, eval = RUN_DIAGNOSTICS}
# In-sample prediction (default): operates on the data stored
# inside the fitted object.
est <- sae_predict(fit_demo)
summary(est)
head(est$result_table, 5)
```

To predict at fresh data points, pass `newdata`.  When you do, any
internal offset columns the original fit needed
(e.g.\ `.hbsaems_sigma_fixed = sqrt(D)` for the Fay-Herriot sugar)
are repopulated automatically when the new data frame has the same
number of rows as the training data -- the standard "predict at the
same areas with updated covariates" use case.

```r
# Out-of-sample prediction at new areas:
est_new <- sae_predict(fit_demo, newdata = data_new)
```

### Bayesian model averaging

For model averaging, fit several candidate models, then either let
`model_average()` weight them by stacking weights (LOO-based) or
supply your own weights:

```r
# Stacking weights (default, derived from LOO)
avg_stacked <- model_average(model, model2, model3)

# User-specified weights -- must sum to 1
avg_manual  <- model_average(model, model2, weights = c(0.7, 0.3))

# The returned object is an `hbsae_results`, identical in shape to
# what sae_predict() returns, so all post-processing helpers
# (sae_transform, sae_scale, sae_filter, sae_aggregate) work on it.
plot(avg_stacked, type = "predictions")
```

### Visualisations

```r
# Visualisations (require a graphics device):
plot(est, type = "predictions")
plot(est, type = "uncertainty")
```


## 7. Post-processing predictions

All `sae_*` post-processing helpers operate on the `hbsae_results`
object returned by either `sae_predict()` or `model_average()`:

```r
log_est  <- sae_transform(est, log)
sc_est   <- sae_scale(est)
hi_est   <- sae_filter(est, est$pred > stats::median(est$pred))
agg_est  <- sae_aggregate(sae_predict(model),
                          sae_predict(model2),
                          method = "mean")
```


## Session info

```{r session-info, eval = TRUE}
sessionInfo()
```
