---
title: "Diversity Accumulation"
author: "Gilles Colling"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Diversity Accumulation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  dev = "svglite",
  fig.ext = "svg"
)
```

## Introduction

A species accumulation curve answers one question: how does the count of
species grow as more of a region is surveyed. Raw richness, the number of
species present, is the simplest answer, and it is the one the base `spacc()`
function returns. Richness treats a species seen once and a species seen ten
thousand times as equal contributions, so two regions with identical species
lists score the same even when one is dominated by a single hyperabundant
species and the other shares individuals evenly. That difference matters for
conservation triage, for detecting disturbance, and for comparing surveys of
unequal effort, and it is invisible to richness alone.

This vignette covers the diversity-accumulation family in spacc: the functions
that track an abundance-aware diversity measure as sites are added in spatial
order, rather than a raw species count. The family is built on Hill numbers,
which place richness, Shannon diversity, and Simpson diversity on one axis
indexed by a single parameter $q$. The same accumulation machinery extends to
beta diversity (how composition turns over across space), to phylogenetic and
functional diversity (where species carry evolutionary or trait distances), and
to coverage-standardised diversity (where curves are read against sampling
completeness rather than site count). Each variant returns a typed S3 object
with `print()`, `summary()`, `plot()`, and `as.data.frame()` methods, so the
analysis idiom is the same regardless of which diversity measure is tracked.

Uncertainty in spacc comes from one source. Every curve is computed from many
random starting sites; the nearest-neighbour walk that orders the remaining
sites depends on where it begins, so each starting seed produces a different
curve. The package reports the across-seed mean and the 2.5% and 97.5%
quantiles of those seed curves as a ribbon. There are no priors and no
parametric error model. A wide ribbon means the curve is sensitive to where
sampling starts, which is itself ecological information about spatial structure.
The worked examples below all use a small number of seeds to keep the vignette
fast; real analyses should use more, and the practical guidance section gives
concrete numbers.

## The diversity framework

Hill numbers (Hill 1973, Jost 2007, Chao et al. 2014) measure diversity as an
*effective number of species*: the number of equally-abundant species that would
yield the observed value of a diversity index. For a community with relative
abundances $p_i$ for species $i = 1, \dots, S$, where $p_i = n_i / N$, $n_i$ is
the count of species $i$, $N = \sum_i n_i$ is the total count, and $S$ is the
number of species present, the Hill number of order $q$ is

$$
{}^{q}D = \left( \sum_{i=1}^{S} p_i^{q} \right)^{1/(1-q)}.
$$

The order $q$ controls how much weight rare species receive. At $q = 0$ the term
$p_i^0 = 1$ for every present species, so ${}^{0}D = S$, plain richness, which
ignores abundance entirely. The formula is undefined at $q = 1$ because the
exponent $1/(1-q)$ diverges, but the limit exists and equals the exponential of
Shannon entropy,

$$
{}^{1}D = \exp\!\left( -\sum_{i=1}^{S} p_i \log p_i \right),
$$

which weights each species in proportion to its abundance and is read as the
effective number of common species. At $q = 2$ the value is the inverse Simpson
concentration, ${}^{2}D = 1 / \sum_i p_i^2$, the effective number of dominant
species. Hill numbers are non-increasing in $q$: ${}^{q}D \ge {}^{q'}D$ whenever
$q \le q'$, because raising $q$ shifts weight toward the most abundant species
and away from the long tail of rare ones. The gap between ${}^{0}D$ and
${}^{2}D$ is therefore a direct read on dominance: a wide gap means a few
species hold most of the individuals.

The same numbers decompose across space. Following Jost (2007), regional (gamma)
diversity is the multiplicative product of mean local (alpha) diversity and
turnover (beta):

$$
{}^{q}D_{\gamma} = {}^{q}D_{\alpha} \times {}^{q}D_{\beta}.
$$

Here ${}^{q}D_{\gamma}$ is the Hill number of the pooled community across all
sites, ${}^{q}D_{\alpha}$ is the effective number of species in an average site,
and ${}^{q}D_{\beta} = {}^{q}D_{\gamma} / {}^{q}D_{\alpha}$ is the effective
number of completely distinct communities. Beta ranges from 1, when every site
has identical composition, up to the number of sites, when no two sites share a
species. Because alpha is a generalised (power) mean of per-site values, the
decomposition is exact at every $q$, and beta is independent of alpha in the
sense Jost defines, so a region can have high local diversity with low turnover
or the reverse.

## Simulating data

The examples use a single simulated dataset with known structure, so the curves
can be read against ground truth. Each species is placed at a random centre in a
100-by-100 plane, and its expected abundance at a site falls off as a Gaussian
kernel of the squared distance from that centre. A site near a species' centre
has high expected abundance for that species; a site far away has almost none.
Counts are then drawn from a Poisson distribution with that expected value. This
is a thinned Poisson point process viewed at the site level, and it is the
standard way to generate spatially aggregated community data: species are common
where conditions suit them and rare elsewhere, which is what creates spatial
turnover.

```{r simulate}
library(spacc)

set.seed(123)
n_sites <- 60
n_species <- 30

coords <- data.frame(
  x = runif(n_sites, 0, 100),
  y = runif(n_sites, 0, 100)
)

# Abundance matrix with spatial clustering
species <- matrix(0L, n_sites, n_species)
for (sp in seq_len(n_species)) {
  cx <- runif(1, 10, 90)
  cy <- runif(1, 10, 90)
  lambda <- 5 * exp(-0.001 * ((coords$x - cx)^2 + (coords$y - cy)^2))
  species[, sp] <- rpois(n_sites, lambda)
}
colnames(species) <- paste0("sp", seq_len(n_species))
```

Three parameters set the ecology of the simulation. The peak intensity 5 is the
expected count of a species at its own centre, so each species tops out at a
handful of individuals per site. The decay rate 0.001 in the exponent sets the
range over which a species stays common: at a squared distance of 1000 units,
roughly 32 units away, expected abundance has dropped to $5 e^{-1} \approx 1.8$,
and by 60 units it is near zero. Smaller decay rates spread species more widely
and flatten turnover; larger rates concentrate them and sharpen it. The 60 sites
in a 100-by-100 plane give a mean nearest-neighbour spacing of roughly 6 units,
well inside each species' range, so neighbouring sites share many species and
the nearest-neighbour accumulation walk picks up turnover gradually.

The resulting matrix is sites-by-species with integer counts. Most spacc
functions accept either this abundance form or a presence/absence version
obtained by thresholding at zero. The first few entries show the spatial
structure directly: a given site holds large counts for the few species centred
nearby and zeros for the rest.

```{r sim-peek}
dim(species)
species[1:4, 1:8]
sum(species > 0) / length(species)   # fraction of nonzero cells
```

## Hill number accumulation

`spaccHill()` tracks Hill numbers along the spatial accumulation curve. It
orders sites by nearest-neighbour distance from a random seed, pools their
abundances cumulatively, and computes ${}^{q}D$ of the pooled community at each
step, for every requested order $q$. The result is one curve per $q$, each
averaged over `n_seeds` random starts.

```{r hill}
hill <- spaccHill(species, coords, q = c(0, 1, 2), n_seeds = 20, progress = FALSE)
hill
```

```{r plot-hill, fig.cap = "Hill number accumulation for q = 0, 1, 2."}
library(ggplot2)
plot(hill) +
  theme(panel.background = element_rect(fill = "transparent"),
        plot.background = element_rect(fill = "transparent"))
```

The three curves separate cleanly. The $q = 0$ curve is ordinary richness and
climbs toward the regional pool of 30 species. The $q = 1$ and $q = 2$ curves
sit below it and rise more slowly, because they count effective common and
dominant species rather than all present species. The reason higher $q$
accumulates more slowly is structural: a new site adds a rare species to the
$q = 0$ count immediately, but that species barely moves ${}^{1}D$ or ${}^{2}D$
until it accumulates enough individuals to shift the relative-abundance vector.
The gap between the curves at any site count is the dominance signature at that
spatial scale.

The numeric curve is available through `as.data.frame()`, which returns the
across-seed mean and the 2.5%/97.5% quantile band per site, stacked by $q$.

```{r hill-df}
hill_df <- as.data.frame(hill)
head(hill_df, 3)
# Final-site values per q
hill_df[hill_df$sites == max(hill_df$sites), c("q", "mean", "lower", "upper")]
```

The final-site row gives the regional effective diversity at each order. Here
${}^{0}D$ recovers all 30 species, while ${}^{1}D$ and ${}^{2}D$ are lower,
quantifying how uneven the pooled community is. The `summary()` method returns
the same quantities in long form with an adjustable `ci_level`, which is useful
for plotting custom intervals.

```{r hill-summary}
hs <- summary(hill, ci_level = 0.90)
tail(hs, 3)
```

Reading the three orders together is more informative than any one of them. If
the $q = 0$ curve keeps rising while $q = 2$ has flattened, the region is still
gaining rare species but its dominant structure is already captured; further
sampling adds to the species list without changing what the community is mostly
made of.

## Alpha, beta, and gamma diversity

The accumulation curve describes the pooled community as it grows. The
multiplicative partition describes the fixed set of all sites and splits its
diversity into a local and a turnover part. `alphaDiversity()` returns per-site
Hill numbers, `gammaDiversity()` returns the pooled regional values, and
`diversityPartition()` combines them into the full alpha-beta-gamma
decomposition.

```{r abg}
# Alpha diversity: per-site Hill numbers
alpha <- alphaDiversity(species, q = c(0, 1, 2))
colMeans(alpha)

# Gamma diversity: pooled regional diversity
gamma <- gammaDiversity(species, q = c(0, 1, 2))
gamma

# Full partition: gamma = alpha * beta
partition <- diversityPartition(species, q = c(0, 1, 2))
partition
```

The partition object prints alpha, beta, and gamma side by side for each order.
Mean alpha is much smaller than gamma here because each site holds only the few
species whose centres are nearby, while the region pools all of them. Beta,
their ratio, measures how many effectively distinct communities the region
contains. The `summary()` method adds a normalised beta on a 0-to-1 scale,
$({}^{q}D_{\beta} - 1)/(n - 1)$, which rescales turnover so that 0 is a single
homogeneous community and 1 is complete distinctness across the $n$ sites.

```{r partition-summary}
summary(partition)
```

Beta falls as $q$ rises. At $q = 0$ turnover is driven by the long tail of rare,
locally restricted species, which inflates the effective number of communities.
At $q = 2$ only the dominant species count, and dominants tend to be the
widespread species shared across sites, so the region looks more homogeneous.
This ordering, $\beta_0 \ge \beta_1 \ge \beta_2$, is the typical pattern for
aggregated data and is worth checking: a reversal usually signals that rare
species are widespread while common ones are patchy, which is unusual and worth
investigating.

## Checking and interpreting with uncertainty

Every accumulation curve in spacc carries a confidence band built from the
spread across seed starts. The columns to read are `mean`, `lower`, and `upper`
from `as.data.frame()` or `summary()`. The band is widest early in the curve,
where the identity of the seed site dominates, and narrows as accumulation
covers most of the region and the curves converge on the same pooled community.

```{r ci-band}
ci <- as.data.frame(hill)
ci0 <- ci[ci$q == 0, ]
# Width of the 95% band at a few site counts
data.frame(sites = c(5, 20, 40, 60),
           width = ci0$upper[c(5, 20, 40, 60)] - ci0$lower[c(5, 20, 40, 60)])
```

The band width shrinks toward the right because all seed walks eventually pool
the same sites. A band that stays wide at high site counts would indicate that
the order of accumulation still matters near saturation, which happens when the
region has strong large-scale structure that some seeds traverse and others do
not.

Beyond the curve, two diagnostics summarise the dominance structure of the
communities directly. `evenness()` divides the Hill number at each order by
richness, giving $E_q = {}^{q}D / {}^{0}D$, a value in $[0, 1]$ where 1 is a
perfectly even community and small values mean a few species dominate.

```{r evenness}
ev <- evenness(species, q = seq(0.1, 3, by = 0.1))
ev
```

```{r plot-evenness, fig.cap = "Hill evenness profile across orders."}
plot(ev) +
  theme(panel.background = element_rect(fill = "transparent"),
        plot.background = element_rect(fill = "transparent"))
```

The evenness profile falls with $q$ because higher orders divide a
dominance-weighted diversity by full richness. A profile that drops steeply
indicates strong dominance; one that stays near 1 indicates a community where
individuals are spread evenly across species. Pielou's $J$ and Simpson evenness
are available through `type = "pielou"` and `type = "simpson"` for single-number
summaries at $q = 1$ and $q = 2$.

```{r evenness-pielou}
evenness(species, type = "pielou")
```

The continuous diversity profile, ${}^{q}D$ as a function of $q$, is the most
complete dominance diagnostic because it shows the whole decline from richness
to inverse Simpson rather than three points. `diversityProfile()` evaluates Hill
numbers over a grid of $q$ for each site and for the pooled region.

```{r profile}
prof <- diversityProfile(species, q = seq(0, 3, by = 0.2))
prof
```

```{r plot-profile, fig.cap = "Diversity profile: per-site mean (solid) and regional gamma (dashed)."}
plot(prof) +
  theme(panel.background = element_rect(fill = "transparent"),
        plot.background = element_rect(fill = "transparent"))
```

The solid line is the mean per-site profile and the dashed line is the regional
profile. The vertical gap between them at any $q$ is beta diversity at that
order, read off the same plot. Two regions can have identical richness, the
${}^{0}D$ intercept, yet very different profiles: the one whose profile drops
faster is more dominated. Comparing profiles is more honest than comparing
single indices because it does not commit to one value of $q$ before looking at
the data.

## Beta, functional, and phylogenetic beta diversity

Taxonomic beta diversity asks how the species list turns over across space.
`spaccBeta()` partitions it into two pieces along the accumulation curve,
following Baselga (2010): turnover, where new sites bring different species in
exchange for old ones, and nestedness, where new sites are species-poor subsets
of what is already accumulated. The two sum to total beta.

```{r beta}
pa <- (species > 0) * 1L
beta <- spaccBeta(pa, coords, n_seeds = 20, progress = FALSE)
beta
```

```{r plot-beta, fig.cap = "Spatial beta diversity accumulation with turnover/nestedness."}
plot(beta) +
  theme(panel.background = element_rect(fill = "transparent"),
        plot.background = element_rect(fill = "transparent"))
```

For this aggregated simulation turnover dominates: neighbouring sites hold
different species because each species is confined near its own centre, so
expanding spatially mostly replaces species rather than nesting them. The
numeric partition is in `as.data.frame()`, which gives the across-step means and
their standard deviations.

```{r beta-df}
tail(as.data.frame(beta), 3)
```

The Hill-number framework offers a complementary beta. `spaccHillBeta()`
computes gamma, alpha, and their ratio beta at each accumulation step, for each
$q$, so beta is itself an effective number of communities rather than a
dissimilarity in $[0,1]$. This is the Jost (2007) decomposition tracked along
the curve.

```{r hillbeta}
hb <- spaccHillBeta(species, coords, q = c(0, 1, 2), n_seeds = 15, progress = FALSE)
hb
```

```{r plot-hillbeta, fig.cap = "Hill beta (effective number of communities) along accumulation."}
plot(hb, component = "beta") +
  theme(panel.background = element_rect(fill = "transparent"),
        plot.background = element_rect(fill = "transparent"))
```

Beta rises as sites accumulate because each new site can only add distinctness,
and it rises fastest at $q = 0$ where rare restricted species count most. The
Baselga partition and the Hill partition answer different questions: Baselga
asks whether dissimilarity is replacement or loss, while Hill beta asks how many
distinct communities the accumulated set is worth. Reporting both is reasonable
when the distinction matters.

When species carry traits, functional beta weights turnover by how different the
incoming and outgoing species are in trait space. `spaccBetaFunc()` scales the
Baselga components by the mean pairwise trait distance of the species that
differ, so replacing a species with a near-identical one registers little beta
even though the species list changed.

```{r beta-func}
# Simulate two continuous traits
traits <- data.frame(
  body_size = rnorm(n_species),
  wing_length = rnorm(n_species)
)
rownames(traits) <- colnames(species)

beta_func <- spaccBetaFunc(pa, coords, traits, n_seeds = 20, progress = FALSE)
```

```{r plot-beta-func, fig.cap = "Functional beta diversity accumulation."}
plot(beta_func) +
  theme(panel.background = element_rect(fill = "transparent"),
        plot.background = element_rect(fill = "transparent"))
```

Functional beta is lower than taxonomic beta whenever the species that turn over
are functionally similar, which is the common case in trait-redundant
assemblages. The phylogenetic analogue, `spaccBetaPhylo()`, does the same with a
phylogenetic distance matrix from a tree, weighting turnover by evolutionary
divergence so that swapping close relatives counts for less than swapping
distant lineages.

```{r beta-phylo, eval = requireNamespace("ape", quietly = TRUE)}
library(ape)
tree <- rcoal(n_species, tip.label = colnames(species))

beta_phylo <- spaccBetaPhylo(pa, coords, tree, n_seeds = 20, progress = FALSE)
```

```{r plot-beta-phylo, fig.cap = "Phylogenetic beta diversity accumulation.", eval = requireNamespace("ape", quietly = TRUE)}
plot(beta_phylo) +
  theme(panel.background = element_rect(fill = "transparent"),
        plot.background = element_rect(fill = "transparent"))
```

Comparing taxonomic, functional, and phylogenetic beta on the same data
separates three kinds of turnover. High taxonomic but low functional and
phylogenetic beta means the region swaps species that are ecologically and
evolutionarily interchangeable. High beta on all three means the region replaces
distinct lineages with distinct ecological roles, which is the pattern that
flags genuine biogeographic boundaries.

## Phylogenetic and functional accumulation

Beyond beta, the diversity of the accumulated community itself can be measured
on a tree or in trait space. `spaccPhylo()` tracks phylogenetic diversity along
the curve. Mean pairwise distance (MPD) is the average phylogenetic distance
between all species present, sensitive to deep tree structure; mean nearest
taxon distance (MNTD) is the average distance to each species' closest relative,
sensitive to terminal clustering; and Faith's PD is the total branch length
spanned by the accumulated species.

```{r phylo, eval = requireNamespace("ape", quietly = TRUE)}
phylo_acc <- spaccPhylo(pa, coords, tree,
                        metric = c("mpd", "mntd", "pd"),
                        n_seeds = 20, progress = FALSE)
```

```{r plot-phylo, fig.cap = "Phylogenetic diversity accumulation.", eval = requireNamespace("ape", quietly = TRUE)}
plot(phylo_acc) +
  theme(panel.background = element_rect(fill = "transparent"),
        plot.background = element_rect(fill = "transparent"))
```

PD rises steadily as more lineages are added, much like richness, because each
new species contributes branch length. MPD and MNTD behave differently: they can
plateau early or even decline if the species added late are close relatives of
ones already present, since they measure average distances rather than totals. A
flat MPD with a rising PD says the region keeps adding species but not new deep
branches.

Functional accumulation works the same way in trait space. `spaccFunc()` tracks
functional dispersion (FDis), the abundance-weighted mean distance of species to
the community centroid, and functional richness (FRic), the volume of trait
space occupied. FRic needs more species than traits to define a hull, so with a
two-trait dataset it stabilises once enough species accumulate.

```{r func}
func_acc <- spaccFunc(species, coords, traits,
                      metric = c("fdis"),
                      n_seeds = 20, progress = FALSE)
```

```{r plot-func, fig.cap = "Functional diversity accumulation."}
plot(func_acc) +
  theme(panel.background = element_rect(fill = "transparent"),
        plot.background = element_rect(fill = "transparent"))
```

FDis tends to saturate quickly because once the community spans the main axes of
trait variation, adding species near the centroid does not move the mean
distance much. A FDis curve that keeps climbing indicates the region is still
gaining functionally extreme species at large spatial scales.

The continuous-$q$ profile extends to both paradigms. `diversityProfilePhylo()`
gives phylogenetic Hill numbers across $q$, and `diversityProfileFunc()` gives
trait-similarity Hill numbers, each reading as an effective number of lineages
or functional units that declines with $q$ exactly as the taxonomic profile
does.

```{r profile-func}
fp <- diversityProfileFunc(species, traits, q = seq(0, 3, by = 0.5))
fp
```

## Rao's quadratic entropy

Rao's quadratic entropy combines relative abundance with pairwise distance
between species, giving an abundance-weighted measure of phylogenetic or
functional dispersion (Rao 1982). It is defined as
$Q = \sum_{i} \sum_{j} p_i p_j d_{ij}$, where $p_i$ is the relative abundance of
species $i$ and $d_{ij}$ is the distance between species $i$ and $j$ on the tree
or in trait space. Both `spaccPhylo()` and `spaccFunc()` expose it through
`metric = "rao"`.

```{r rao-phylo, eval = requireNamespace("ape", quietly = TRUE)}
phylo_rao <- spaccPhylo(species, coords, tree,
                        metric = "rao",
                        n_seeds = 20, progress = FALSE)
```

```{r plot-rao-phylo, fig.cap = "Phylogenetic Rao's Q accumulation.", eval = requireNamespace("ape", quietly = TRUE)}
plot(phylo_rao) +
  theme(panel.background = element_rect(fill = "transparent"),
        plot.background = element_rect(fill = "transparent"))
```

For functional Rao the species distance is the Euclidean distance in trait
space:

```{r rao-func}
func_rao <- spaccFunc(species, coords, traits,
                      metric = "rao",
                      n_seeds = 20, progress = FALSE)
tail(as.data.frame(func_rao), 3)
```

With presence/absence data Rao's Q reduces to an equal-weight form; with
abundances it weights each species pair by the product of relative abundances.
The phylogenetic and functional backends carry abundances in double precision,
so continuous values such as percent cover or biomass are used directly:

```{r rao-cover}
cover <- species / max(species)   # fractional values in [0, 1]
func_rao_cover <- spaccFunc(cover, coords, traits,
                            metric = "rao", n_seeds = 10, progress = FALSE)
tail(as.data.frame(func_rao_cover), 3)
```

Rao's Q saturates faster than richness because once the community spans the main
distance structure, adding more species near existing ones barely changes the
abundance-weighted mean distance. It is the natural single-number summary when
both abundance and pairwise distance matter, and it connects to Hill numbers:
the effective number form of Rao is a functional or phylogenetic Hill number at
$q = 2$.

## Coverage-based diversity

Comparing curves at equal site count is not always fair, because a fixed number
of sites can represent very different fractions of the underlying community
depending on how abundant the organisms are. Sample coverage is the estimated
proportion of total community abundance belonging to the species already seen,
and standardising by coverage rather than site count puts surveys of unequal
effort on the same footing (Chao & Jost 2012). `spaccCoverage()` tracks coverage
alongside richness as sites accumulate.

```{r coverage}
cov <- spaccCoverage(species, coords, n_seeds = 20, progress = FALSE)
cov
```

```{r plot-coverage, fig.cap = "Coverage-based spatial rarefaction."}
plot(cov) +
  theme(panel.background = element_rect(fill = "transparent"),
        plot.background = element_rect(fill = "transparent"))
```

The default Chao-Jost estimator uses singleton and doubleton counts and assumes
individuals are sampled independently. For plot-based data where sites are the
sampling units, the Chiu (2023) sample-based estimator is more appropriate
because it uses incidence frequencies across sites instead.

```{r coverage-chiu}
cov_chiu <- spaccCoverage(species, coords, coverage = "chiu",
                          n_seeds = 20, progress = FALSE)
tail(as.data.frame(cov_chiu), 3)
```

Once coverage is tracked, richness can be read at fixed completeness levels.
`interpolateCoverage()` returns the richness each seed curve reached at the
requested coverage targets, interpolating within the observed range.

```{r interpolate}
interp <- interpolateCoverage(cov, target = c(0.90, 0.95))
summary(interp)
```

The same idea applied to Hill numbers gives coverage-standardised diversity at
every order. `spaccHillCoverage()` computes Hill numbers and coverage together,
and plotting against coverage rather than site count shows diversity at matched
completeness.

```{r hillcov}
hc <- spaccHillCoverage(species, coords, q = c(0, 1, 2),
                        n_seeds = 15, progress = FALSE)
```

```{r plot-hillcov, fig.cap = "Hill numbers plotted against sample coverage."}
plot(hc, xaxis = "coverage") +
  theme(panel.background = element_rect(fill = "transparent"),
        plot.background = element_rect(fill = "transparent"))
```

Reading diversity against coverage is the recommended way to compare two regions
of unequal sampling effort: pick a coverage level both reach, such as 95%, and
compare the diversity there. Comparing at equal site count instead would
penalise the region whose organisms are rarer, because the same site count buys
it less coverage.

## Custom diversity metrics

`spaccDiversity()` accumulates any user-supplied index along a spatial ordering.
At each step the cumulative community is passed to a function that returns a
single number, so indices spacc does not implement directly can still be tracked
along the accumulation curve.

```{r custom-metric}
# Shannon entropy of the cumulative community
shannon <- function(comm) {
  p <- comm[comm > 0] / sum(comm)
  -sum(p * log(p))
}

div <- spaccDiversity(species, coords, shannon,
                      method = "knn", n_seeds = 20, progress = FALSE)
```

```{r plot-custom, fig.cap = "Custom (Shannon entropy) accumulation curve."}
plot(div, ylab = "Cumulative Shannon entropy") +
  theme(panel.background = element_rect(fill = "transparent"),
        plot.background = element_rect(fill = "transparent"))
```

The function receives a named vector of cumulative abundances (or 0/1 incidences
when `incidence = TRUE`), plus any extra arguments passed through `...`. The
result inherits the `spacc` class, so `summary()`, `as.data.frame()`, and
`predict()` apply. The `plot()` method uses a metric-neutral axis label by
default; set `ylab` to name the index. Available orderings are `"knn"`,
`"kncn"`, `"random"`, `"radius"`, and `"collector"`. Because the index is an
arbitrary R function it runs slower than the compiled metrics while accepting any
definition, so it is best kept for indices that are not already built in.

## Practical guidance

Diversity accumulation offers more knobs than plain richness, and a few
conventions keep results comparable and defensible.

**Which $q$ to report.** Report all three of $q = 0$, $q = 1$, and $q = 2$
together rather than choosing one. The triple ${}^{0}D, {}^{1}D, {}^{2}D$ shows
richness, common-species diversity, and dominant-species diversity in one row,
and the gap between them is the dominance signal. If a single number is forced,
$q = 1$ is the least arbitrary because it weights species exactly by abundance
and is the only order where the diversity decomposition is fully symmetric. Use
$q = 0$ when rare species are the conservation target and $q = 2$ when dominance
or invasion is the question.

**Abundance versus incidence.** Use abundance data whenever counts are
available, because every order above $q = 0$ needs them; presence/absence
collapses all orders to richness and discards the dominance information the Hill
framework exists to capture. Beta partitioning with `spaccBeta()` runs on
presence/absence by design, but Hill, functional, and phylogenetic diversity
should be fed counts, cover, or biomass. The functional and phylogenetic
backends accept non-integer abundances, so percent cover in $[0,1]$ works
directly.

**Trait and tree minimums.** Functional richness (FRic) needs at least one more
species than traits to define a convex hull, so with 4 traits keep at least 5
species per accumulation step and prefer 10 or more for a stable hull. Rao's Q
and FDis work with fewer but become noisy below about 5 species. For trait
distances use at least 2 traits; a single trait reduces functional diversity to
a one-dimensional range and the convex hull degenerates. Phylogenetic metrics
need a tree covering every species in the matrix, and MPD and MNTD stabilise
with roughly 15 or more species; below 10 they swing widely from one or two
pairs.

**Number of seeds.** The ribbon width is set by the spread across seeds, so more
seeds give a smoother, narrower band. Use at least 50 seeds for reporting and
100 or more when the confidence band itself is the result. The examples here use
10 to 20 to keep the build fast, which is too few for publication. Set `seed =`
for reproducibility, since the seed sites are drawn at random.

**Minimum sample size.** Coverage-based comparison needs each region to reach the
common coverage target; below about 20 sites with aggregated data, coverage
often does not climb past 90%, and interpolation to a higher target returns `NA`.
For the multiplicative partition, beta is unstable with fewer than about 5 sites
because alpha is a mean over very few values. Aim for at least 20 sites for any
diversity-accumulation analysis and 30 or more before reading turnover.

**When not to use these.** Do not compute Rao's Q or any abundance-weighted
metric with very few species, roughly under 5, because the abundance-weighted
mean distance is then driven by one or two pairs and the curve is meaningless.
Do not use functional metrics with a single trait, where the hull collapses and
FRic is undefined. Do not standardise by coverage when one region cannot reach
the target coverage at all; compare at the highest coverage both attain, or fall
back to site-based curves and say so. Finally, treat a wide confidence ribbon as
a possible artefact of too few seeds before reading it as biological signal.

The table below summarises which function fits which question.

| Question | Function | Input | Key metric |
|---|---|---|---|
| How does effective diversity grow with area | `spaccHill()` | abundance | ${}^{q}D$ per order |
| How is regional diversity split locally vs across sites | `diversityPartition()` | abundance | alpha, beta, gamma |
| Is turnover replacement or species loss | `spaccBeta()` | presence/absence | turnover, nestedness |
| How many distinct communities (Hill beta) | `spaccHillBeta()` | abundance | beta per order |
| Does turnover involve distinct traits or lineages | `spaccBetaFunc()`, `spaccBetaPhylo()` | P/A + traits/tree | weighted beta |
| How does evolutionary or trait diversity accumulate | `spaccPhylo()`, `spaccFunc()` | abundance + tree/traits | MPD, MNTD, PD, FDis, FRic |
| Abundance-weighted trait/lineage dispersion | `spaccPhylo(metric="rao")`, `spaccFunc(metric="rao")` | abundance + tree/traits | Rao's Q |
| Fair comparison under unequal effort | `spaccCoverage()`, `spaccHillCoverage()` | abundance | coverage-standardised |
| An index spacc does not implement | `spaccDiversity()` | any | user function |

The whole family shares one analysis idiom. Simulate or load a sites-by-species
matrix, call the function with coordinates, read the curve with
`as.data.frame()` or `summary()`, and plot it with the across-seed ribbon. The
choice of measure, taxonomic, functional, phylogenetic, or coverage-standardised,
changes what the y-axis means, not how the result is obtained.

## References

- Baselga, A. (2010). Partitioning the turnover and nestedness components
  of beta diversity. Global Ecology and Biogeography, 19, 134-143.
- Baselga, A. (2012). The relationship between species replacement,
  dissimilarity derived from nestedness, and nestedness. Global Ecology
  and Biogeography, 21, 1223-1232.
- Chao, A. & Jost, L. (2012). Coverage-based rarefaction and
  extrapolation. Ecology, 93, 2533-2547.
- Chao, A., Gotelli, N.J., Hsieh, T.C., et al. (2014). Rarefaction and
  extrapolation with Hill numbers. Ecological Monographs, 84, 45-67.
- Chiu, C.-H. (2023). A sample-based estimator for sample coverage.
  Ecology, 104, e4099.
- Faith, D.P. (1992). Conservation evaluation and phylogenetic diversity.
  Biological Conservation, 61, 1-10.
- Hill, M.O. (1973). Diversity and evenness: a unifying notation and its
  consequences. Ecology, 54, 427-432.
- Jost, L. (2007). Partitioning diversity into independent alpha and beta
  components. Ecology, 88, 2427-2439.
- Leinster, T. & Cobbold, C.A. (2012). Measuring diversity: the importance
  of species similarity. Ecology, 93, 477-489.
- Rao, C.R. (1982). Diversity and dissimilarity coefficients: a unified
  approach. Theoretical Population Biology, 21, 24-43.
