% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utility.R
\name{eval_util_L}
\alias{eval_util_L}
\title{Expected utility for local species diversity assessments.}
\usage{
eval_util_L(
  settings,
  fit = NULL,
  z = NULL,
  theta = NULL,
  phi = NULL,
  N_rep = 1,
  cores = 1L
)
}
\arguments{
\item{settings}{A data frame that specifies a set of conditions under which
utility is evaluated. It must include columns named \code{K} and \code{N}, which
specify the number of replicates per site and the sequencing depth per
replicate, respectively.
\code{K} and \code{N} must be numeric vectors with values greater than 0. If \code{K}
contains a non-integer value, its fractional part is discarded and the value is
treated as an integer.
Additional columns are ignored, but may be included.}

\item{fit}{An \code{occumbFit} object.}

\item{z}{Sample values of site occupancy status of species stored in an array
with sample \eqn{\times}{*} species \eqn{\times}{*} site dimensions.}

\item{theta}{Sample values of sequence capture probabilities of species
stored in a matrix with sample \eqn{\times}{*} species dimensions or an array
with sample \eqn{\times}{*} species \eqn{\times}{*} site dimensions.}

\item{phi}{Sample values of sequence relative dominance of species stored in
a matrix with sample \eqn{\times}{*} species dimensions or an array with
sample \eqn{\times}{*} species \eqn{\times}{*} site dimensions.}

\item{N_rep}{Controls the sample size for the Monte Carlo integration.
The integral is evaluated using \code{N_sample * N_rep} random samples,
where \code{N_sample} is the maximum size of the MCMC sample in the \code{fit}
argument and the parameter sample in the \code{z}, \code{theta}, and \code{phi} arguments.}

\item{cores}{The number of cores to use for parallelization.}
}
\value{
A data frame with a column named \code{Utility} in which the estimates of the
expected utility are stored. This is obtained by adding the \code{Utility} column
to the data frame provided in the \code{settings} argument.
}
\description{
\code{eval_util_L()} evaluates the expected utility of a local
species diversity assessment by using Monte Carlo integration.
}
\details{
The utility of local species diversity assessment for a given set of sites
can be defined as the expected number of detected species per site
(Fukaya et al. 2022). \code{eval_util_L()} evaluates this utility for arbitrary
sets of sites that can potentially have different values for site occupancy
status of species, \eqn{z}{z}, sequence capture probabilities of species,
\eqn{\theta}{theta}, and sequence relative dominance of species,
\eqn{\phi}{phi}, for the combinations of \code{K} and \code{N} values specified in the
\code{settings} argument.
Such evaluations can be used to balance \code{K} and \code{N} to maximize the utility
under a constant budget (possible combinations of \code{K} and \code{N} under a
specified budget and cost values are easily obtained using \code{list_cond_L()};
see the example below).
It is also possible to examine how the utility varies with different \code{K}
and \code{N} values without setting a budget level, which may be useful for determining
a satisfactory level of \code{K} and \code{N} from a purely technical point of view.

The expected utility is defined as the expected value of the conditional
utility in the form:
\deqn{U(K, N \mid \boldsymbol{r}, \boldsymbol{u}) = \frac{1}{J}\sum_{j = 1}^{J}\sum_{i = 1}^{I}\left\{1 - \prod_{k = 1}^{K}\left(1 - \frac{u_{ijk}r_{ijk}}{\sum_{m = 1}^{I}u_{mjk}r_{mjk}} \right)^N \right\}}{U(K, N | r, u) = (1 / J) * sum_{j, i}(1 - prod_{k}((1 - (u[i, j, k] * r[i, j, k]) / sum(u[, j, k] * r[, j, k]))^N))}
where \eqn{u_{ijk}}{u[i, j, k]} is a latent indicator variable representing
the inclusion of the sequence of species \eqn{i}{i} in replicate \eqn{k}{k}
at site \eqn{j}{j}, and \eqn{r_{ijk}}{r[i, j, k]} is a latent variable that
is proportional to the relative frequency of the sequence of species
\eqn{i}{i}, conditional on its presence in replicate \eqn{k}{k} at site
\eqn{j}{j} (Fukaya et al. 2022).
Expectations are taken with respect to the posterior (or possibly prior)
predictive distributions of \eqn{\boldsymbol{r} = \{r_{ijk}\}}{r} and
\eqn{\boldsymbol{u} = \{u_{ijk}\}}{u}, which are evaluated numerically using
Monte Carlo integration. The predictive distributions of
\eqn{\boldsymbol{r}}{r} and \eqn{\boldsymbol{u}}{u} depend on the values of the
model parameters \eqn{z}{z}, \eqn{\theta}{theta}, and \eqn{\phi}{phi}.
Their posterior (or prior) distribution is specified by supplying an
\code{occumbFit} object containing their posterior samples via the \code{fit} argument,
or by supplying a matrix or array of posterior (or prior) samples of
parameter values via the \code{z}, \code{theta}, and \code{phi} arguments. Higher
approximation accuracy can be obtained by increasing the value of \code{N_rep}.
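
As a rough illustration of the conditional utility above, the following base R
sketch evaluates it for given arrays of \eqn{u}{u} and \eqn{r}{r}. The helper
\code{cond_util_L()} is hypothetical and is not part of the package. It assumes
that \code{u} and \code{r} are species \eqn{\times}{*} site \eqn{\times}{*}
replicate arrays and that every replicate contains the sequence of at least
one species.
\preformatted{
# Hypothetical helper, not part of the package
cond_util_L <- function(u, r, N) {
  J <- dim(u)[2]
  ur <- u * r
  # Relative frequency of each species within each site and replicate
  p <- sweep(ur, c(2, 3), apply(ur, c(2, 3), sum), "/")
  # Probability that a species is missed in all K replicates of a site
  p_miss <- apply((1 - p)^N, c(1, 2), prod)
  # Expected number of detected species per site
  sum(1 - p_miss) / J
}

# Two equally frequent species, one site, one replicate, one read:
# exactly one species is detected, so the utility is 1
cond_util_L(u = array(1, dim = c(2, 1, 1)),
            r = array(1, dim = c(2, 1, 1)),
            N = 1)
}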

The \code{eval_util_L()} function can be executed by supplying the \code{fit} argument
without specifying the \code{z}, \code{theta}, and \code{phi} arguments, by supplying the
three \code{z}, \code{theta}, and \code{phi} arguments without the \code{fit} argument, or by
supplying the \code{fit} argument and any or all of the \code{z}, \code{theta}, and \code{phi}
arguments. If any of the \code{z}, \code{theta}, or \code{phi} arguments are specified
in addition to \code{fit}, the parameter values given in these arguments take
precedence when evaluating the expected utility. If the sample sizes differ among
parameters, parameters with smaller sample sizes are resampled with
replacement to align the sample sizes across parameters.
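
The alignment of sample sizes can be pictured with the following base R sketch,
in which a smaller sample is resampled with replacement up to a target size.
The object names and sizes are hypothetical, and the internal procedure of
\code{eval_util_L()} may differ in detail.
\preformatted{
set.seed(1)
# Hypothetical sample of theta: 10 samples * 3 species
theta_small <- matrix(runif(10 * 3), nrow = 10)
# Hypothetical size of the largest sample among the parameters
n_target <- 25
# Draw row indices with replacement and expand the smaller sample
idx <- sample.int(nrow(theta_small), n_target, replace = TRUE)
theta_aligned <- theta_small[idx, , drop = FALSE]
dim(theta_aligned) # 25 samples * 3 species
}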

The expected utility is evaluated assuming homogeneity of replicates, in the
sense that \eqn{\theta}{theta} and \eqn{\phi}{phi}, the model parameters
associated with the species detection process, are constant across
replicates within a site. For this reason, \code{eval_util_L()} does not accept
replicate-specific \eqn{\theta}{theta} and \eqn{\phi}{phi}. If the
\code{occumbFit} object supplied in the \code{fit} argument has a replicate-specific
parameter, the parameter samples to be used in the utility evaluation must be
provided explicitly via the \code{theta} or \code{phi} arguments.

The Monte Carlo integration is executed in parallel on multiple CPU cores, where
the \code{cores} argument controls the degree of parallelization.
}
\section{References}{

K. Fukaya, N. I. Kondo, S. S. Matsuzaki and T. Kadoya (2022)
Multispecies site occupancy modelling and study design for spatially
replicated environmental DNA metabarcoding. \emph{Methods in Ecology
and Evolution} \strong{13}:183--193.
\doi{10.1111/2041-210X.13732}
}

\examples{
\donttest{
set.seed(1)

# Generate a random dataset (20 species * 2 sites * 2 reps)
I <- 20 # Number of species
J <- 2  # Number of sites
K <- 2  # Number of replicates
data <- occumbData(
    y = array(sample.int(I * J * K), dim = c(I, J, K)))

# Fitting a null model
fit <- occumb(data = data)

## Estimate expected utility
# Arbitrary K and N values
(util1 <- eval_util_L(expand.grid(K = 1:3, N = c(1E3, 1E4, 1E5)),
                      fit))

# K and N values under specified budget and cost
(util2 <- eval_util_L(list_cond_L(budget = 1E5,
                                  lambda1 = 0.01,
                                  lambda2 = 5000,
                                  fit),
                      fit))
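
# Setting with the highest estimated utility under this budget
# (relies only on the Utility column described in the Value section)
util2[which.max(util2$Utility), ]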

# K values restricted
(util3 <- eval_util_L(list_cond_L(budget = 1E5,
                                  lambda1 = 0.01,
                                  lambda2 = 5000,
                                  fit,
                                  K = 1:5),
                      fit))

# theta and phi values supplied
(util4 <- eval_util_L(list_cond_L(budget = 1E5,
                                  lambda1 = 0.01,
                                  lambda2 = 5000,
                                  fit,
                                  K = 1:5),
                      fit,
                      theta = array(0.5, dim = c(4000, I, J)),
                      phi = array(1, dim = c(4000, I, J))))

# z, theta, and phi values, but no fit object supplied
(util5 <- eval_util_L(list_cond_L(budget = 1E5,
                                  lambda1 = 0.01,
                                  lambda2 = 5000,
                                  fit,
                                  K = 1:5),
                      fit = NULL,
                      z = array(1, dim = c(4000, I, J)),
                      theta = array(0.5, dim = c(4000, I, J)),
                      phi = array(1, dim = c(4000, I, J))))
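
# Larger Monte Carlo sample size and parallel execution
# (N_rep = 5 and cores = 2 are arbitrary illustrative values;
# this assumes that at least two CPU cores are available)
(util6 <- eval_util_L(expand.grid(K = 1:3, N = c(1E3, 1E4, 1E5)),
                      fit,
                      N_rep = 5,
                      cores = 2L))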
}
}
