% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/bandwidth_selection_cv_sf.R
\name{bw_cv_likelihood_calc}
\alias{bw_cv_likelihood_calc}
\title{Bandwidth selection by likelihood cross validation}
\usage{
bw_cv_likelihood_calc(
  bws = NULL,
  lines,
  events,
  w,
  kernel_name,
  method,
  diggle_correction = FALSE,
  study_area = NULL,
  adaptive = FALSE,
  trim_bws = NULL,
  mat_bws = NULL,
  max_depth = 15,
  digits = 5,
  tol = 0.1,
  agg = NULL,
  sparse = TRUE,
  grid_shape = c(1, 1),
  sub_sample = 1,
  zero_strat = "min_double",
  verbose = TRUE,
  check = TRUE
)
}
\arguments{
\item{bws}{An ordered numeric vector with the bandwidths}

\item{lines}{A feature collection of linestrings representing the underlying network. The
geometries must be simple LineStrings, not MultiLineStrings (the function may crash if some
geometries are invalid).}

\item{events}{A feature collection of points representing the events on the
network. The points will be snapped to their closest line on the network.}

\item{w}{A vector representing the weight of each event}

\item{kernel_name}{The name of the kernel to use. Must be one of triangle,
gaussian, tricube, cosine, triweight, quartic, epanechnikov or uniform.}

\item{method}{The method to use when calculating the NKDE, must be one of
simple / discontinuous / continuous (see nkde details for more information)}

\item{diggle_correction}{A Boolean indicating if the correction factor
for edge effect must be used.}

\item{study_area}{A feature collection of polygons
representing the limits of the study area.}

\item{adaptive}{A Boolean indicating if an adaptive bandwidth must be used.
If adaptive = TRUE, the local bandwidths are derived from the global bandwidths (bws).}

\item{trim_bws}{A vector indicating the maximum value an adaptive bandwidth can
reach. Higher values will be trimmed. It must have the same length as bws.}

\item{mat_bws}{A matrix giving the bandwidths for each observation and for each global bandwidth.
This is useful when the user wants to use a different method from Abramson's smoothing regimen.}

\item{max_depth}{When using the continuous and discontinuous methods, the
calculation time and memory use can grow very quickly if the network has many
small edges (areas with many intersections and many events). To
avoid this, a maximum depth can be set here. Considering that the
kernel is divided at intersections, a value of 10 should yield good
estimates in most cases. A larger value can be used without a problem for the
discontinuous method. For the continuous method, a larger value will
strongly impact calculation speed.}

\item{digits}{The number of digits to retain from the spatial coordinates. It
ensures that topology is good when building the network. Default is 5. Too high a
precision (high number of digits) might break some connections.}

\item{tol}{A float indicating the minimum distance between the events and the
lines' extremities when adding the point to the network. When points are
closer, they are added at the extremity of the lines.}

\item{agg}{A double indicating if the events must be aggregated within a
distance. If NULL, the events are aggregated only by rounding the
coordinates.}

\item{sparse}{A Boolean indicating if sparse or regular matrices should be
used by the Rcpp functions. These matrices are used to store edge indices
between two nodes in a graph. Regular matrices are faster, but require more
memory, in particular with multiprocessing. Sparse matrices are slower (a
bit), but require much less memory.}

\item{grid_shape}{A vector of two values indicating how the study area
must be split when performing the calculation. Default is c(1,1) (no split). A finer grid could
reduce memory usage and increase speed when a large dataset is used. When using
multiprocessing, the work in each grid cell is dispatched between the workers.}

\item{sub_sample}{A float between 0 and 1 indicating the proportion of quadrats
to keep in the calculation. For large datasets, it may be useful to evaluate the
bandwidths on a subset of the quadrats only and thus reduce calculation time.}

\item{zero_strat}{A string indicating what to do when the leave-one-out (LOO) density estimate is 0 for an isolated event.
"min_double" (default) replaces the 0 value with the smallest positive double representable on the machine. "remove" removes these events from the final
score. The first approach penalizes small bandwidths more strongly.}

\item{verbose}{A Boolean, indicating if the function should print messages
about the process.}

\item{check}{A Boolean indicating if the geometry checks must be run before
the operation. This might take some time, but it will ensure that the CRS
of the provided objects are valid and identical, and that geometries are valid.}
}
\value{
A dataframe with two columns, one for the bandwidths and the second for
the cross validation score (the higher the better).
}
\description{
Calculate the likelihood cross validation score for multiple bandwidths to
select an appropriate bandwidth in a data-driven approach.
}
\details{
The function calculates the likelihood cross validation score for several
bandwidths in order to find the most appropriate one. The general idea is to find the
bandwidth that would produce the most similar results if one event was removed from
the dataset (leave one out cross validation). We use here the shortcut formula as
described by the package spatstat \insertCite{spatstatpkg}{spNetwork}.

\eqn{LCV(h) = \sum_i \log\hat\lambda_{-i}(x_i)}

where the sum is taken over all events \eqn{x_i} and where \eqn{\hat\lambda_{-i}(x_i)} is the leave-one-out kernel
estimate at \eqn{x_i} for a bandwidth \eqn{h}. A higher value indicates a better bandwidth.
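
As a rough illustration of the score above, the following sketch computes it for points on a
straight line, using Euclidean distances and a quartic kernel. It is not the routine used by this
function, which works with network distances. The names \code{quartic_kernel} and \code{toy_lcv}
and the simulated data are hypothetical and only introduced for the example. Zero leave-one-out
densities are replaced by \code{.Machine$double.xmin}, mirroring the \code{"min_double"} strategy
described for \code{zero_strat}.

\preformatted{
# Quartic kernel scaled by the bandwidth; zero at or beyond one bandwidth.
quartic_kernel <- function(d, bw) {
  u <- d / bw
  ifelse(u < 1, (15 / 16) * (1 - u^2)^2 / bw, 0)
}

# Leave-one-out log-likelihood score for a single bandwidth.
# Assumption: plain 1-D Euclidean distances, no edge correction.
toy_lcv <- function(x, bw, w = rep(1, length(x))) {
  loo <- sapply(seq_along(x), function(i) {
    # density at x[i] estimated from every event except i
    sum(w[-i] * quartic_kernel(abs(x[i] - x[-i]), bw))
  })
  # isolated events get the smallest positive double instead of 0
  loo[loo == 0] <- .Machine$double.xmin
  sum(log(loo))
}

set.seed(1)
x <- c(rnorm(30, 0, 1), rnorm(30, 5, 1))  # simulated events
bws <- seq(0.2, 3, by = 0.2)              # candidate bandwidths
scores <- sapply(bws, function(bw) toy_lcv(x, bw))
best_bw <- bws[which.max(scores)]         # a higher score is better
}

In this toy setting, \code{best_bw} is the candidate bandwidth with the highest score.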
}
\examples{
\donttest{
data(mtl_network)
data(bike_accidents)
cv_scores <- bw_cv_likelihood_calc(seq(200,800,50),
                               mtl_network, bike_accidents,
                               rep(1,nrow(bike_accidents)),
                               "quartic", "simple",
                               diggle_correction = FALSE, study_area = NULL,
                               max_depth = 8,
                               digits=2, tol=0.1, agg=5,
                               sparse=TRUE, grid_shape=c(1,1),
                               sub_sample = 1, verbose=TRUE, check=TRUE)
}
}
\references{
{
\insertAllCited{}
}
}
