% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/apollo_outOfSample.R
\name{apollo_outOfSample}
\alias{apollo_outOfSample}
\title{Cross-validation of fit (LL)}
\usage{
apollo_outOfSample(
  apollo_beta,
  apollo_fixed,
  apollo_probabilities,
  apollo_inputs,
  estimate_settings = list(estimationRoutine = "bfgs", maxIterations = 200, writeIter =
    FALSE, hessianRoutine = "none", printLevel = 3L, silent = TRUE),
  outOfSample_settings = list(nRep = 10, validationSize = 0.1, samples = NA, rmseObs =
    NULL)
)
}
\arguments{
\item{apollo_beta}{Named numeric vector. Names and values for parameters.}

\item{apollo_fixed}{Character vector. Names (as defined in 
\code{apollo_beta}) of parameters whose value should not 
change during estimation.}

\item{apollo_probabilities}{Function. Returns probabilities of the model to 
   be estimated. Must receive three arguments:
\itemize{
  \item \strong{\code{apollo_beta}}: Named numeric 
        vector. Names and values of model parameters.
  \item \strong{\code{apollo_inputs}}: List 
        containing options of the model. See 
        \link{apollo_validateInputs}.
  \item \strong{\code{functionality}}: Character. 
        Can be either \code{"components"}, 
        \code{"conditionals"}, \code{"estimate"} 
        (default), \code{"gradient"}, 
        \code{"output"}, \code{"prediction"}, 
        \code{"preprocess"}, \code{"raw"}, 
        \code{"report"}, \code{"shares_LL"}, 
        \code{"validate"} or \code{"zero_LL"}.
}}

\item{apollo_inputs}{List grouping most common inputs. Created by function 
\link{apollo_validateInputs}.}

\item{estimate_settings}{List. Options controlling the estimation process. 
See \link{apollo_estimate}.}

\item{outOfSample_settings}{List. Contains settings for this function. User 
input is required for all settings except those 
with a default or marked as optional. 
      \describe{
        \item{nRep}{Numeric scalar. Number of 
                    times a different pair of 
                    estimation and validation 
                    sets are to be extracted 
                    from the full database.
                    Default is 10.}
        \item{samples}{Numeric matrix or 
                       data.frame. Optional 
                       argument. Must have as 
                       many rows as observations 
                       in the \code{database}, 
                       and as many columns as 
                       number of repetitions 
                       wanted. Each column 
                       represents a re-sample, 
                       and each element must be 
                       a 0 if the observation 
                       should be assigned to the 
                       estimation sample, or 1 
                       if the observation should 
                       be assigned to the 
                       validation sample. If this 
                       argument is provided, then 
                       \code{nRep} and 
                       \code{validationSize} are 
                       ignored. Note that this 
                       allows sampling at the 
                       observation rather than 
                       the individual level.}
        \item{validationSize}{Numeric scalar. 
                              Size of the 
                              validation sample. 
                              Can be a proportion 
                              of the sample (0-1) 
                              or the number of 
                              individuals in the 
                              validation sample 
                              (>1). Default is 
                              0.1.}
        \item{rmseObs}{Character vector. Used to 
                       calculate Root Mean 
                       Squared Error (RMSE) of 
                       prediction. It should 
                       contain the names of the 
                       columns with the observed 
                       outcomes in the database, 
                       in the same order as 
                       returned by 
                       \code{apollo_probabilities} when 
                       used with 
                       \code{functionality="prediction"}. 
                       If omitted or NULL, no 
                       RMSE is calculated.
                       It works only for models 
                       with a single component.}
      }}
}
\value{
A matrix with the average log-likelihood per observation for both the estimation and validation 
        samples, for each repetition. Two additional files with further details are written to the
        working/output directory.
}
\description{
Randomly generates estimation and validation samples, estimates the model on 
the first and calculates the likelihood for the second, then repeats.
}
\details{
A common way to test for overfitting of a model is to measure its fit on a 
sample not used during estimation, that is, to measure its out-of-sample fit. 
A simple way to do this is to split the complete available dataset into two 
parts: an estimation sample and a validation sample. 
The model of interest is estimated using only the estimation sample, and 
then those estimated parameters are used to measure the fit of the model 
(e.g. the log-likelihood of the model) on the validation sample. Doing this 
with only one validation sample, however, may lead to biased results, as a 
particular validation sample need not be representative of the population. 
One way to minimise this issue is to randomly draw several pairs of 
estimation and validation samples from the complete dataset, and apply the 
procedure to each pair.
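
As a rough illustration of a single split (a conceptual sketch only, not the 
code used internally by this function; the column name \code{ID} and the 10 
percent share are assumptions made for the example), individuals could be 
assigned to the two samples as follows:
\preformatted{
# database is assumed to be a data.frame with an individual identifier
# stored in a column called ID (hypothetical name).
set.seed(1)
ids    <- unique(database$ID)
nVal   <- max(1, round(0.1 * length(ids)))   # individuals held out
valIDs <- sample(ids, size = nVal)
inVal  <- is.element(database$ID, valIDs)
estimationSample <- database[!inVal, ]       # used to estimate the model
validationSample <- database[inVal, ]        # used only to measure fit
}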

The splitting of the database into estimation and validation samples is done 
at the individual level, not at the observation level. If sampling is to be 
done at the observation level (not recommended for panel data), then the 
optional \code{outOfSample_settings$samples} argument should be provided.
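
A minimal sketch of how such a \code{samples} matrix might be built for an 
observation-level split is given below. The object names, the seed, the 
number of repetitions and the 10 percent share are illustrative assumptions, 
not package defaults:
\preformatted{
# database is assumed to be the data.frame used to build apollo_inputs.
set.seed(1)
nObs    <- nrow(database)
nRep    <- 10
samples <- matrix(0, nrow = nObs, ncol = nRep)   # 0 = estimation sample
for (r in 1:nRep) {
  # flag roughly 10 percent of the rows (observations) for validation
  samples[sample(nObs, size = round(0.1 * nObs)), r] <- 1
}
outOfSample_settings <- list(samples = samples)
}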

This function writes two different files to the working/output directory:
\itemize{
  \item \strong{\code{modelName_outOfSample_params.csv}}: Records the 
        estimated parameters, final log-likelihood, and number of 
        observations on each repetition.
  \item \strong{\code{modelName_outOfSample_samples.csv}}: Records the 
        sample composition of each repetition.
}
Both files are updated throughout the run of this function.

When run, this function will look for the two files above in the 
working/output directory. If they are found, the function will attempt to 
pick up re-sampling from where those files left off. This is useful in cases 
where the original run was interrupted, or when additional repetitions are 
to be performed.
}
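\examples{
\dontrun{
# Hedged usage sketch. It assumes that apollo_beta, apollo_fixed,
# apollo_probabilities and apollo_inputs already exist in the workspace
# for a fully specified model. The setting values are illustrative only.
outOfSample_settings <- list(nRep = 5, validationSize = 0.2)
ll <- apollo_outOfSample(apollo_beta, apollo_fixed,
                         apollo_probabilities, apollo_inputs,
                         outOfSample_settings = outOfSample_settings)
# Average log-likelihood per observation for each repetition
print(ll)
}
}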
