% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/vim.R
\name{vim}
\alias{vim}
\title{Variable Importance Measures (VIMs)}
\usage{
vim(
  model,
  scoring_rule = "auc",
  vim_type = "logic",
  adjust = TRUE,
  interaction_order = 3,
  nodesize = NULL,
  alpha = 0.05,
  X_oob = NULL,
  y_oob = NULL,
  Z_oob = NULL,
  leaves = "4pl",
  ...
)
}
\arguments{
\item{model}{The fitted \code{logicDT} or \code{logic.bagged}
model}

\item{scoring_rule}{The scoring rule for assessing the model
performance. As in \code{\link{logicDT}}, \code{"auc"}, \code{"nce"},
\code{"deviance"} and \code{"brier"} are possible for binary outcomes.
For regression, the mean squared error is used.}

\item{vim_type}{The type of VIM to be calculated. This can
either be \code{"logic"}, \code{"remove"} or
\code{"permutation"}. See below for details.}

\item{adjust}{Should adjusted interaction VIMs be computed in addition
to the VIMs of the identified terms? See below for details.}

\item{interaction_order}{If \code{adjust = TRUE}, the maximum
interaction order up to which adjusted interaction VIMs are
computed.}

\item{nodesize}{If \code{adjust = TRUE}, the minimum number of
observations that an interaction has to discriminate in order to be
considered. Similar to \code{conjsize} in \code{\link{logicDT}}
and \code{nodesize} in \code{\link{tree.control}}.}

\item{alpha}{If \code{adjust = TRUE}, a further adjustment can be
performed that tries to identify the concrete conjunctions responsible
for the interaction of the considered binary predictors.
\code{alpha} specifies the significance level of the statistical tests
whose alternative hypothesis is a difference in the response for specific
conjunctions. \code{alpha = 0} leads to no further adjustment.
See below for details.}

\item{X_oob}{The predictor data which should be used for
calculating the VIMs.
Preferably some type of validation
data independent of the training data.}

\item{y_oob}{The outcome data for computing the VIMs.
Preferably some type of validation
data independent of the training data.}

\item{Z_oob}{The optional covariable data for computing the
VIMs.
Preferably some type of validation
data independent of the training data.}

\item{leaves}{The prediction mode if 4pL models were fitted
in the leaves. As in \code{\link{predict.logicDT}},
\code{"4pl"} and \code{"constant"} are the possible settings.}

\item{...}{Parameters passed to the different VIM type functions.
For \code{vim_type = "logic"}, the argument \code{average} can
be specified as \code{"before"} or \code{"after"}. For
\code{vim_type = "permutation"}, \code{n.perm} sets the number of
random permutations. For \code{vim_type = "remove"}, \code{empty.model}
can be specified as either \code{"none"}, which ignores empty models
(i.e., models with all predictive terms removed), or \code{"mean"},
which uses the response mean as the prediction for an empty model.
See below for details.}
}
\value{
A data frame with two columns:
  \item{\code{var}}{Short descriptions of the terms for which the
    importance was measured. For example \code{-X1^X2} for
    \eqn{X_1^c \land X_2}.}
  \item{\code{vim}}{The actual calculated VIM values.}
  The rows of the data frame are sorted in decreasing order of the VIM values.
}
\description{
Calculate variable importance measures (VIMs) based on different
approaches.
}
\details{
Three different VIM methods are implemented:
\itemize{
  \item Permutation VIMs: Random permutations of the respective
    identified logic terms
  \item Removal VIMs: Removing single logic terms
  \item Logic VIMs: Prediction with both possible outcomes
    of a logic term
}
Details on the calculation of these VIMs are given below.

Here, variable importance refers to the importance of the identified
logic terms. These terms can be single predictors as well as
conjunctions in the spirit of this software package.
}
\section{Permutation VIMs}{

Permutation VIMs are computed by comparing the model's
performance on the original data with its performance on data
in which single terms are randomly permuted. This approach was
originally proposed by Breiman & Cutler (2003).
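
The following base R code is a minimal sketch of this idea under
simplifying assumptions; it is not the implementation used by
\code{vim}. The helper \code{permutation_vim} and the prediction
function \code{predict_fun} (mapping a 0/1 matrix of terms to
predictions) are hypothetical, and the mean squared error serves as a
stand-in scoring rule, so that the VIM of a term is the average
increase in the error after permuting it.
\preformatted{
# Minimal sketch of a permutation VIM (hypothetical helper, not logicDT code).
# 'predict_fun' maps a 0/1 term matrix to predictions; MSE is the loss.
permutation_vim <- function(predict_fun, term_mat, y, n_perm = 100) {
  mse <- function(pred) mean((y - pred)^2)
  base_loss <- mse(predict_fun(term_mat))
  vims <- numeric(ncol(term_mat))
  for (j in seq_len(ncol(term_mat))) {
    perm_losses <- replicate(n_perm, {
      perm_mat <- term_mat
      perm_mat[, j] <- sample(perm_mat[, j])  # break the term-outcome link
      mse(predict_fun(perm_mat))
    })
    # Importance = mean increase in the loss caused by the permutation
    vims[j] <- mean(perm_losses) - base_loss
  }
  res <- data.frame(var = colnames(term_mat), vim = vims)
  res[order(res$vim, decreasing = TRUE), ]  # sorted decreasingly
}

# Toy usage: only X1 influences the outcome
set.seed(1)
n <- 500
term_mat <- cbind(X1 = rbinom(n, 1, 0.5), X2 = rbinom(n, 1, 0.5))
y <- 2 * term_mat[, "X1"] + rnorm(n, sd = 0.1)
toy_model <- function(m) 2 * m[, "X1"]  # stands in for a fitted model
res <- permutation_vim(toy_model, term_mat, y, n_perm = 20)
}
In this toy example, permuting \code{X2}, which the model ignores,
leaves the error unchanged, whereas permuting \code{X1} increases it
considerably.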
}

\section{Removal VIMs}{

Removal VIMs are constructed by removing a specific logic
term from the set of predictors, refitting the decision
tree and comparing its performance to that of the original model.
Thus, with \code{empty.model = "none"}, this approach requires
that at least two terms were found by the algorithm; if only a
single term was identified, no VIM will be calculated.
Alternatively, \code{empty.model = "mean"} can be set to
use the constant mean response model for approximating
the empty model.
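
A minimal base R sketch of removal VIMs, again with the mean squared
error as a stand-in scoring rule, is given below. It is not the
implementation used by \code{vim}: instead of refitting a decision
tree, the hypothetical helper \code{fit_cell_means} fits the mean
response within each combination of the remaining binary terms, and
the hypothetical argument \code{empty_model} of the hypothetical
function \code{removal_vim} mimics \code{empty.model}.
\preformatted{
# Minimal sketch of a removal VIM (hypothetical helpers, not logicDT code).
# The refitted decision tree is replaced by a cell-mean model: the mean
# response within each combination of the remaining binary terms.
fit_cell_means <- function(term_mat, y) {
  key <- apply(term_mat, 1, paste, collapse = "")
  means <- tapply(y, key, mean)
  function(new_mat) {
    unname(means[apply(new_mat, 1, paste, collapse = "")])
  }
}

removal_vim <- function(term_mat, y, empty_model = c("none", "mean")) {
  empty_model <- match.arg(empty_model)
  mse <- function(pred) mean((y - pred)^2)
  base_loss <- mse(fit_cell_means(term_mat, y)(term_mat))
  vims <- rep(NA_real_, ncol(term_mat))
  for (j in seq_len(ncol(term_mat))) {
    reduced <- term_mat[, -j, drop = FALSE]
    if (ncol(reduced) == 0) {
      if (empty_model == "none") next  # empty model ignored: no VIM
      pred <- rep(mean(y), length(y))  # constant mean response model
    } else {
      pred <- fit_cell_means(reduced, y)(reduced)  # refit without term j
    }
    vims[j] <- mse(pred) - base_loss
  }
  data.frame(var = colnames(term_mat), vim = vims)
}

set.seed(2)
n <- 400
term_mat <- cbind(X1 = rbinom(n, 1, 0.5), X2 = rbinom(n, 1, 0.5))
y <- 2 * term_mat[, "X1"] + rnorm(n, sd = 0.1)
res_two <- removal_vim(term_mat, y)
res_one <- removal_vim(term_mat[, "X1", drop = FALSE], y)
res_mean <- removal_vim(term_mat[, "X1", drop = FALSE], y,
                        empty_model = "mean")
}
Here, removing the only term of a one-term model yields \code{NA}
with \code{empty_model = "none"}, whereas \code{empty_model = "mean"}
falls back to the constant mean model.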
}

\section{Logic VIMs}{

Logic VIMs exploit the fact that Boolean conjunctions are
Boolean variables themselves and therefore only take the values
0 or 1. To compute the VIM for a specific term,
predictions are made once with this term fixed to
0 and once with this term fixed to 1. Then, the arithmetic
mean of these two (risk or regression) predictions
is used for calculating the performance. This performance
is then compared to the original one as in the other
VIM approaches (\code{average = "before"}). Alternatively,
the performance can be computed separately for each of the two
fixed 0-1 scenarios of the considered term; these two
performances are then averaged and compared to the original
performance (\code{average = "after"}).
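
The two averaging variants can be contrasted in a minimal base R
sketch. The helper \code{logic_vim} and the prediction function
\code{predict_fun} are hypothetical, the mean squared error is used as
a stand-in scoring rule, and the code is not the implementation used
by \code{vim}.
\preformatted{
# Minimal sketch of a logic VIM (hypothetical helper, not logicDT code).
# 'predict_fun' maps a 0/1 term matrix to predictions; MSE is the loss.
logic_vim <- function(predict_fun, term_mat, y,
                      average = c("before", "after")) {
  average <- match.arg(average)
  mse <- function(pred) mean((y - pred)^2)
  base_loss <- mse(predict_fun(term_mat))
  vims <- numeric(ncol(term_mat))
  for (j in seq_len(ncol(term_mat))) {
    mat0 <- term_mat
    mat0[, j] <- 0  # term fixed to 0
    mat1 <- term_mat
    mat1[, j] <- 1  # term fixed to 1
    p0 <- predict_fun(mat0)
    p1 <- predict_fun(mat1)
    loss <- if (average == "before") {
      mse((p0 + p1) / 2)  # average the predictions, then score
    } else {
      (mse(p0) + mse(p1)) / 2  # score both scenarios, then average
    }
    vims[j] <- loss - base_loss
  }
  data.frame(var = colnames(term_mat), vim = vims)
}

set.seed(3)
n <- 500
term_mat <- cbind(X1 = rbinom(n, 1, 0.5), X2 = rbinom(n, 1, 0.5))
y <- 2 * term_mat[, "X1"] + rnorm(n, sd = 0.1)
toy_model <- function(m) 2 * m[, "X1"]  # stands in for a fitted model
before <- logic_vim(toy_model, term_mat, y, average = "before")
after <- logic_vim(toy_model, term_mat, y, average = "after")
}
For the toy model \eqn{2 X_1}, \code{average = "before"} scores the
constant prediction 1, while \code{average = "after"} averages the
errors of the constant predictions 0 and 2, so the latter yields a
larger VIM for \code{X1} here.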
}

\section{Validation}{

Validation data sets that were not used for fitting the model
are preferred, since they prevent the VIMs themselves from
overfitting. If neither bagging nor inner validation was used for
fitting the model, such data should be specified via the
\code{X_oob}, \code{y_oob} and \code{Z_oob} arguments.
}

\section{Bagging}{

For the bagging version, out-of-bag (OOB) data are naturally
used for the calculation of VIMs.
}

\section{VIM Adjustment for Interactions}{

Since decision trees can naturally include interactions
between single predictors (especially when strong marginal
effects are present as well), logicDT models might, e.g.,
include the single input variables \eqn{X_1} and \eqn{X_2} but
not their interaction \eqn{X_1 \land X_2} although an interaction
effect is present. We, therefore, developed and implemented an
adjustment approach for calculating VIMs for such
unidentified interactions nonetheless.
For predictors \eqn{X_{i_1}, \ldots, X_{i_k} =: Z}, this interaction
importance is given by
\deqn{\mathrm{VIM}(X_{i_1} \land \ldots \land X_{i_k}) =
\mathrm{VIM}(X_{i_1}, \ldots, X_{i_k} \mid X \setminus Z) -
\sum_{\lbrace j_1, \ldots, j_l \rbrace {\subset \atop \neq}
\lbrace i_1, \ldots, i_k \rbrace}
\mathrm{VIM}(X_{j_1} \land \ldots \land X_{j_l} \mid X \setminus Z)}
and can, in principle, be applied to any black-box model.
By \eqn{\mathrm{VIM}(A \mid X \setminus Z)}, the VIM of \eqn{A}
considering the predictor set excluding the variables in \eqn{Z}
is meant, i.e., the improvement of additionally considering \eqn{A}
while regarding only the predictors in \eqn{X \setminus Z}.
The proposed interaction VIM is thus defined recursively. For two
predictors, i.e., \eqn{Z = \lbrace X_{i_1}, X_{i_2} \rbrace}, it reads
\deqn{\mathrm{VIM}(X_{i_1} \land X_{i_2}) =
\mathrm{VIM}(X_{i_1}, X_{i_2} \mid X \setminus Z) -
\mathrm{VIM}(X_{i_1} \mid X \setminus Z) -
\mathrm{VIM}(X_{i_2} \mid X \setminus Z).}
Resolving the recursion for general \eqn{k} leads to the relationship
(where the summand for the empty set vanishes)
\deqn{\mathrm{VIM}(X_{i_1} \land \ldots \land X_{i_k}) =
\sum_{\lbrace j_1, \ldots, j_l \rbrace \subseteq \lbrace i_1, \ldots, i_k \rbrace}
(-1)^{k-l} \cdot \mathrm{VIM}(X_{j_1}, \ldots, X_{j_l} \mid X \setminus Z).}
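
The closed-form relationship and the recursive definition can be
checked against each other with a small base R sketch. The function
\code{joint_vim}, returning \eqn{\mathrm{VIM}(X_S \mid X \setminus Z)}
for an index set \eqn{S}, is a hypothetical stand-in with toy values,
as are the helpers \code{interaction_vim} and
\code{interaction_vim_rec}; estimating the joint VIMs from a fitted
model is not shown.
\preformatted{
# Sketch of the interaction VIM (hypothetical helpers, not logicDT code).
# 'joint_vim(S)' is assumed to return VIM(X_S | X without Z) for an
# index vector S; the empty set contributes 0 and is skipped.
interaction_vim <- function(idx, joint_vim) {
  k <- length(idx)
  total <- 0
  for (l in seq_len(k)) {
    # iterating over positions keeps combn() correct for length-1 idx
    for (pos in combn(seq_len(k), l, simplify = FALSE)) {
      total <- total + (-1)^(k - l) * joint_vim(idx[pos])
    }
  }
  total
}

# Recursive definition: joint VIM minus all lower-order interaction VIMs
interaction_vim_rec <- function(idx, joint_vim) {
  k <- length(idx)
  if (k == 1) return(joint_vim(idx))
  lower <- 0
  for (l in seq_len(k - 1)) {
    for (pos in combn(seq_len(k), l, simplify = FALSE)) {
      lower <- lower + interaction_vim_rec(idx[pos], joint_vim)
    }
  }
  joint_vim(idx) - lower
}

# Toy joint VIMs: main effects plus a pure X1-X2 interaction bonus
main <- c(0.3, 0.2, 0.1)
joint_vim <- function(S) {
  sum(main[S]) + if (all(is.element(c(1, 2), S))) 0.5 else 0
}
}
With these toy values, only the pair \eqn{(X_1, X_2)} receives a
nonzero interaction VIM (0.5), and both formulations agree.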
}

\section{Identification of Concrete Conjunctions}{

The aforementioned VIM adjustment approach only captures the importance
of interactions in a general sense, i.e., it only addresses the question
whether some variables interact in any way.
Since logicDT aims at identifying specific conjunctions (and also assigns
them VIMs if they were identified by \code{\link{logicDT}}), a further
adjustment approach is implemented which tries to identify the specific
conjunction leading to an interaction effect.
The idea of this method is to consider the response for each possible
scenario of the interacting variables. For example, for
\eqn{X_1 \land (X_2^c \land X_3)}, where the second term \eqn{X_2^c \land X_3}
was identified by \code{\link{logicDT}} so that two interacting terms are regarded,
the \eqn{2^2 = 4} possible scenarios
\eqn{\lbrace (i, j) \mid i, j \in \lbrace 0, 1 \rbrace \rbrace}
are considered. For each scenario, the responses of the observations in this
scenario are compared with the responses of the observations in the complementary
set. For continuous outcomes, a two-sample t-test (with Welch correction for
potentially unequal variances) is performed comparing the means of these two groups.
For binary outcomes, Fisher's exact test is performed, testing for different
underlying case probabilities.
If at least one test rejects the null hypothesis of equal outcomes (without adjusting
for multiple testing), the combination with the lowest p-value is chosen as the
explanatory term for the interaction effect. For example, if the most significant
deviation results from \eqn{X_1 = 0} and \eqn{(X_2^c \land X_3) = 1} from the example
above, the term \eqn{X_1^c \land (X_2^c \land X_3)} is chosen.
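
A minimal base R sketch of this selection step for two interacting 0/1
terms \code{A} and \code{B} is given below. The helper
\code{find_conjunction} is hypothetical and not the implementation
used by \code{vim}; for instance, it ignores \code{nodesize}. It uses
\code{fisher.test} for binary outcomes and the Welch t-test of
\code{t.test} for continuous outcomes, and it returns the chosen term
in the notation of the \code{var} column, e.g., \code{-A^B} for
\eqn{A^c \land B}.
\preformatted{
# Sketch of the conjunction identification step (hypothetical helper,
# not logicDT code) for two interacting 0/1 terms A and B.
find_conjunction <- function(A, B, y, binary = TRUE, alpha = 0.05) {
  scenarios <- expand.grid(a = 0:1, b = 0:1)
  p_values <- numeric(nrow(scenarios))
  for (s in seq_len(nrow(scenarios))) {
    in_set <- A == scenarios$a[s] & B == scenarios$b[s]
    p_values[s] <- if (binary) {
      # Fisher's exact test: case probability inside vs. outside
      tab <- table(factor(in_set, levels = c(FALSE, TRUE)),
                   factor(y, levels = 0:1))
      fisher.test(tab)$p.value
    } else {
      # Welch two-sample t-test (the default of t.test)
      t.test(y[in_set], y[!in_set])$p.value
    }
  }
  best <- which.min(p_values)
  # No multiplicity adjustment; return NULL if no test rejects at alpha
  if (p_values[best] >= alpha) return(NULL)
  lit_a <- if (scenarios$a[best] == 1) "A" else "-A"
  lit_b <- if (scenarios$b[best] == 1) "B" else "-B"
  paste0(lit_a, "^", lit_b)
}

set.seed(4)
n <- 400
A <- rbinom(n, 1, 0.5)
B <- rbinom(n, 1, 0.5)
y_bin <- rbinom(n, 1, ifelse(A == 0 & B == 1, 0.9, 0.2))
y_num <- ifelse(A == 1 & B == 1, 2, 0) + rnorm(n)
}
In the simulated binary data, the case probability is elevated only
for \eqn{A = 0} and \eqn{B = 1}, so \code{"-A^B"} is returned, while
\code{alpha = 0} returns \code{NULL}, i.e., no adjustment.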
}

\references{
\itemize{
  \item Breiman, L. (2001). Random Forests. Machine Learning 45(1):5-32.
    \doi{10.1023/A:1010933404324}
  \item Breiman, L. & Cutler, A. (2003). Manual on Setting Up, Using,
    and Understanding Random Forests V4.0. University of California,
    Berkeley, Department of Statistics.
    \url{https://www.stat.berkeley.edu/~breiman/Using_random_forests_v4.0.pdf}
}
}
