% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Filter.R
\docType{data}
\name{makeFilter}
\alias{makeFilter}
\alias{rf.importance}
\alias{rf.min.depth}
\alias{univariate}
\title{Create a feature filter.}
\format{An object of class \code{Filter} of length 6.}
\usage{
makeFilter(name, desc, pkg, supported.tasks, supported.features, fun)

rf.importance

rf.min.depth

univariate
}
\arguments{
\item{name}{(\code{character(1)})\cr
Identifier for the filter.}

\item{desc}{(\code{character(1)})\cr
Short description of the filter.}

\item{pkg}{(\code{character(1)})\cr
Source package where the filter is implemented.}

\item{supported.tasks}{(\link{character})\cr
Task types supported.}

\item{supported.features}{(\link{character})\cr
Feature types supported.}

\item{fun}{(\code{function(task, nselect, ...)})\cr
Function which takes a task and returns a named numeric vector of scores,
one score for each feature of \code{task}.
Higher scores mean higher importance of the feature.
Scores for at least \code{nselect} features must be calculated; the remaining
scores may be set to \code{NA} or omitted, in which case those features will not be selected.
The original order will be restored if necessary.}
}
\value{
Object of class \dQuote{Filter}.
}
\description{
Creates and registers custom feature filters. Implemented filters
can be listed with \link{listFilterMethods}. Additional
documentation for the \code{fun} parameter specific to each filter can
be found in the description.

Minimum redundancy, maximum relevance filter \dQuote{mrmr} computes the
mutual information between the target and each individual feature minus the
average mutual information of previously selected features and this feature
using the \pkg{mRMRe} package.
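
Written as a formula, with notation introduced here only for illustration
(\eqn{f} the candidate feature, \eqn{y} the target, \eqn{S} the set of previously
selected features and \eqn{I} the mutual information), the score described above
reads:
\deqn{I(f; y) - \frac{1}{|S|} \sum_{s \in S} I(f; s)}{I(f; y) - (1/|S|) * sum_{s in S} I(f; s)}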

Filter \dQuote{carscore} determines the \dQuote{Correlation-Adjusted (marginal) coRelation
scores} (short CAR scores). The CAR scores for a set of features are defined as the
correlations between the target and the decorrelated features.

Filter \dQuote{randomForestSRC.rfsrc} computes the importance of random forests
fitted in package \pkg{randomForestSRC}. The concrete method is selected via
the \code{method} parameter. Possible values are \code{permute} (default), \code{random},
\code{anti}, \code{permute.ensemble}, \code{random.ensemble}, \code{anti.ensemble}.
See the VIMP section in the docs for \link[randomForestSRC:rfsrc]{randomForestSRC::rfsrc} for
details.

Filter \dQuote{randomForestSRC.var.select} uses the minimal depth variable
selection proposed by Ishwaran et al. (2010) (\code{method = "md"}) or a
variable hunting approach (\code{method = "vh"} or \code{method = "vh.vimp"}).
The minimal depth measure is the default.

Permutation importance of random forests fitted in package \pkg{party}.
The implementation follows the principle of mean decrease in accuracy used
by the \pkg{randomForest} package (see the description of the
\dQuote{randomForest.importance} filter).

Filter \dQuote{randomForest.importance} makes use of the \link[randomForest:importance]{randomForest::importance} function
from package \pkg{randomForest}. The importance measure to use is selected via
the \code{method} parameter:
\describe{
\item{oob.accuracy}{Permutation of Out of Bag (OOB) data.}
\item{node.impurity}{Total decrease in node impurity.}
}

The absolute Pearson correlation between each feature and the target is used as an indicator of feature importance.
Missing values are handled by pairwise deletion (see \dQuote{pairwise.complete.obs} in \link{cor}).
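
As a rough illustration in base R, such scores could be computed as follows.
The toy data and variable names are made up for this sketch, and it is not
necessarily how the filter is implemented:
\preformatted{
# Toy features with one missing value, and a numeric target.
feats = data.frame(
  x1 = c(1, 2, 3, 4, NA, 6),
  x2 = c(2, 1, 4, 3, 6, 5)
)
y = c(1.1, 1.9, 3.2, 3.9, 5.1, 6.0)

# Absolute Pearson correlation per feature; each feature/target pair uses only
# the observations where both values are present.
scores = abs(cor(feats, y, use = "pairwise.complete.obs")[, 1])
scores
}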

Filter \dQuote{information.gain} uses the entropy-based information gain
between each feature and target individually as an importance measure.

Filter \dQuote{gain.ratio} uses the entropy-based information gain ratio
between each feature and target individually as an importance measure.

Filter \dQuote{symmetrical.uncertainty} uses the entropy-based symmetrical uncertainty
between each feature and target individually as an importance measure.
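
The three entropy-based measures above can be sketched in base R using their
textbook definitions. The helper names are hypothetical, the features are
assumed to be discrete (numeric features would need to be discretized first),
and the actual filters may differ in details such as the logarithm base:
\preformatted{
# Shannon entropy of a discrete variable (natural logarithm).
entropy = function(x) {
  p = table(x) / length(x)
  p = p[p > 0]
  -sum(p * log(p))
}

# Information gain, written as H(feature) + H(target) - H(feature, target).
info.gain = function(feature, target) {
  entropy(feature) + entropy(target) - entropy(paste(feature, target))
}

# Gain ratio: information gain normalized by the entropy of the feature.
gain.ratio = function(feature, target) {
  info.gain(feature, target) / entropy(feature)
}

# Symmetrical uncertainty: information gain normalized by both entropies.
sym.uncertainty = function(feature, target) {
  2 * info.gain(feature, target) / (entropy(feature) + entropy(target))
}

target = c("y", "y", "y", "n", "n", "n")
f1 = c("a", "a", "a", "b", "b", "b")  # determines the target completely
f2 = c("a", "b", "a", "b", "a", "b")  # only weakly related to the target
c(f1 = info.gain(f1, target), f2 = info.gain(f2, target))
}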

The chi-square test is a statistical test of independence to determine whether
two variables are independent. Filter \dQuote{chi.squared} applies this
test in the following way. For each feature the chi-square test statistic is
computed checking if there is a dependency between the feature and the target
variable. Low values of the test statistic indicate a weak relationship; high
values, i.e., strong dependency, identify a feature as more important.
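
A minimal sketch of the test statistic using \code{chisq.test} from package
\pkg{stats} on toy contingency tables. The counts and the helper name
\code{chisq.stat} are made up for illustration, and the filter itself may
compute or scale the statistic differently:
\preformatted{
# Contingency tables of feature level (rows) by class (columns); toy counts.
dependent = matrix(c(30, 10, 10, 30), nrow = 2,
  dimnames = list(feature = c("a", "b"), target = c("no", "yes")))
independent = matrix(c(20, 20, 20, 20), nrow = 2,
  dimnames = list(feature = c("a", "b"), target = c("no", "yes")))

# Pearson chi-square statistic without continuity correction.
chisq.stat = function(tab) unname(chisq.test(tab, correct = FALSE)$statistic)

chisq.stat(dependent)    # 20: strong dependency, feature ranked higher
chisq.stat(independent)  # 0: no dependency
}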

Filter \dQuote{relief} is based on the feature selection algorithm \dQuote{ReliefF}
by Kononenko et al., which is a generalization of the \dQuote{Relief}
algorithm originally proposed by Kira and Rendell. Feature weights are initialized
with zeros. Then \code{sample.size} instances are sampled; for each sampled instance,
\code{neighbours.count} nearest-hit and nearest-miss neighbours are computed
and the weight of each feature is updated based on these values.

Filter \dQuote{oneR} makes use of a simple \dQuote{One-Rule} (OneR) learner to
determine feature importance. For this purpose the OneR learner generates one
simple association rule for each feature in the data individually and computes
the total error. The lower the error value, the more important the corresponding
feature.
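
The idea can be sketched in base R for discrete features. The helper
\code{oner.error} is hypothetical, and the rule used here (predict the majority
class within each feature level) is a simplified reading of OneR rather than the
filter's actual implementation:
\preformatted{
# Number of instances misclassified by the rule "predict the majority class
# within each level of the feature".
oner.error = function(feature, target) {
  tab = table(feature, target)
  sum(tab) - sum(apply(tab, 1, max))
}

target = c("y", "y", "y", "n", "n", "n")
f1 = c("a", "a", "a", "b", "b", "b")
f2 = c("a", "b", "a", "b", "a", "b")

oner.error(f1, target)  # 0: no errors, so f1 is the more important feature
oner.error(f2, target)  # 2
}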

The \dQuote{univariate.model.score} feature filter resamples an \pkg{mlr}
learner specified via \code{perf.learner} for each feature individually,
with a decision tree from package \pkg{rpart} being the default learner.
Further parameters are the resampling strategy \code{perf.resampling} and
the performance measure \code{perf.measure}.

Filter \dQuote{anova.test} is based on the Analysis of Variance (ANOVA) between
feature and class. The value of the F-statistic is used as a measure of feature
importance.
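
A minimal base R sketch on the \code{iris} data, fitting a one-way ANOVA per
feature. The helper name \code{f.stat} is hypothetical and the filter may obtain
the statistic differently:
\preformatted{
# F-statistic of a one-way ANOVA of a single feature against the class.
f.stat = function(x, class) anova(lm(x ~ class))[["F value"]][1]

# One score per feature; larger F-statistics rank a feature higher.
scores = sapply(iris[, 1:4], f.stat, class = iris$Species)
sort(scores, decreasing = TRUE)
}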

Filter \dQuote{kruskal.test} applies a Kruskal-Wallis rank sum test of the
null hypothesis that the location parameters of the distribution of a feature
are the same in each class and considers the test statistic as a variable
importance measure: if the location parameters do not differ in at least one
case, i.e., the null hypothesis cannot be rejected, there is little evidence
that the corresponding feature is suitable for classification.
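
The same kind of per-feature score can be sketched with \code{kruskal.test}
from package \pkg{stats}. The helper name \code{kw.stat} is hypothetical and the
sketch only illustrates the idea:
\preformatted{
# Kruskal-Wallis test statistic of a single feature against the class.
kw.stat = function(x, class) unname(kruskal.test(x, class)$statistic)

# One score per feature; larger statistics rank a feature higher.
scores = sapply(iris[, 1:4], kw.stat, class = iris$Species)
sort(scores, decreasing = TRUE)
}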

Simple filter based on the variance of the features, computed independently of each other.
Features with higher variance are considered more important than features with
low variance.
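
In base R, such scores amount to one call to \code{var} per feature, sketched
here on the numeric columns of \code{iris}:
\preformatted{
# Variance of each numeric feature, computed column by column.
scores = sapply(iris[, 1:4], var)

# Features sorted from most to least important under this criterion.
sort(scores, decreasing = TRUE)
}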

Filter \dQuote{permutation.importance} computes a loss function between predictions made by a
learner before and after a feature is permuted. Special arguments to the filter function are:
\describe{
\item{imp.learner}{A \link{Learner} or \code{character(1)} which specifies the learner
to use when computing the permutation importance.}
\item{contrast}{A \code{function} which takes two numeric vectors and returns one
(default is the difference).}
\item{aggregation}{A \code{function} which takes a \code{numeric} and returns a
\code{numeric(1)} (default is the mean).}
\item{nmc}{An \code{integer(1)}.}
\item{replace}{A \code{logical(1)} which determines whether the feature being
permuted is sampled with or without replacement.}
}
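
The general idea of permutation importance can be sketched in base R with a
linear model on simulated data. Everything here is an assumption made for
illustration: the helper \code{perm.imp}, the squared-error loss, and the
reading of \code{nmc} as the number of repeated permutations. The filter's own
implementation may differ:
\preformatted{
set.seed(1)
n = 200
d = data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y = 2 * d$x1 + rnorm(n, sd = 0.1)  # only x1 carries signal
fit = lm(y ~ x1 + x2, data = d)

# Mean squared error of the fitted model on a data set.
mse = function(data) mean((data$y - predict(fit, newdata = data))^2)

perm.imp = function(feat, nmc = 10, replace = FALSE,
  contrast = function(x, y) x - y, aggregation = mean) {
  base = mse(d)
  # Loss after permuting one feature, repeated nmc times.
  losses = replicate(nmc, {
    p = d
    p[[feat]] = sample(p[[feat]], replace = replace)
    mse(p)
  })
  # Contrast permuted and unpermuted loss, then aggregate to a single score.
  aggregation(contrast(losses, base))
}

scores = sapply(c("x1", "x2"), perm.imp)
scores
}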

Filter \dQuote{auc} determines for each feature how well the target
variable can be predicted based on this feature alone. More precisely, the
prediction rule is: class 1 if the feature exceeds a threshold and class 0
otherwise. The performance of this classification rule is measured by the
AUC and the resulting filter score is |0.5 - AUC|.
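
A small base R sketch of this score, computing the AUC from ranks (the
Mann-Whitney formulation). The helper name \code{auc} and the toy data are made
up for illustration:
\preformatted{
# AUC of a numeric score for a binary target, computed from ranks.
auc = function(score, is.pos) {
  r = rank(score)
  n1 = sum(is.pos)
  n0 = sum(!is.pos)
  (sum(r[is.pos]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

x = c(0.1, 0.4, 0.35, 0.8)        # feature values
y = c(FALSE, FALSE, TRUE, TRUE)   # TRUE marks class 1

auc(x, y)             # 0.75
abs(0.5 - auc(x, y))  # filter score: 0.25
}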

Filter \dQuote{ranger.permutation} trains a ranger learner with
\dQuote{importance = "permutation"} and assesses the variable
importance for each feature.

Filter \dQuote{ranger.impurity} trains a ranger learner with
\dQuote{importance = "impurity"} and assesses the variable
importance for each feature.
}
\references{
Kira, Kenji and Rendell, Larry (1992). The Feature Selection Problem: Traditional
Methods and a New Algorithm. AAAI-92 Proceedings.

Kononenko, Igor et al. (1997). Overcoming the myopia of inductive learning algorithms
with RELIEFF. Applied Intelligence, 7(1), pp. 39-55.
}
\seealso{
Other filter: \code{\link{filterFeatures}},
  \code{\link{generateFilterValuesData}},
  \code{\link{getFilterValues}},
  \code{\link{getFilteredFeatures}},
  \code{\link{listFilterMethods}},
  \code{\link{makeFilterWrapper}},
  \code{\link{plotFilterValues}}
}
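\examples{
# A minimal sketch of registering a custom filter; it is not taken from the
# package sources. The filter name "toy.iqr" and its scoring rule are
# hypothetical and chosen only for illustration.
toy.iqr = makeFilter(
  name = "toy.iqr",
  desc = "Interquartile range of each numeric feature",
  pkg = "stats",
  supported.tasks = c("classif", "regr"),
  supported.features = "numerics",
  fun = function(task, nselect, ...) {
    feats = getTaskData(task, target.extra = TRUE)$data
    # Named numeric vector with one score per feature; higher is more important.
    sapply(feats, stats::IQR)
  }
)

# Once registered, the filter can be addressed by its name.
generateFilterValuesData(iris.task, method = "toy.iqr")
}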
\keyword{datasets}
