% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/IPF.R
\name{IPF}
\alias{IPF}
\alias{IPF.default}
\alias{IPF.formula}
\title{Iterative Partitioning Filter}
\usage{
\method{IPF}{formula}(formula, data, ...)

\method{IPF}{default}(x, nfolds = 5, consensus = FALSE, p = 0.01, s = 3,
  y = 0.5, classColumn = ncol(x), ...)
}
\arguments{
\item{formula}{A formula describing the classification variable and the attributes to be used.}

\item{data, x}{Data frame containing the training dataset to be filtered.}

\item{...}{Optional parameters to be passed to other methods.}

\item{nfolds}{Number of partitions in each iteration.}

\item{consensus}{Logical. If \code{FALSE}, a majority voting scheme is used. If \code{TRUE}, a
consensus voting scheme is applied.}

\item{p}{Real number between 0 and 1. It sets the minimum proportion of the original
instances that must be tagged as noisy for the filter to go on to another iteration.}

\item{s}{Positive integer setting the stop criterion together with \code{p}. The filter stops
after \code{s} iterations in which not enough noisy instances are removed (according to the proportion \code{p}; see 'Details').}

\item{y}{Real number between 0 and 1. It sets the proportion of good instances that
are stored in each iteration (see 'Details').}

\item{classColumn}{Positive integer indicating the column which contains the (factor of) classes.
By default, the last column is considered.}
}
\value{
An object of class \code{filter}, which is a list with seven components:
\itemize{
   \item \code{cleanData} is a data frame containing the filtered dataset.
   \item \code{remIdx} is a vector of integers indicating the indexes for
   removed instances (i.e. their row number with respect to the original data frame).
   \item \code{repIdx} is a vector of integers indicating the indexes for
   repaired/relabelled instances (i.e. their row number with respect to the original data frame).
   \item \code{repLab} is a factor containing the new labels for repaired instances.
   \item \code{parameters} is a list containing the argument values.
   \item \code{call} contains the original call to the filter.
   \item \code{extraInf} is a character vector with additional
   information not covered by the previous items.
}
}
\description{
Ensemble-based filter for removing label noise from a dataset as a
preprocessing step for classification. For more information, see the 'Details' and
'References' sections.
}
\details{
The full description of the method can be found in the provided references.
In each iteration, a base classifier is built on each of the \code{nfolds} partitions of
\code{data}. These classifiers are then tested on the whole dataset, and the removal of noisy
instances is decided via a consensus or majority voting scheme. Finally, a proportion of good
instances (i.e. those whose label agrees with all the base classifiers) is stored and removed
for the next iteration. The process stops after \code{s} iterations in which not enough noisy
instances (according to the proportion \code{p}) are removed. In this implementation, the base
classifier used is C4.5.
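
As an illustration only, the two voting schemes can be sketched in base R as
follows. The helper \code{flag_noisy} and the matrix \code{miss} are
hypothetical and not part of this package, and majority is assumed here to mean
strictly more than half of the base classifiers.
\preformatted{
# Rows are instances, columns are base classifiers;
# TRUE means that the classifier misclassified the instance.
flag_noisy <- function(miss, consensus = FALSE) {
  votes <- rowSums(miss)
  if (consensus) votes == ncol(miss) else votes > ncol(miss) / 2
}

miss <- rbind(c(TRUE, TRUE, TRUE),
              c(TRUE, TRUE, FALSE),
              c(FALSE, FALSE, TRUE))
flag_noisy(miss)                    # majority voting
flag_noisy(miss, consensus = TRUE)  # consensus voting
}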
}
\note{
The number of noisy instances removed in each iteration is reported
in the console by means of a message.
}
\examples{
# The following example is not run in order to save time
\dontrun{
data(iris)
# We fix a seed because the folds for the ensemble are partitioned at random
set.seed(1)
out <- IPF(Species~., data = iris, s = 2)
summary(out, explicit = TRUE)
identical(out$cleanData, iris[setdiff(seq_len(nrow(iris)), out$remIdx), ])
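# A further sketch: the default method with consensus voting. The class
# (Species) is in column 5 of iris; out2 is just an illustrative name.
set.seed(1)
out2 <- IPF(iris, consensus = TRUE, classColumn = 5)
# Number of instances removed (remIdx is described in the 'Value' section)
length(out2$remIdx)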
}
}
\references{
Khoshgoftaar T. M., Rebours P. (2007): Improving software quality prediction by
noise filtering techniques. \emph{Journal of Computer Science and Technology}, 22(3), 387-396.

Zhu X., Wu X., Chen Q. (2003, August): Eliminating class noise in large
datasets. \emph{International Conference on Machine Learning} (Vol. 3, pp. 920-927).
}

