% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/TomekLinks.R
\name{TomekLinks}
\alias{TomekLinks}
\alias{TomekLinks.default}
\alias{TomekLinks.formula}
\title{TomekLinks}
\usage{
\method{TomekLinks}{formula}(formula, data, ...)

\method{TomekLinks}{default}(x, classColumn = ncol(x), ...)
}
\arguments{
\item{formula}{A formula describing the classification variable and the attributes to be used.}

\item{data, x}{Data frame containing the training dataset to be filtered.}

\item{...}{Optional parameters to be passed to other methods.}

\item{classColumn}{Positive integer indicating the column that contains the
class labels (a factor). By default, the last column is used.}
}
\value{
An object of class \code{filter}, which is a list with seven components:
\itemize{
   \item \code{cleanData} is a data frame containing the filtered dataset.
   \item \code{remIdx} is a vector of integers indicating the indexes for
   removed instances (i.e. their row number with respect to the original data frame).
   \item \code{repIdx} is a vector of integers indicating the indexes for
   repaired/relabelled instances (i.e. their row number with respect to the original data frame).
   \item \code{repLab} is a factor containing the new labels for repaired instances.
   \item \code{parameters} is a list containing the argument values.
   \item \code{call} contains the original call to the filter.
   \item \code{extraInf} is a character vector with additional
   information not covered by the previous items.
}
}
\description{
Similarity-based filter for removing label noise from a dataset as a
preprocessing step for classification. For more information, see the 'Details' and
'References' sections.
}
\details{
The function \code{TomekLinks} removes "TomekLink points" from the dataset. These were introduced
in [Tomek, 1976] and are expected to lie on the border between classes.
Removing such points is a common procedure for cleaning noise [Lorena, 2002].

Because detecting TomekLinks requires computing mean points, only numeric attributes are allowed.
Moreover, the dataset may contain only two different classes.
}
\examples{
# The following code fails because TomekLinks is designed for two-class problems.
# A decomposition strategy such as OVO or OVA could be used to overcome this.
\dontrun{
data(iris)
out <- TomekLinks(Species~., data = iris)
}
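
# A minimal sketch of a two-class call, added because the filter only accepts
# two classes. It assumes the package providing TomekLinks is attached and that
# dropping the unused factor level is enough to satisfy the two-class
# requirement. The names iris2 and out2 are illustrative only.
data(iris)
iris2 <- droplevels(iris[iris$Species != "setosa", ])
out2 <- TomekLinks(Species ~ ., data = iris2)
# Inspect two of the components documented in the 'Value' section.
out2$remIdx
nrow(out2$cleanData)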
}
\references{
Tomek I. (Nov. 1976): Two modifications of CNN, \emph{IEEE Trans. Syst., Man, Cybern.}, vol. 6, no. 11, pp. 769-772.

Lorena A. C., Batista G. E. A. P. A., de Carvalho A. C. P. L. F., Monard M. C. (Nov. 2002): The influence of noisy patterns in the performance of learning methods in the splice junction recognition problem, in \emph{Proc. 7th Brazilian Symp. Neural Netw.}, Recife, Brazil, pp. 31-37.
}

