% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/BEM.R
\name{BEM}
\alias{BEM}
\title{BACON-EEM Algorithm for multivariate outlier detection in incomplete
multivariate survey data}
\usage{
BEM(data, weights, v = 2, c0 = 3, alpha = 0.01, md.type = "m",
  em.steps.start = 10, em.steps.loop = 5, better.estimation = FALSE,
  monitor = FALSE)
}
\arguments{
\item{data}{a matrix or data frame. As usual, rows are observations and
columns are variables.}

\item{weights}{a non-negative and non-zero vector of weights for each
observation. Its length must equal the number of rows of the data.
Default is \code{rep(1, nrow(data))}.}

\item{v}{an integer indicating the distance for the definition of the
starting good subset: \code{v = 1} uses the Mahalanobis distance based
on the weighted mean and covariance, \code{v = 2} uses the Euclidean
distance from the componentwise median.}

\item{c0}{the size of the initial subset is \code{c0 * ncol(data)}.}

\item{alpha}{a small probability indicating the level \code{(1 - alpha)}
of the cutoff quantile for good observations.}

\item{md.type}{type of Mahalanobis distance: \code{"m"} marginal,
\code{"c"} conditional.}

\item{em.steps.start}{number of iterations of the EM-algorithm for the
starting good subset.}

\item{em.steps.loop}{number of iterations of the EM-algorithm for each
subsequent good subset.}

\item{better.estimation}{if \code{better.estimation = TRUE}, then the
EM-algorithm for the final good subset iterates \code{em.steps.start} more
times.}

\item{monitor}{if \code{TRUE}, verbose output.}
}
\value{
\code{BEM} returns a list whose first component \code{output} is a
sublist with the following components:
\describe{
  \item{\code{sample.size}}{Number of observations}
  \item{\code{discarded.observations}}{Number of discarded observations}
  \item{\code{number.of.variables}}{Number of variables}
  \item{\code{significance.level}}{The probability used for the cutpoint,
  i.e. \code{alpha}}
  \item{\code{initial.basic.subset.size}}{Size of initial good subset}
  \item{\code{final.basic.subset.size}}{Size of final good subset}
  \item{\code{number.of.iterations}}{Number of iterations of the BACON step}
  \item{\code{computation.time}}{Elapsed computation time}
  \item{\code{center}}{Final estimate of the center}
  \item{\code{scatter}}{Final estimate of the covariance matrix}
  \item{\code{cutpoint}}{The threshold MD-value for the cut-off of outliers}
}
The further components returned by \code{BEM} are:
\describe{
  \item{\code{outind}}{Indicator of outliers}
  \item{\code{dist}}{Final Mahalanobis distances}
}
}
\description{
\code{BEM} starts from a set of uncontaminated data with possible
missing values and applies a version of the EM-algorithm to estimate
the center and scatter of the good data. It then adds observations to
(or deletes them from) the good data according to whether their
Mahalanobis distance is below a threshold. This process iterates until
the good data remain stable. Observations not among the good data are
outliers.
}
\details{
The BACON algorithm with \code{v = 1} is not robust but affine equivariant,
while \code{v = 2} is robust but not affine equivariant. The threshold for
the (squared) Mahalanobis distances, beyond which an observation is an
outlier, is a standardised chi-square quantile at \code{(1 - alpha)}. For
large data sets it may be better to choose \code{alpha / n} instead, where
\code{n} is the number of observations.

The internal function \code{EM.normal} is usually called from \code{BEM}.
It implements the EM-algorithm in such a way that part of the calculations
can be saved and reused in the \code{BEM} algorithm. \code{EM.normal} does
not compute the observed sufficient statistics itself; they are computed in
the main program of \code{BEM} and passed as parameters, together with the
statistics on the missingness patterns.
}
\note{
\code{BEM} uses an adapted version of the EM-algorithm in function
\code{EM.normal}.
}
\examples{
# Bushfire data set with 20\% MCAR
data(bushfirem, bushfire.weights)
bem.res <- BEM(bushfirem, bushfire.weights, alpha = 0.01 / nrow(bushfirem))
print(bem.res$output)
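
# A minimal base-R sketch of the two starting distances described for
# argument v, on complete toy data. The objects x, w, wc, med, d1, d2, m0,
# start1 and start2 are illustrative names only. This is not the code used
# inside BEM, which also has to handle missing values, and the unweighted
# median used for v = 2 is an assumption made for this sketch.
set.seed(1)
x <- matrix(rnorm(200), ncol = 4)        # 50 observations, 4 variables
w <- rep(1, nrow(x))                     # equal weights
wc <- cov.wt(x, wt = w)                  # weighted mean and covariance
d1 <- mahalanobis(x, wc$center, wc$cov)  # v = 1: squared Mahalanobis distance
med <- apply(x, 2, median)               # componentwise median
d2 <- sqrt(rowSums(sweep(x, 2, med)^2))  # v = 2: Euclidean distance from median
m0 <- 3 * ncol(x)                        # initial subset size for c0 = 3
start1 <- order(d1)[seq_len(m0)]         # rows closest to the center, v = 1
start2 <- order(d2)[seq_len(m0)]         # rows closest to the center, v = 2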
}
\references{
Béguin, C. and Hulliger, B. (2008). The BACON-EEM Algorithm for
Multivariate Outlier Detection in Incomplete Survey Data. Survey Methodology,
34(1), 91-103.

Billor, N., Hadi, A.S. and Velleman, P.F. (2000). BACON: Blocked Adaptive
Computationally-efficient Outlier Nominators. Computational Statistics and
Data Analysis, 34(3), 279-298.

Schafer, J.L. (2000). Analysis of Incomplete Multivariate Data. Monographs on
Statistics and Applied Probability 72, Chapman & Hall.
}
\author{
Beat Hulliger
}
