% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/BnClustSig.R
\name{uclust}
\alias{uclust}
\title{U-statistic based significance clustering}
\usage{
uclust(md = NULL, data = NULL, alpha = 0.05, rep = 15)
}
\arguments{
\item{md}{Matrix of squared Euclidean distances between all data points.}

\item{data}{Data matrix. Each row represents an observation.}

\item{alpha}{Significance level.}

\item{rep}{Number of times the optimization procedure is repeated. Repetition is important for
problems with multiple optima.}
}
\value{
Returns a list with the following elements:\describe{
  \item{cluster1}{Elements in group 1 in the final partition. This is the significant partition with
  maximal Bn, if the sample is heterogeneous.}
  \item{cluster2}{Elements in group 2 in the final partition.}
  \item{p.value}{P-value for the test that renders the final partition, if heterogeneous.
  Homogeneity test p-value, if homogeneous.}
  \item{alpha_corrected}{Bonferroni corrected significance level for the test that renders the final
  partition, if heterogeneous. Homogeneity test significance level, if homogeneous.}
  \item{n1}{Size of the smaller cluster.}
  \item{ishomo}{Logical, returns \code{TRUE} when the sample is homogeneous.}
  \item{Bn}{Value of Bn statistic for the final partition, if heterogeneous.
  Value of Bn statistic for the maximal homogeneity test partition, if homogeneous.}
  \item{varBn}{Variance estimate for final partition, if heterogeneous.
  Variance estimate for the maximal homogeneity test partition, if homogeneous.}
  \item{ishomoResult}{Result of homogeneity test (see \code{is_homo}).}
}
}
\description{
Partitions the sample into the two significant subgroups with the largest Bn statistic. If no significant
partition exists, the sample is reported as homogeneous (\code{ishomo} is \code{TRUE}).
}
\details{
This is the significance clustering procedure of Valk and Cybis (2018).
The method first performs a homogeneity test to verify whether the data can be significantly
partitioned. If the hypothesis of homogeneity is rejected, the method searches, among all
the significant partitions, for the one that best separates the data, as measured by the largest
\code{bn} statistic. This function is intended for high-dimensional, small-sample-size settings.

Either \code{data} or \code{md} should be provided.
If \code{data} is provided, Bn is computed using the squared Euclidean distance.
If a distance matrix is provided instead, it must consist of squared Euclidean distances; otherwise, the test
results are invalid.

The variance of \code{bn} is estimated through resampling, so p-values may vary slightly between runs.

For more details, see Cybis, Gabriela B., Marcio Valk, and Sílvia R. C. Lopes. "Clustering and classification problems in genetics through U-statistics."
Journal of Statistical Computation and Simulation 88.10 (2018),
and Valk, Marcio, and Gabriela Bettella Cybis. "U-statistical inference for hierarchical clustering." arXiv preprint arXiv:1805.12179 (2018).

See also \code{is_homo}, \code{uhclust}, \code{Utest_class}.
}
\examples{
set.seed(17161)
x <- matrix(rnorm(100000), nrow = 50)  # homogeneous Gaussian dataset: 50 observations, 2000 variables
res <- uclust(data = x)

x[1:30, ] <- x[1:30, ] + 0.25  # heterogeneous dataset: first 30 observations have a shifted mean
res <- uclust(data = x)
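
# A short sketch of how the returned list might be inspected. It uses only the
# elements documented in the Value section. Outputs are not shown here because
# the variance of Bn is estimated by resampling and may differ between runs.
print(res$ishomo)            # FALSE would indicate that heterogeneity was detected
print(res$p.value)           # p-value of the test that yields the final partition
print(res$alpha_corrected)   # significance level that the p-value is compared against
print(res$n1)                # size of the smaller cluster
if (!res$ishomo) {
  print(res$cluster1)        # elements assigned to group 1
  print(res$cluster2)        # elements assigned to group 2
}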

md <- as.matrix(dist(x)^2)  # squared Euclidean distances for the same data
res <- uclust(md = md)
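
# Sketch: one way to check that a matrix holds squared Euclidean distances
# before passing it as md. The objects x_check, md_check and d12 are
# illustrative names only and are not part of the package.
x_check <- matrix(rnorm(5 * 20), nrow = 5)
md_check <- as.matrix(dist(x_check))^2
# Entry [1, 2] should match the sum of squared coordinate differences
# between observations 1 and 2.
d12 <- sum((x_check[1, ] - x_check[2, ])^2)
print(isTRUE(all.equal(md_check[1, 2], d12)))
# A valid distance matrix is also symmetric with a zero diagonal.
print(isSymmetric(md_check))
print(all(diag(md_check) == 0))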

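# Multidimensional scaling plot of the distance matrix
fit <- cmdscale(md, eig = TRUE, k = 2)
mds1 <- fit$points[, 1]            # first MDS coordinate (does not overwrite the data matrix x)
mds2 <- fit$points[, 2]            # second MDS coordinate
point_col <- rep(3, nrow(md))      # default color for all points
point_col[res$cluster2] <- 2       # different color for elements in group 2
plot(mds1, mds2, col = point_col,
     main = paste("Multidimensional scaling plot of data: homogeneity p-value =",
                  res$ishomoResult$p.MaxTest))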
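
# Sketch: calling uclust with non-default values of the documented arguments
# alpha and rep. The values 0.01 and 30 and the name res_strict are
# illustrative choices. Setting the seed right before the call may help
# reproduce the result, assuming the resampling uses R's random number
# generator; this is an assumption, not something stated above.
set.seed(1)
res_strict <- uclust(md = md, alpha = 0.01, rep = 30)
print(res_strict$ishomo)
print(res_strict$p.value)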


}
