% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/iClusterVB.R
\name{iClusterVB}
\alias{iClusterVB}
\title{Fast Integrative Clustering for High-Dimensional Multi-View Data Using
Variational Bayesian Inference}
\usage{
iClusterVB(
  mydata,
  dist,
  K = 10,
  initial_method = "VarSelLCM",
  VS_method = 0,
  initial_cluster = NULL,
  initial_vs_prob = NULL,
  initial_fit = NULL,
  initial_omega = NULL,
  input_hyper_parameters = NULL,
  max_iter = 200,
  early_stop = 1,
  per = 10,
  convergence_threshold = 1e-04
)
}
\arguments{
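\item{mydata}{A list of length R, where R is the number of data views
(datasets). Each element holds the data for one view, with samples in rows
and features in columns.
\itemize{
\item Note: For \bold{categorical} data, \code{0}s must be re-coded to
another, non-zero value (for example, \code{2} for binary data coded 0/1).
}}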

\item{dist}{A character vector of length R giving the distribution of each
data view, in the same order as \code{mydata}. Options are "gaussian" (for
continuous data), "multinomial" (for binary or categorical data), and
"poisson" (for count data).}

\item{K}{The maximum number of clusters (default 10). The algorithm
converges to a model with dominant clusters and removes redundant ones, so
the number of clusters is determined automatically and can be smaller than
\code{K}.}

\item{initial_method}{The initialization method for cluster allocation.
Options include: "VarSelLCM" (default), "random", "kproto" (k-prototypes),
"kmeans" (continuous data only), "mclust" (continuous data only), or "lca"
(poLCA, categorical data only).}

\item{VS_method}{The variable/feature selection method. Options are 0 for
clustering without variable/feature selection (default) and 1 for
clustering with variable/feature selection.}

\item{initial_cluster}{The initial cluster membership. The default, NULL,
uses \code{initial_method} for the initial cluster allocation. If not NULL,
it overrides the initial values that would otherwise be used for the cluster
allocation.}

\item{initial_vs_prob}{The initial variable/feature selection probability, a
scalar. The default is NULL, which assigns a value of 0.5.}

\item{initial_fit}{Initial values based on a previously fitted iClusterVB
model (an iClusterVB object). The default is NULL.}

\item{initial_omega}{Customized initial values for the feature inclusion
probabilities. The default is NULL. If not NULL, it overrides the initial
values that would otherwise be used for this parameter. If
\code{VS_method = 1}, \code{initial_omega} is a list of length R whose r-th
element is an array with \code{dim = c(N, p[[r]])}, where N is the sample
size and \code{p[[r]]} is the number of features in data view r
(r = 1, ..., R).}

\item{input_hyper_parameters}{A list of the hyper-parameters of the prior
distributions for the model. The default, NULL, assigns
\code{alpha_00 = 0.001}, \code{mu_00 = 0}, \code{s2_00 = 100},
\code{a_00 = 1}, \code{b_00 = 1}, \code{kappa_00 = 1}, \code{u_00 = 1}, and
\code{v_00 = 1}.}

\item{max_iter}{The maximum number of iterations for the VB algorithm. The
default is 200.}

\item{early_stop}{Whether to stop the algorithm upon convergence or to
continue until \code{max_iter} is reached. Options are 1 (default) to stop
when the algorithm converges, and 0 to stop only when \code{max_iter} is
reached.}

\item{per}{Print progress information every \code{per} iterations. The
default is 10.}

\item{convergence_threshold}{The convergence threshold for the change in the
evidence lower bound (ELBO). The default is 0.0001.}
}
\value{
The \code{iClusterVB} function creates an object (list) of class
\code{iClusterVB}. Relevant outputs include:

\item{\code{elbo}:}{ The evidence lower bound for each iteration.}
\item{\code{cluster}:}{ The cluster assigned to each individual.}
\item{\code{initial_values}:}{ A list of the initial values.}
\item{\code{hyper_parameters}:}{ A list of the hyper-parameters.}
\item{\code{model_parameters}:}{ A list of the model parameters after the
algorithm has run.}
\itemize{
\item Of particular interest is \code{rho}, a list of the posterior
inclusion probabilities of the features in each data view, i.e., the
probability that a given feature is included in the model, given the
observed data. It is only available if \code{VS_method = 1}.
}
}
\description{
\code{iClusterVB} performs fast, integrative clustering of high-dimensional,
mixed-type, multi-view data. Using variational Bayesian inference, it
combines clustering with feature selection and can be used to identify
disease subtypes, which may help inform clinical decision-making.
}
\note{
If any of the data views are "gaussian", list them \bold{first}, both in
the input data \code{mydata} and, correspondingly, in the distribution
vector \code{dist}. For example, use
\code{dist <- c("gaussian", "gaussian", "poisson", "multinomial")}, and
\bold{not} \code{dist <- c("poisson", "gaussian", "gaussian", "multinomial")}
or \code{dist <- c("gaussian", "poisson", "gaussian", "multinomial")}.
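
A minimal sketch of reordering the views so that the "gaussian" ones come
first, using hypothetical toy objects (the names \code{cnt}, \code{g1},
\code{cat}, and \code{g2} are illustrative only). Because \code{order()}
leaves ties in their original order, the non-Gaussian views keep their
relative order.
\preformatted{
# Hypothetical toy inputs, listed in the wrong order
dist <- c("poisson", "gaussian", "multinomial", "gaussian")
mydata <- list(cnt = matrix(rpois(20, 3), 5), g1 = matrix(rnorm(20), 5),
               cat = matrix(sample(1:2, 20, TRUE), 5),
               g2 = matrix(rnorm(20), 5))
ord <- order(dist != "gaussian")  # FALSE ("gaussian") sorts before TRUE
mydata <- mydata[ord]             # reorder the data views ...
dist <- dist[ord]                 # ... and their distributions together
dist  # now c("gaussian", "gaussian", "poisson", "multinomial")
}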
}
\examples{
# sim_data comes with the iClusterVB package.
# Use a subset of the samples and the first 75 features of each view.
idx <- c(1:20, 61:80, 121:140, 181:200)
dat1 <- list(
  gauss_1 = sim_data$continuous1_data[idx, 1:75],
  gauss_2 = sim_data$continuous2_data[idx, 1:75],
  poisson_1 = sim_data$count_data[idx, 1:75],
  multinomial_1 = sim_data$binary_data[idx, 1:75]
)

# Re-code `0`s to `2`s in the binary (multinomial) view.

dat1$multinomial_1[dat1$multinomial_1 == 0] <- 2
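
# Sanity check (sketch): no 0s remain in the categorical view. Re-coding
# 0 to 2 is safe here because this view is binary (0/1); for a categorical
# view that already uses the code 2, shifting every value up by one
# (e.g. x <- x + 1) would keep the categories distinct instead.
stopifnot(!any(dat1$multinomial_1 == 0, na.rm = TRUE))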

dist <- c(
  "gaussian", "gaussian",
  "poisson", "multinomial"
)
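
# Sanity checks (sketch): one distribution per data view, and every
# "gaussian" view listed before any other type, as required in the Note.
stopifnot(
  length(dist) == length(dat1),
  !is.unsorted(as.integer(dist != "gaussian"))
)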

# Note: run time grows with `max_iter`.
# For a quick test of the code, use a small value (e.g. 10).
# For more accurate results, use a larger value (e.g. 200).

fit_iClusterVB <- iClusterVB(
  mydata = dat1,
  dist = dist,
  K = 4,
  initial_method = "VarSelLCM",
  VS_method = 1,
  max_iter = 50
)
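
# Supplying hyper-parameters explicitly (sketch). The list below simply
# restates the documented defaults of `input_hyper_parameters`; edit the
# values to change the prior. The call is commented out to keep the
# example fast.
hp <- list(
  alpha_00 = 0.001, mu_00 = 0, s2_00 = 100, a_00 = 1, b_00 = 1,
  kappa_00 = 1, u_00 = 1, v_00 = 1
)
# fit_hp <- iClusterVB(mydata = dat1, dist = dist, K = 4, VS_method = 1,
#                      input_hyper_parameters = hp, max_iter = 50)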

# We can obtain a summary using the summary() function
summary(fit_iClusterVB)
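
# Inspecting the fitted object (sketch). Element names follow the Value
# section; treating `elbo` as a numeric vector and each element of `rho`
# as numeric are assumptions here, and 0.5 is an illustrative cut-off.
table(fit_iClusterVB$cluster)  # number of samples per cluster
plot(fit_iClusterVB$elbo, type = "l", xlab = "Iteration", ylab = "ELBO")
# Entries with posterior inclusion probability above 0.5, per data view
sapply(fit_iClusterVB$model_parameters$rho, function(x) sum(x > 0.5))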

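# Custom initial values (sketch). The objects below are illustrative, not
# recommended settings: a k-means partition of the first Gaussian view for
# `initial_cluster`, and a flat 0.5 for every entry of `initial_omega`,
# built with the documented dimensions c(N, p[[r]]) for each view r.
# The call is commented out to keep the example fast.
N <- nrow(dat1[[1]])
p <- lapply(dat1, ncol)
init_cluster <- kmeans(dat1$gauss_1, centers = 4)$cluster
init_omega <- lapply(p, function(p_r) array(0.5, dim = c(N, p_r)))
# fit_custom <- iClusterVB(mydata = dat1, dist = dist, K = 4, VS_method = 1,
#                          initial_cluster = init_cluster,
#                          initial_omega = init_omega, max_iter = 50)
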
}
