% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/lasso_bic.R
\name{lasso_bic}
\alias{lasso_bic}
\title{Fit a lasso regression and use standard BIC for variable selection}
\usage{
lasso_bic(x, y, maxp = 50, path = TRUE, betaPos = TRUE, ...)
}
\arguments{
\item{x}{Input matrix, of dimension nobs x nvars. Each row is an observation
vector. Can be in sparse matrix format (inherit from class
\code{"sparseMatrix"} as in package \code{Matrix}).}

\item{y}{Binary response variable, numeric.}

\item{maxp}{A limit on how many relaxed coefficients are allowed.
Default is 50; the default in \code{glmnet} is 'n-3', where 'n' is the sample size.}

\item{path}{Since \code{glmnet} does not do stepsize optimization, the Newton
algorithm can get stuck and not converge, especially with relaxed fits. With \code{path=TRUE},
each relaxed fit on a particular set of variables is computed pathwise using the original sequence
of lambda values (with a zero attached to the end). Default is \code{path=TRUE}.}

\item{betaPos}{Should the covariates selected by the procedure be
positively associated with the outcome? Default is \code{TRUE}.}

\item{\dots}{Other arguments that can be passed to \code{glmnet} from package
\code{glmnet} other than \code{penalty.factor}, \code{family}, \code{maxp}
and \code{path}.}
}
\value{
An object with S3 class \code{"log.lasso"}.
\item{beta}{Numeric vector of regression coefficients in the lasso.
In the \code{lasso_bic} function, the regression coefficients are UNPENALIZED.
Length equal to nvars.}
\item{selected_variables}{Character vector, names of the variable(s) selected with the
lasso-bic approach.
If \code{betaPos = TRUE}, this set is the covariates with a positive regression
coefficient in \code{beta}.
Otherwise, this set is the covariates with a non-zero regression coefficient in \code{beta}.
Covariates are ordered according to their p-values (two-sided if \code{betaPos = FALSE},
one-sided if \code{betaPos = TRUE}) in the classical multiple logistic regression
model that minimizes the BIC.}
}
\description{
Fit a lasso regression and use the Bayesian Information Criterion (BIC)
to select a subset of covariates.
Can deal with very large sparse data matrices.
Intended for binary response only (option \code{family = "binomial"} is forced).
Depends on the \code{glmnet} and \code{relax.glmnet} functions from the package \code{glmnet}.
}
\details{
For each tested penalisation parameter \eqn{\lambda}, a standard version of the BIC
is computed:
\deqn{BIC_\lambda = -2 l_\lambda + df(\lambda) \ln(N)}
where \eqn{l_\lambda} is the log-likelihood of the non-penalised multiple logistic
regression model that includes the set of covariates with a non-zero coefficient
in the penalised regression coefficient vector associated with \eqn{\lambda},
\eqn{df(\lambda)} is the number of covariates in that set,
and \eqn{N} is the number of observations.
The optimal set of covariates according to this approach is the one associated with
the classical multiple logistic regression model which minimizes the BIC.
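
As a purely illustrative calculation (the numbers are hypothetical and do not come
from a real fit, and \eqn{N} is assumed to denote the number of observations),
take \eqn{N = 100}, \eqn{df(\lambda) = 2} and \eqn{l_\lambda = -60}:
\deqn{BIC_\lambda = -2 \times (-60) + 2 \times \ln(100) \approx 120 + 9.21 = 129.21}
Under these assumptions, each additional selected covariate adds
\eqn{\ln(100) \approx 4.61} to the criterion, so it lowers the BIC only if it
raises the log-likelihood by more than about 2.30.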
}
\examples{

set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs",1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
lb <- lasso_bic(x = drugs, y = ae, maxp = 20)
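
# Inspect the fitted object (components as documented in the Value section).
# With purely random data such as this, the selected set may well be empty.
lb$selected_variables
head(lb$beta)

# Illustrative sketch only (not the package's internal code): the BIC from the
# Details section computed by hand for an arbitrary, hypothetical subset of
# covariates, using an unpenalized logistic regression fitted with stats::glm().
vars <- c("drugs1", "drugs2")  # hypothetical subset, chosen for illustration
fit <- glm(ae ~ drugs[, vars], family = binomial)
bic_by_hand <- -2 * as.numeric(logLik(fit)) + length(vars) * log(nrow(drugs))
bic_by_hand
# stats::BIC() also counts the intercept as a parameter, so it should exceed
# the hand-computed value by log(N), with N the number of observations.
BIC(fit) - bic_by_hand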


}
\author{
Emeline Courtois \cr Maintainer: Emeline Courtois
\email{emeline.courtois@inserm.fr}
}
