% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dfr_sgl.R
\name{dfr_sgl}
\alias{dfr_sgl}
\title{Fit a DFR-SGL model.}
\usage{
dfr_sgl(
  X,
  y,
  groups,
  type = "linear",
  lambda = "path",
  alpha = 0.95,
  max_iter = 5000,
  backtracking = 0.7,
  max_iter_backtracking = 100,
  tol = 1e-05,
  standardise = "l2",
  intercept = TRUE,
  path_length = 20,
  min_frac = 0.05,
  screen = TRUE,
  verbose = FALSE
)
}
\arguments{
\item{X}{Input matrix of dimensions \eqn{n \times p}{n*p}. Can be a sparse matrix (using class \code{"sparseMatrix"} from the \code{Matrix} package).}

\item{y}{Output vector of dimension \eqn{n}. For \code{type="linear"}, this should be continuous; for \code{type="logistic"}, it should be binary.}

\item{groups}{A grouping structure for the input data. Should take the form of a vector of group indices.}

\item{type}{The type of regression to perform. Supported values are: \code{"linear"} and \code{"logistic"}.}

\item{lambda}{The regularisation parameter, which defines the level of sparsity in the model; a higher value leads to sparser models. Options are:
\itemize{
\item \code{"path"} computes a path of regularisation parameters of length \code{"path_length"}. The path will begin just above the value at which the first predictor enters the model and will terminate at the value determined by \code{"min_frac"}.
\item User-specified single value or sequence. Internal scaling is applied based on the type of standardisation. The returned \code{"lambda"} value will be the original unscaled value(s).
}}

\item{alpha}{The value of \eqn{\alpha}, which defines the convex balance between the lasso and group lasso. Must be between 0 and 1. Recommended value is 0.95.}

\item{max_iter}{Maximum number of ATOS iterations to perform.}

\item{backtracking}{The backtracking parameter, \eqn{\tau}, as defined in Pedregosa and Gidel (2018).}

\item{max_iter_backtracking}{Maximum number of backtracking line search iterations to perform per global iteration.}

\item{tol}{Convergence tolerance for the stopping criteria.}

\item{standardise}{Type of standardisation to perform on \code{X}:
\itemize{
\item \code{"l2"} standardises the input data to have \eqn{\ell_2} norms of one. When using this, \code{"lambda"} is scaled internally by \eqn{1/\sqrt{n}}.
\item \code{"l1"} standardises the input data to have \eqn{\ell_1} norms of one. When using this, \code{"lambda"} is scaled internally by \eqn{1/n}.
\item \code{"sd"} standardises the input data to have standard deviation of one.
\item \code{"none"} applies no standardisation.
}}

\item{intercept}{Logical flag for whether to fit an intercept.}

\item{path_length}{The number of \eqn{\lambda} values to fit the model for. If \code{"lambda"} is user-specified, this is ignored.}

\item{min_frac}{Smallest value of \eqn{\lambda} as a fraction of the maximum value. That is, the final \eqn{\lambda} will be \code{"min_frac"} times the first \eqn{\lambda} value.}

\item{screen}{Logical flag for whether to apply the DFR screening rules (see Feser and Evangelou (2024)).}

\item{verbose}{Logical flag for whether to print fitting information.}
}
\value{
A list containing:
\item{beta}{The fitted coefficients from the regression. Taken to be the more stable fit between \code{x} and \code{z}, which is usually the former. A filter is applied to remove very small values, where ATOS has not been able to shrink exactly to zero. Check this against \code{x} and \code{z}.}
\item{x}{The solution to the original problem (see Pedregosa and Gidel (2018)).}
\item{u}{The solution to the dual problem (see Pedregosa and Gidel (2018)).}
\item{z}{The updated values from applying the first proximal operator (see Pedregosa and Gidel (2018)).}
\item{type}{Indicates which type of regression was performed.}
\item{lambda}{Value(s) of \eqn{\lambda} used to fit the model.}
\item{success}{Logical flag indicating whether ATOS converged, according to \code{tol}.}
\item{num_it}{Number of iterations performed. If convergence is not reached, this will be \code{max_iter}.}
\item{certificate}{Final value of the convergence criterion.}
\item{intercept}{Logical flag indicating whether an intercept was fit.}
}
\description{
Main fitting function for the sparse-group lasso (SGL) with dual feature reduction (DFR). Supports both linear and logistic regression, each with dense and sparse matrix implementations.
}
\details{
\code{dfr_sgl()} fits a DFR-SGL model (Feser and Evangelou (2024)) using Adaptive Three Operator Splitting (ATOS) (Pedregosa and Gidel (2018)).
It solves the convex optimisation problem given by (Simon et al. (2013))
\deqn{
  \frac{1}{2n} f(b ; y, \mathbf{X}) + \lambda \alpha \sum_{i=1}^{p} |b_i| + \lambda (1-\alpha)\sum_{g=1}^{m}  \sqrt{p_g} \|b^{(g)}\|_2,
}
where \eqn{f(\cdot)} is the loss function and \eqn{p_g} are the group sizes. In the case of the linear model, the loss function is given by the mean-squared error loss:
\deqn{
 f(b; y, \mathbf{X}) = \left\|y-\mathbf{X}b \right\|_2^2.
}
In the logistic model, the loss function is given by
\deqn{
f(b;y,\mathbf{X})=-1/n \log(\mathcal{L}(b; y, \mathbf{X})),
}
where the log-likelihood is given by
\deqn{
 \log(\mathcal{L}(b; y, \mathbf{X})) = \sum_{i=1}^{n}\left\{y_i b^\intercal x_i - \log(1+\exp(b^\intercal x_i)) \right\}.
}
SGL can be seen to be a convex combination of the lasso and group lasso, balanced through \code{alpha}, such that it reduces to the lasso for \code{alpha = 1} and to the group lasso for \code{alpha = 0}.
By applying both the lasso and group lasso norms, SGL shrinks inactive groups to zero, as well as inactive variables in active groups.
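
As a minimal illustration of this balance, the penalty part of the objective above can be evaluated directly in base R. The helper \code{sgl_penalty()} is hypothetical and is not part of this package; it only evaluates the two penalty terms as written in the objective.

\preformatted{
# Hypothetical helper (not exported by this package): evaluates
# lambda * alpha * sum(|b_i|) + lambda * (1 - alpha) * sum_g sqrt(p_g) * ||b^(g)||_2
sgl_penalty <- function(b, groups, lambda, alpha) {
  lasso_term <- sum(abs(b))
  # Euclidean norm and size of each group, in matching group order
  group_norms <- tapply(b, groups, function(v) sqrt(sum(v^2)))
  group_sizes <- tapply(b, groups, length)
  group_term <- sum(sqrt(group_sizes) * group_norms)
  lambda * (alpha * lasso_term + (1 - alpha) * group_term)
}

b <- c(1, -2, 0, 0, 3)
groups <- c(1, 1, 2, 2, 3)
# alpha = 1 leaves only the lasso term of the objective: 6
sgl_penalty(b, groups, lambda = 1, alpha = 1)
# alpha = 0 leaves only the group lasso term: sqrt(10) + 3
sgl_penalty(b, groups, lambda = 1, alpha = 0)
}
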
Given the solution at \eqn{\lambda_k}, DFR uses the dual norm (the \eqn{\epsilon}-norm) and the KKT conditions to discard features that would be inactive at \eqn{\lambda_{k+1}}.
It applies two layers of screening, so that it first screens out any groups that satisfy
\deqn{
\|\nabla_g f(\hat{\beta}(\lambda_{k}))\|_{\epsilon_g} \leq \tau_g(2\lambda_{k+1} - \lambda_k)
}
and then screens out any variables that satisfy
\deqn{
|\nabla_i f(\hat{\beta}(\lambda_{k}))| \leq \alpha (2\lambda_{k+1} - \lambda_k)
}
leading to effective input dimensionality reduction. See Feser and Evangelou (2024) for full details.
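
The variable-level rule can be sketched in base R as follows. The function \code{screen_variables()} and the gradient values are hypothetical and for illustration only; the package's internal implementation may differ, and the group-level rule, which requires the \eqn{\epsilon}-norm, is omitted here.

\preformatted{
# Hypothetical sketch (not the package's internal code): flags variables with
# |grad_i| <= alpha * (2 * lambda_next - lambda_cur)
screen_variables <- function(grad, lambda_cur, lambda_next, alpha) {
  threshold <- alpha * (2 * lambda_next - lambda_cur)
  abs(grad) <= threshold  # TRUE means the variable is screened out
}

# made-up gradient values, standing in for the gradient at the lambda_k solution
grad <- c(0.05, -0.40, 0.10, 0.90)
screened <- screen_variables(grad, lambda_cur = 0.5, lambda_next = 0.4, alpha = 0.95)
# threshold = 0.95 * 0.3 = 0.285, so variables 1 and 3 are flagged
which(screened)
}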
}
\examples{
# specify a grouping structure
groups <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4)
# generate toy data
data <- sgs::gen_toy_data(p = 10, n = 5, groups = groups, seed_id = 3, group_sparsity = 1)
# fit DFR-SGL along a path of five lambda values
model <- dfr_sgl(X = data$X, y = data$y, groups = groups, type = "linear", path_length = 5,
  alpha = 0.95, standardise = "l2", intercept = TRUE, verbose = FALSE)
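# Illustrative follow-up: inspect some of the list elements described under
# Value. This only reads fields of the object fitted above.
print(model$lambda)   # lambda value(s) used to fit the model
print(model$success)  # whether ATOS converged according to tol
print(model$num_it)   # number of iterations performed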
}
\references{
Feser, F., Evangelou, M. (2024). \emph{Dual feature reduction for the sparse-group lasso and its adaptive variant}, \url{https://arxiv.org/abs/2405.17094}

Pedregosa, F., Gidel, G. (2018). \emph{Adaptive Three Operator Splitting}, \url{https://proceedings.mlr.press/v80/pedregosa18a.html}

Simon, N., Friedman, J., Hastie, T., Tibshirani, R. (2013). \emph{A Sparse-Group Lasso}, \url{https://www.tandfonline.com/doi/abs/10.1080/10618600.2012.681250}
}
\seealso{
Other SGL-methods: 
\code{\link{dfr_adap_sgl}()},
\code{\link{dfr_adap_sgl.cv}()},
\code{\link{dfr_sgl.cv}()},
\code{\link{plot.sgl}()},
\code{\link{predict.sgl}()},
\code{\link{print.sgl}()}
}
\concept{SGL-methods}
