% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/wrappers.R
\name{mlfitppml}
\alias{mlfitppml}
\title{General Penalized PPML Estimation}
\usage{
mlfitppml(
  data,
  dep = 1,
  indep = NULL,
  fixed = NULL,
  cluster = NULL,
  selectobs = NULL,
  ...
)
}
\arguments{
\item{data}{A data frame containing all relevant variables.}

\item{dep}{A string with the name of the dependent variable or a column number.}

\item{indep}{A vector with the names or column numbers of the regressors. If left unspecified,
all remaining variables (excluding fixed effects) are included in the regressor matrix.}

\item{fixed}{A vector with the names or column numbers of factor variables identifying the fixed effects,
or a list with the desired interactions between variables in \code{data}.}

\item{cluster}{Optional. A string with the name of the clustering variable or a column number.
A vector with several variables can also be supplied, in which case their interaction
is used as the clustering variable.}

\item{selectobs}{Optional. A vector indicating which observations to use (either a logical vector
or a numeric vector with row numbers, as usual when subsetting in R).}

\item{...}{Further arguments, including:
\itemize{
\item \code{penalty}: A string indicating the penalty type. Currently supported: \code{"lasso"} and \code{"ridge"}.
\item \code{method}: Set this to \code{"plugin"} to run the plugin algorithm with
coefficient-specific penalty weights (see Details). Otherwise, a single global penalty is used.
\item \code{post}: Logical. If \code{TRUE}, estimates a post-penalty regression with the
selected variables.
\item \code{xval}: Logical. If \code{TRUE}, cross-validation is performed using the IDs provided
in the \code{IDs} argument as folds. Note that, by default, observations are assigned
individual IDs, which makes the cross-validation algorithm very time-consuming.
}
For a full list of options, see \link{mlfitppml_int}.}
}
\value{
A list with the following elements:
\itemize{
\item \code{beta}: if \code{post = FALSE}, a \code{length(lambdas)} x \code{ncol(x)} matrix with
coefficient (beta) estimates from the penalized regressions. If \code{post = TRUE}, this is
the matrix of coefficients from the post-penalty regressions.
\item \code{beta_pre}: if \code{post = TRUE}, a \code{length(lambdas)} x \code{ncol(x)} matrix with
coefficient (beta) estimates from the penalized regressions.
\item \code{bic}: Bayesian Information Criterion.
\item \code{lambdas}: vector of penalty parameters.
\item \code{ses}: standard errors of the coefficients of the post-penalty regression. Note that
these are only provided when \code{post = TRUE}.
\item \code{rmse}: if \code{xval = TRUE}, a matrix with the root mean squared error (RMSE - column 2)
for each value of lambda (column 1), obtained by cross-validation.
\item \code{phi}: coefficient-specific penalty weights (only if \code{method = "plugin"}).
}
}
\description{
\code{mlfitppml} is a general-purpose wrapper function for penalized PPML estimation. This is a
flexible tool that allows users to select:
\itemize{
\item Penalty type: either lasso or ridge.
\item Penalty parameter: users can provide a single global value for lambda (a single regression
is estimated), a vector of lambda values (the function estimates the regression using each of them,
sequentially) or even coefficient-specific penalty weights.
\item Method: plugin lasso estimates can also be obtained directly from this function.
\item Cross-validation: if this option is enabled, the function uses IDs provided by the user
to perform k-fold cross-validation and reports the resulting RMSE for all lambda values.
}
}
\details{
This function is a thin wrapper around \code{mlfitppml_int}, providing a more convenient interface for
data frames. Whereas the internal function requires some preliminary handling of data sets (\code{y}
must be a vector, \code{x} must be a matrix and \code{fes} must be provided in a list), the wrapper
takes a full data frame in the \code{data} argument, and users can simply specify which variables
correspond to y, x and the fixed effects, using either variable names or column numbers.

For technical details on the algorithms used, see \link{hdfeppml} (post-lasso regression),
\link{penhdfeppml} (standard penalized regression), \link{penhdfeppml_cluster} (plugin lasso),
and \link{xvalidate} (cross-validation).
}
\section{References}{

Breinlich, H., Corradi, V., Rocha, N., Ruta, M., Santos Silva, J.M.C. and T. Zylkin (2021).
"Machine Learning in International Trade Research: Evaluating the Impact of Trade Agreements",
Policy Research Working Paper; No. 9629. World Bank, Washington, DC.

Correia, S., P. Guimaraes and T. Zylkin (2020). "Fast Poisson estimation with high dimensional
fixed effects", \emph{Stata Journal}, 20, 90-115.

Gaure, S. (2013). "OLS with multiple high dimensional category variables",
\emph{Computational Statistics & Data Analysis}, 66, 8-18.

Friedman, J., T. Hastie, and R. Tibshirani (2010). "Regularization paths for generalized linear
models via coordinate descent", \emph{Journal of Statistical Software}, 33, 1-22.

Belloni, A., V. Chernozhukov, C. Hansen and D. Kozbur (2016). "Inference in high dimensional panel
models with an application to gun control", \emph{Journal of Business & Economic Statistics}, 34, 590-605.
}

\examples{
# To reduce run time, we keep only countries in the Americas:
americas <- countries$iso[countries$region == "Americas"]
# Now we can use our main functions on the reduced trade data set:
test <- mlfitppml(data = trade[, -(5:6)],
                    dep = "export",
                    fixed = list(c("exp", "time"),
                                 c("imp", "time"),
                                 c("exp", "imp")),
                    selectobs = (trade$imp \%in\% americas) & (trade$exp \%in\% americas),
                    lambdas = c(0.01, 0.001),
                    tol = 1e-6, hdfetol = 1e-2)
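
# A minimal sketch (not part of the original example) of how the returned list
# might be inspected. It relies only on the elements listed in the Value section.
test$lambdas  # penalty parameters used in the call above
test$beta     # coefficient estimates, one set per value of lambda
test$bic      # Bayesian Information Criterion

# A hedged sketch of a plugin lasso call, based on the `penalty`, `method` and
# `cluster` arguments described above. Clustering on the exporter-importer pair
# is an assumption made for illustration, so the call is not run here.
\dontrun{
plugin <- mlfitppml(data = trade[, -(5:6)],
                    dep = "export",
                    fixed = list(c("exp", "time"),
                                 c("imp", "time"),
                                 c("exp", "imp")),
                    selectobs = (trade$imp \%in\% americas) & (trade$exp \%in\% americas),
                    penalty = "lasso",
                    method = "plugin",
                    cluster = c("exp", "imp"))
plugin$phi  # coefficient-specific penalty weights
}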

}
