% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/autotune_VIM_regrImp.R
\name{autotune_VIM_regrImp}
\alias{autotune_VIM_regrImp}
\title{Perform imputation using the VIM package and its regressionImp function.}
\usage{
autotune_VIM_regrImp(
  df,
  col_type,
  percent_of_missing,
  col_0_1 = FALSE,
  robust = FALSE,
  mod_cat = FALSE,
  use_imputed = FALSE,
  out_file = NULL
)
}
\arguments{
\item{df}{data.frame. Data frame to impute, with column names and without the target column.}

\item{col_type}{Character vector with types of columns.}

\item{percent_of_missing}{numeric vector. Vector containing the percentage of missing data in each column, for example c(0,1,0,0,11.3,..)}

\item{col_0_1}{Decides whether to add a bonus column indicating where imputation was performed. 0 - the value was in the dataset, 1 - the value was imputed. Default FALSE. (Works only when a single dataset is returned.)}

\item{robust}{TRUE/FALSE. Whether robust regression should be used.}

\item{mod_cat}{TRUE/FALSE. If TRUE, the level with the highest prediction probability is selected for categorical variables; otherwise the level is sampled according to the probabilities.}

\item{use_imputed}{TRUE/FALSE. If TRUE, already imputed columns will be used to impute other columns.}

\item{out_file}{Output log file location. If the file already exists, log messages will be appended to it. If NULL, no log will be produced.}
}
\value{
Returns one data.frame with imputed values.
}
\description{
The function uses regression models to impute missing data.
}
\details{
The function imputes one column per iteration to allow more control over the imputation. Each column with missing values can be imputed with a different formula. For every column to impute, one of four formulas is used: \cr
1. col to impute ~ all columns without missing values \cr
2. col to impute ~ all numeric columns without missing values \cr
3. col to impute ~ first of the columns without missing values \cr
4. col to impute ~ first of the numeric columns without missing values \cr
For example, if formulas 1 and 2 cannot be used, the algorithm will try formula 3. If none of the formulas can be used, the function stops and reports the error from the attempt with formula 4 or 3. In some cases, setting use_imputed to TRUE can solve this problem, but in general it lowers the quality of the imputation.
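
The fallback order above can be illustrated with a minimal sketch. It is not the package's implementation: \code{candidate_formulas} is a hypothetical helper introduced only for illustration, and it assumes that at least one numeric column without missing values exists. The sketch only builds the four candidate formulas for one target column, in the order in which they would be tried.
\preformatted{
# Hypothetical helper (an assumption, not part of this package): build the
# four candidate formulas for one target column, in fallback order.
candidate_formulas <- function(target, complete_cols, numeric_complete_cols) {
  rhs <- list(
    complete_cols,              # 1. all columns without missing values
    numeric_complete_cols,      # 2. all numeric columns without missing values
    complete_cols[1],           # 3. first column without missing values
    numeric_complete_cols[1]    # 4. first numeric column without missing values
  )
  # stats::reformulate() turns predictor names and a response into a formula
  lapply(rhs, function(cols) reformulate(cols, response = target))
}

# Column "a" has missing values; "b", "d" and "e" are complete,
# and of those only "b" and "d" are numeric.
candidate_formulas("a", c("b", "d", "e"), c("b", "d"))
}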
}
\examples{
{
  raw_data <- data.frame(
    a = as.factor(sample(c("red", "yellow", "blue", NA), 1000, replace = TRUE)),
    b = as.integer(1:1000),
    c = as.factor(sample(c("YES", "NO", NA), 1000, replace = TRUE)),
    d = runif(1000, 1, 10),
    e = as.factor(sample(c("YES", "NO"), 1000, replace = TRUE)),
    f = as.factor(sample(c("male", "female", "trans", "other", NA), 1000, replace = TRUE)))

  # Preparing col_type
  col_type <- c("factor", "integer", "factor", "numeric", "factor", "factor")

  # Percentage of missing values in each column
  percent_of_missing <- unname(100 * colMeans(is.na(raw_data)))


  imp_data <- autotune_VIM_regrImp(raw_data, col_type, percent_of_missing)

  # Check that all missing values were imputed
  sum(is.na(imp_data)) == 0
  # TRUE
}
}
\references{
Alexander Kowarik, Matthias Templ (2016). Imputation with the R Package VIM. Journal of Statistical Software, 74(7), 1-16. doi:10.18637/jss.v074.i07
}
\author{
{ Alexander Kowarik, Matthias Templ (2016) \doi{10.18637/jss.v074.i07}}
}
