% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Projection_rf.R
\name{Projection_rf}
\alias{Projection_rf}
\title{Projection Estimator Using the Random Forest Algorithm}
\usage{
Projection_rf(
  data_model,
  target_column,
  data_proj,
  domain1,
  domain2,
  psu,
  ssu,
  strata,
  weights,
  split_ratio = 0.8,
  metric = "Accuracy"
)
}
\arguments{
\item{data_model}{The training dataset, consisting of auxiliary variables and the target variable.}

\item{target_column}{The name of the target column in the \code{data_model}.}

\item{data_proj}{The data to be projected, i.e. the dataset for which the target variable is predicted with the trained model. It must contain the same auxiliary variables as \code{data_model}.}

\item{domain1}{Name of the first domain variable for survey estimation (e.g., \code{"province"}).}

\item{domain2}{Name of the second domain variable for survey estimation (e.g., \code{"regency"}).}

\item{psu}{Name of the column in \code{data_proj} that identifies the primary sampling units of the survey design.}

\item{ssu}{Name of the column in \code{data_proj} that identifies the secondary sampling units of the survey design.}

\item{strata}{Name of the stratification variable in \code{data_proj}; stratification ensures that specific subgroups are represented in the sample.}

\item{weights}{Name of the survey weight variable in \code{data_proj} used for indirect estimation.}

\item{split_ratio}{Proportion of data used for training (default is 0.8, meaning 80\% for training and 20\% for validation).}

\item{metric}{The metric used for model evaluation (default is \code{"Accuracy"}; other options include \code{"AUC"}, \code{"F1"}, etc.).}
}
\value{
A list containing the following elements:
\itemize{
\item \code{model} The trained Random Forest model.
\item \code{importance} Feature importance, showing which features contributed most to the model's predictions.
\item \code{train_accuracy} Accuracy of the model on the training set.
\item \code{validation_accuracy} Accuracy of the model on the validation set.
\item \code{validation_performance} Confusion matrix for the validation set, with performance metrics such as accuracy, precision and recall.
\item \code{data_proj} The projection data with predicted values.
\item \code{Domain1} Estimates for \code{domain1}, including the estimated values, their variance and relative standard error.
\item \code{Domain2} Estimates for \code{domain2}, including the estimated values, their variance and relative standard error.
}
}
\description{
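According to \strong{Kim and Rao (2012)}, synthetic data obtained through the model-assisted projection method can provide a useful tool for
efficient domain estimation when the sample in survey 2 is much larger than the sample in survey 1. Survey 1 observes both the auxiliary
variables and the target variable, whereas survey 2 observes only the auxiliary variables. Here, survey 1 corresponds to \code{data_model}
and survey 2 to \code{data_proj}.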
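
As a sketch of the general form of such a projection estimator (assuming the domain quantity of interest is a weighted mean or proportion;
the notation is illustrative and not taken from the package code), a model \eqn{\hat{m}(\cdot)} is fitted on survey 1, the target is
predicted as \eqn{\hat{y}_i = \hat{m}(x_i)} for each unit \eqn{i} in the survey-2 sample \eqn{A_2}, and the predictions are combined
with the survey-2 weights \eqn{w_i}, where \eqn{\delta_{id} = 1} if unit \eqn{i} belongs to domain \eqn{d} and \eqn{0} otherwise:
\deqn{\hat{\bar{Y}}_d = \frac{\sum_{i \in A_2} w_i \delta_{id} \hat{y}_i}{\sum_{i \in A_2} w_i \delta_{id}}}{Ybar_d = sum(w_i * delta_id * yhat_i) / sum(w_i * delta_id), with both sums over i in A_2}
Kim and Rao (2012) state the projection estimator for a population total, \eqn{\sum_{i \in A_2} w_i \hat{y}_i}; the ratio above is
its domain-mean analogue.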

This function projects the target variable from a small survey onto an independent, larger survey using the random forest algorithm.
Although the two surveys are statistically independent, the projection relies on the auxiliary variables they share.
The process includes data preprocessing, feature selection, model training, and domain-specific estimation based on the survey design.
}
\examples{
\donttest{
library(survey)
library(caret)
library(dplyr)
library(themis)
library(randomForest)

df_sep20 <- df_susenas_sep2020 \%>\% select(-c(1:6))
df_mar20 <- df_susenas_mar2020

proj_rf <- Projection_rf(data_model = df_sep20,
                          target_column = "uses_public_transport",
                          data_proj = df_mar20,
                          domain1 = "province",
                          domain2 = "regency",
                          psu = "psu",
                          ssu = "ssu",
                          strata = "strata",
                          weights = "weight",
                          metric = "Accuracy")
}
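
## Sketch of the projection idea on simulated toy data. All objects below
## (s1, s2, fit, des, est, ...) are hypothetical and this is NOT the internal
## code of Projection_rf(): a random forest is trained on a small "survey 1"
## that observes the target (with an 80/20 train/validation split, mirroring
## split_ratio = 0.8), used to predict the target for a larger "survey 2"
## that observes only the auxiliary variables, and the predictions are then
## summarised by domain with the survey-2 design via survey::svyby().
\donttest{
library(caret)
library(randomForest)
library(survey)

set.seed(1)
n1 <- 300
n2 <- 2000

## Survey 1: auxiliary variables x1, x2 and a binary target y
s1 <- data.frame(x1 = rnorm(n1), x2 = runif(n1))
s1$y <- factor(ifelse(s1$x1 + s1$x2 + rnorm(n1, sd = 0.5) > 0.5, "1", "0"))

## Survey 2: the same auxiliary variables plus design variables, no target
s2 <- data.frame(x1 = rnorm(n2), x2 = runif(n2),
                 province = sample(c("A", "B"), n2, replace = TRUE),
                 strata = sample(1:4, n2, replace = TRUE))
s2$psu <- paste(s2$strata, sample(1:20, n2, replace = TRUE))
s2$ssu <- paste(s2$psu, sample(1:5, n2, replace = TRUE))
s2$weight <- runif(n2, 50, 150)

## 80/20 split of survey 1, stratified on the target
idx <- createDataPartition(s1$y, p = 0.8, list = FALSE)
train <- s1[idx, ]
valid <- s1[-idx, ]

## Fit the random forest and compute validation accuracy
fit <- randomForest(y ~ x1 + x2, data = train, ntree = 200)
valid_acc <- mean(predict(fit, newdata = valid) == valid$y)

## Project the target onto survey 2 (predicted class coded as 0/1)
s2$y_hat <- as.numeric(as.character(predict(fit, newdata = s2)))

## Domain estimates with the survey-2 design (PSUs nested within strata)
des <- svydesign(ids = ~psu + ssu, strata = ~strata, weights = ~weight,
                 data = s2, nest = TRUE)
est <- svyby(~y_hat, ~province, des, svymean)
est$rse <- 100 * est$se / est$y_hat  # relative standard error, in percent
est
}
## est has one row per province with the estimated proportion (y_hat),
## its standard error (se) and the relative standard error (rse).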

}
\references{
\enumerate{
\item Kim, J. K., & Rao, J. N. K. (2012). Combining data from two independent surveys: a model-assisted approach. \emph{Biometrika}, 99(1), 85-100.
}
}
