% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/obc_milp.R
\name{ob_categorical_milp}
\alias{ob_categorical_milp}
\title{Optimal Binning for Categorical Variables Using a Heuristic Algorithm}
\usage{
ob_categorical_milp(
  feature,
  target,
  min_bins = 3L,
  max_bins = 5L,
  bin_cutoff = 0.05,
  max_n_prebins = 20L,
  bin_separator = "\%;\%",
  convergence_threshold = 1e-06,
  max_iterations = 1000L
)
}
\arguments{
\item{feature}{A character vector or factor representing the categorical
predictor variable. Missing values (\code{NA}) are converted to the string
\code{"NA"} and treated as a separate category.}

\item{target}{An integer vector containing binary outcome values (0 or 1).
Must be the same length as \code{feature}. Cannot contain missing values.}

\item{min_bins}{Integer. Minimum number of bins to create. Must be at least
2. Default is 3.}

\item{max_bins}{Integer. Maximum number of bins to create. Must be greater
than or equal to \code{min_bins}. Default is 5.}

\item{bin_cutoff}{Numeric. Minimum relative frequency for an individual
category. Categories whose share of observations falls below this
proportion are merged with other categories. Must be between 0 and 1.
Default is 0.05 (5\%).}

\item{max_n_prebins}{Integer. Maximum number of initial bins before
optimization. Used to control computational complexity when dealing with
high-cardinality categorical variables. Default is 20.}

\item{bin_separator}{Character string used to separate category names when
multiple categories are merged into a single bin. Default is "\%;\%".}

\item{convergence_threshold}{Numeric. Convergence tolerance on the change in
total Information Value between iterations. Must be positive.
Default is 1e-6.}

\item{max_iterations}{Integer. Maximum number of iterations for the
optimization process. Must be positive. Default is 1000.}
}
\value{
A list containing the results of the optimal binning procedure:
\describe{
  \item{\code{id}}{Integer vector of bin identifiers, from 1 to the number
        of bins}
  \item{\code{bin}}{Character vector of bin labels, which are combinations
        of original categories separated by \code{bin_separator}}
  \item{\code{woe}}{Numeric vector of Weight of Evidence values for each bin}
  \item{\code{iv}}{Numeric vector of Information Values for each bin}
  \item{\code{count}}{Integer vector of total observations in each bin}
  \item{\code{count_pos}}{Integer vector of positive outcomes in each bin}
  \item{\code{count_neg}}{Integer vector of negative outcomes in each bin}
  \item{\code{total_iv}}{Numeric scalar. Total Information Value across all
        bins}
  \item{\code{converged}}{Logical. Whether the algorithm converged within
        the specified tolerance}
  \item{\code{iterations}}{Integer. Number of iterations performed}
}
}
\description{
This function performs optimal binning for categorical variables using a
heuristic merging approach that seeks to maximize Information Value (IV)
while maintaining monotonic Weight of Evidence (WoE). Despite the "MILP" in
its name, it does not use Mixed Integer Linear Programming. It uses a
greedy optimization algorithm instead.
}
\details{
The algorithm follows these steps:
\enumerate{
  \item Pre-binning: each unique category becomes an initial bin.
  \item Rare category handling: categories whose relative frequency is
        below \code{bin_cutoff} are merged with similar categories.
  \item Bin reduction: bins are merged greedily to satisfy the
        \code{min_bins} and \code{max_bins} constraints.
  \item Monotonicity enforcement: WoE is made either consistently
        increasing or consistently decreasing across bins.
  \item Optimization: total Information Value is improved iteratively
        until its change falls below \code{convergence_threshold} or
        \code{max_iterations} is reached.
}
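
The pre-binning and bin-reduction steps can be illustrated with a small
base-R sketch. This is not the package's implementation. The merge rule
used here (sort bins by event rate, then repeatedly merge the adjacent pair
whose union retains the most Information Value) is an assumption made for
illustration. Smoothing and rare-category pooling are omitted, and the
helpers \code{bin_iv()} and \code{greedy_reduce()} are hypothetical.

\preformatted{
# Hypothetical sketch of steps 1 and 3; not the package's internal code.
# IV of a set of bins from positive/negative counts (no smoothing, so
# every bin must contain both classes).
bin_iv <- function(pos, neg) {
  p1 <- pos / sum(pos)
  p0 <- neg / sum(neg)
  sum((p1 - p0) * log(p1 / p0))
}

greedy_reduce <- function(feature, target, max_bins) {
  # Step 1: each unique category starts as its own bin.
  pos <- tapply(target, feature, sum)
  cnt <- tapply(target, feature, length)
  bins <- data.frame(label = names(pos), pos = as.numeric(pos),
                     neg = as.numeric(cnt - pos),
                     stringsAsFactors = FALSE)
  # Assumed ordering: by event rate, so adjacent bins are similar.
  bins <- bins[order(bins$pos / (bins$pos + bins$neg)), ]
  # Step 3: merge the adjacent pair whose union keeps the most IV.
  # Merging never increases IV, so this is the merge that loses least.
  while (nrow(bins) > max_bins) {
    iv_after <- sapply(seq_len(nrow(bins) - 1), function(j) {
      merged <- bins[-(j + 1), ]
      merged$pos[j] <- bins$pos[j] + bins$pos[j + 1]
      merged$neg[j] <- bins$neg[j] + bins$neg[j + 1]
      bin_iv(merged$pos, merged$neg)
    })
    j <- which.max(iv_after)
    # ";" is a plain separator chosen for this sketch only.
    bins$label[j] <- paste(bins$label[j], bins$label[j + 1], sep = ";")
    bins$pos[j] <- bins$pos[j] + bins$pos[j + 1]
    bins$neg[j] <- bins$neg[j] + bins$neg[j + 1]
    bins <- bins[-(j + 1), ]
  }
  rownames(bins) <- NULL
  bins
}
}

For example, \code{greedy_reduce(feature, target, max_bins = 5)} run on the
data from the Examples section returns a data frame with five merged bins.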

Key features include:
\itemize{
  \item Bayesian smoothing to stabilize WoE estimates for sparse categories
  \item Automatic handling of missing values (converted to "NA" category)
  \item Monotonicity constraint enforcement
  \item Configurable minimum and maximum bin counts
  \item Rare category pooling based on relative frequency thresholds
}
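
This page does not state which prior the Bayesian smoothing uses. One common
choice, assumed here purely for illustration, shrinks each bin toward the
overall event rate by adding pseudo-counts. This keeps WoE finite even when a
bin has no positives or no negatives. The function \code{smoothed_woe()} and
its \code{prior_weight} parameter are hypothetical.

\preformatted{
# Hypothetical smoothing sketch; the package's actual prior may differ.
# pos, neg: per-bin counts of positive and negative cases.
# prior_weight: pseudo-count added to each bin (an assumed parameter,
# not an argument of ob_categorical_milp()).
smoothed_woe <- function(pos, neg, prior_weight = 1) {
  base_rate <- sum(pos) / (sum(pos) + sum(neg))
  # Split the pseudo-count between classes by the overall event rate.
  pos_s <- pos + prior_weight * base_rate
  neg_s <- neg + prior_weight * (1 - base_rate)
  # Each bin's share of all (smoothed) positives and negatives.
  p1 <- pos_s / sum(pos_s)
  p0 <- neg_s / sum(neg_s)
  log(p1 / p0)
}

# The first bin has no positives but still gets a finite WoE.
smoothed_woe(pos = c(0, 20, 30), neg = c(10, 25, 15))
}

Setting \code{prior_weight = 0} recovers the unsmoothed WoE. Larger values
pull sparse bins more strongly toward a WoE of zero.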

Mathematical definitions:
\deqn{WoE_i = \ln\left(\frac{p_i^{(1)}}{p_i^{(0)}}\right)}{
WoE_i = ln((p_i^(1))/(p_i^(0)))}
where \eqn{p_i^{(1)}}{p_i^(1)} and \eqn{p_i^{(0)}}{p_i^(0)} are the shares
of all positive and all negative cases, respectively, that fall in bin
\eqn{i}, adjusted using Bayesian smoothing.

\deqn{IV = \sum_{i=1}^{n} (p_i^{(1)} - p_i^{(0)}) \times WoE_i}{
IV = sum((p_i^(1) - p_i^(0)) * WoE_i)}
where \eqn{n} is the number of bins.
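
Given per-bin counts, the two formulas can be evaluated directly. The sketch
below omits smoothing, so every bin must contain both classes. The counts are
made up for illustration, and \code{woe_iv()} is a hypothetical helper, not
part of the package. Its outputs are named after the corresponding fields of
the return value.

\preformatted{
# Hypothetical helper: WoE and IV per bin from raw counts (no smoothing).
woe_iv <- function(count_pos, count_neg) {
  p1 <- count_pos / sum(count_pos)  # share of all positives in each bin
  p0 <- count_neg / sum(count_neg)  # share of all negatives in each bin
  woe <- log(p1 / p0)
  iv <- (p1 - p0) * woe             # each term is non-negative
  list(woe = woe, iv = iv, total_iv = sum(iv))
}

res <- woe_iv(count_pos = c(60, 25, 15), count_neg = c(20, 40, 40))
res$total_iv
}

Each IV term is non-negative because \eqn{p_i^{(1)} - p_i^{(0)}}{p_i^(1) - p_i^(0)}
and \eqn{WoE_i} always share the same sign.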
}
\note{
\itemize{
  \item Target variable must contain both 0 and 1 values.
  \item Empty strings in the feature vector are not allowed and will cause
        an error.
  \item For datasets with very few observations in either class (fewer
        than 5), warnings are issued because results may be unstable.
  \item The algorithm uses a greedy heuristic, not true MILP
        optimization. An exact solution would require formulating the
        problem as a MILP and solving it with an external solver such as
        Gurobi or CPLEX.
}
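
The input conditions above can be checked before calling the function. The
helper below is a hypothetical pre-flight check written for illustration. It
mirrors these notes but is not the validation code used by
\code{ob_categorical_milp()}. The \code{min_class} threshold is an assumed
parameter.

\preformatted{
# Hypothetical pre-flight check mirroring the notes above; not the
# validation code inside ob_categorical_milp().
check_inputs <- function(feature, target, min_class = 5) {
  feature <- as.character(feature)
  if (length(feature) != length(target))
    stop("feature and target must have the same length")
  if (anyNA(target)) stop("target must not contain missing values")
  if (!all(target == 0 | target == 1)) stop("target must be 0 or 1")
  if (length(unique(target)) < 2)
    stop("target must contain both 0 and 1")
  # NA features are allowed (they become a category); "" is not.
  if (any(!is.na(feature) & feature == ""))
    stop("feature must not contain empty strings")
  if (min(table(target)) < min_class)
    warning("fewer than ", min_class, " observations in one class")
  invisible(TRUE)
}
}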
}
\examples{
# Generate sample data
set.seed(123)
n <- 1000
feature <- sample(letters[1:8], n, replace = TRUE)
target <- rbinom(n, 1, prob = ifelse(feature \%in\% c("a", "b"), 0.7, 0.3))

# Perform optimal binning
result <- ob_categorical_milp(feature, target)
print(result[c("bin", "woe", "iv", "count")])

# With custom parameters
result2 <- ob_categorical_milp(
  feature = feature,
  target = target,
  min_bins = 2L,
  max_bins = 4L,
  bin_cutoff = 0.03
)

# Handling missing values
feature_with_na <- feature
feature_with_na[sample(length(feature_with_na), 50)] <- NA
result3 <- ob_categorical_milp(feature_with_na, target)

}
