% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/aggregate_applications.R
\name{aggregate_applications}
\alias{aggregate_applications}
\title{Aggregate Numeric Data by Periods}
\usage{
aggregate_applications(
  data,
  id_col,
  amount_col,
  time_col = NULL,
  group_cols = NULL,
  ops,
  period,
  observation_window_start_col = NULL,
  scrape_date_col = NULL,
  period_agg = sum,
  period_missing_inputs = 0
)
}
\arguments{
\item{data}{A data frame containing the data to be aggregated. It must include at least the columns named by
\code{id_col} and \code{amount_col} (the numeric variable to aggregate), plus \code{time_col} whenever time-based periods are used.}

\item{id_col}{A character string specifying the column name used to define the aggregation level (e.g., \code{"application_id"},
\code{"client_id"}, or \code{"agreement_id"}).}

\item{amount_col}{A character string specifying the column in \code{data} that contains the numeric variable to be aggregated.
This variable can represent transaction amounts, loan repayment values, credit bureau inquiry counts, or any other numeric measure.}

\item{time_col}{A character string indicating the column name that contains the date (or timestamp) when the event occurred.
This column must be of class \code{Date} or \code{POSIXct}.}

\item{group_cols}{An optional character vector of column names by which to further subdivide the aggregation.
For each unique value in these columns, separate summary features will be generated and appended as new columns.}

\item{ops}{A named list of functions used to compute summary features on the aggregated period values.
Each function must accept a single numeric vector as input. The names of the list elements are used to label the output columns.}

\item{period}{Either a character string specifying the time period grouping (\code{"daily"}, \code{"weekly"}, \code{"monthly"}, or \code{"all"})
or a numeric vector of length 2 (e.g., \code{c(7, 8)}) where the first element is the cycle length in days and the second is the
number of consecutive cycles. When set to \code{"all"}, the entire set of observations is aggregated as a single period,
effectively disabling time aggregation.}

\item{observation_window_start_col}{A character string indicating the column name that contains the observation window start date.
This argument is required when \code{period} is specified as a character string other than \code{"all"}.}

\item{scrape_date_col}{A character string indicating the column name that contains the scrape date (i.e., the end date of the observation
window). Required whenever \code{period} is not \code{"all"}, i.e., for the character periods \code{"daily"}, \code{"weekly"}, and \code{"monthly"} as well as for numeric vectors.}

\item{period_agg}{A function used to aggregate the numeric values within each period. The default is \code{sum}. The argument is ignored if \code{period} is \code{"all"}.}

\item{period_missing_inputs}{A numeric constant used to replace missing values in periods with no observed data. The default value is \code{0}.}
}
\value{
A data frame where each row corresponds to a unique identifier (e.g., application, client, or agreement).
  The output includes aggregated summary features for each period and, if applicable, additional columns for each group
  defined in \code{group_cols}.
}
\description{
Aggregates any numeric variable in a dataset over defined time periods and returns summary features
computed by user-supplied operation functions. Typical uses include building features from transactional data,
previous loan repayment behavior, or credit bureau inquiries. Aggregation is performed per grouping
identifier (e.g., at the application, client, or agreement level) and is based on time periods.
}
\details{
When \code{period} is provided as a character string (one of \code{"daily"}, \code{"weekly"}, or \code{"monthly"}),
data are grouped into complete calendar periods. For example, if the scrape date falls mid-month, the incomplete last period
is excluded. Alternatively, \code{period} may be specified as a numeric vector of length 2 (e.g., \code{c(7, 8)}), in which case
the first element defines the cycle length in days and the second element the number of consecutive cycles. In this example,
if the scrape date is \code{"2024-12-31"}, the periods span the last 56 days (8 consecutive 7-day cycles), with the first period
starting on \code{"2024-11-05"}.


\code{aggregate_applications} aggregates numeric data either by defined time periods or over the full observation window.
Data is first grouped by the identifier specified in \code{id_col} (e.g., at the application, client, or agreement level).

\enumerate{
  \item When \code{period} is set to \code{"daily"}, \code{"weekly"}, or \code{"monthly"},
        transaction dates in \code{time_col} are partitioned into complete calendar periods (incomplete periods are excluded).
  \item When \code{period} is set to a numeric vector of length 2 (e.g., \code{c(7, 8)}), consecutive cycles of fixed length are defined.
  \item When \code{period} is set to \code{"all"}, time aggregation is disabled. All observations for an identifier (or group)
        are aggregated together.
}

For each period, the numeric values in \code{amount_col} (or any other numeric measure) are aggregated using the function
specified by \code{period_agg}. Then, for each unique group (if any \code{group_cols} are provided) and for each application (or
other identifier), the summary functions specified in \code{ops} are applied to the vector of aggregated period values.
When grouping is used, the resulting summary features are appended as new columns with names constructed in the format:
\code{<operation>_<group_column>_<group_value>}. Missing aggregated values in periods with no observations are replaced
by \code{period_missing_inputs}.
}
\examples{
data(featForge_transactions)

# Example 1: Aggregate outgoing transactions (amount < 0) on a monthly basis.
aggregate_applications(featForge_transactions[featForge_transactions$amount < 0, ],
                       id_col = 'application_id',
                       amount_col = 'amount',
                       time_col = 'transaction_date',
                       ops = list(
                         avg_monthly_outgoing_transactions = mean,
                         # In the aggregated numeric vector, the last element is the most recent period.
                         last_month_transactions_amount = function(x) x[length(x)],
                         last_month_transaction_amount_vs_mean = function(x) x[length(x)] / mean(x)
                       ),
                       period = 'monthly',
                       observation_window_start_col = 'obs_start',
                       scrape_date_col = 'scrape_date'
)

# Example 2: Aggregate transactions by category and direction.
featForge_transactions$direction <- ifelse(featForge_transactions$amount > 0, 'in', 'out')
aggregate_applications(featForge_transactions,
                       id_col = 'application_id',
                       amount_col = 'amount',
                       time_col = 'transaction_date',
                       group_cols = c('category', 'direction'),
                       ops = list(
                         avg_monthly_transactions = mean,
                         highest_monthly_transactions_count = max
                       ),
                       period = 'monthly',
                       period_agg = length,
                       observation_window_start_col = 'obs_start',
                       scrape_date_col = 'scrape_date'
)

# Example 3: Aggregate using a custom numeric period:
# 30-day cycles for 3 consecutive cycles (i.e., the last 90 days).
aggregate_applications(featForge_transactions,
                       id_col = 'application_id',
                       amount_col = 'amount',
                       time_col = 'transaction_date',
                       ops = list(
                         avg_30_day_transaction_count_last_90_days = mean
                       ),
                       period = c(30, 3),
                       period_agg = length,
                       observation_window_start_col = 'obs_start',
                       scrape_date_col = 'scrape_date'
)
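
# Illustration of the numeric-period arithmetic described in Details, in base R only.
# This is a sketch, not the package's internal code: it assumes cycle boundaries are
# found by stepping back from the scrape date in whole cycles, and it makes no claim
# about whether a boundary day belongs to the earlier or the later cycle.
scrape_date <- as.Date("2024-12-31")   # hypothetical scrape date, as in Details
cycle_days <- 7                        # first element of period = c(7, 8)
n_cycles <- 8                          # second element of period = c(7, 8)
cycle_starts <- scrape_date - cycle_days * (n_cycles:1)
cycle_starts[1]                        # start of the earliest cycle: 2024-11-05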

# Example 4: Aggregate transactions without time segmentation.
aggregate_applications(featForge_transactions,
                       id_col = 'application_id',
                       amount_col = 'amount',
                       ops = list(
                         total_transactions_counted = length,
                         total_outgoing_transactions_counted = function(x) sum(x < 0),
                         total_incoming_transactions_counted = function(x) sum(x > 0)
                       ),
                       period = 'all'
)
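
# Base-R sketch of the two-stage logic described in Details. Values are first
# aggregated within each period (the role of period_agg), empty periods are filled
# (the role of period_missing_inputs), and summary functions (the role of ops) are
# then applied to the vector of period values. The toy data and all object names
# are hypothetical; this is not the package's internal implementation.
toy <- data.frame(
  transaction_date = as.Date(c("2024-01-05", "2024-01-20", "2024-03-11")),
  amount = c(-10, -5, -20)
)
month_levels <- c("2024-01", "2024-02", "2024-03")  # complete months in the toy window
month_key <- factor(substr(as.character(toy$transaction_date), 1, 7),
                    levels = month_levels)
period_values <- as.numeric(tapply(toy$amount, month_key, sum))  # NA for the empty month
period_values[is.na(period_values)] <- 0                         # fill the empty period
toy_ops <- list(avg_monthly = mean, last_month = function(x) x[length(x)])
sapply(toy_ops, function(f) f(period_values))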

}
