% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/fill_gaps.R
\name{fill_gaps}
\alias{fill_gaps}
\title{Prepare a dataset for modeling by filling in temporal gaps in data collection}
\usage{
fill_gaps(data, date_col = 1, frequency, groups = NULL, static_features = NULL)
}
\arguments{
\item{data}{A data.frame or object coercible to a data.frame with, minimally, dates and the outcome being forecasted.}

\item{date_col}{The integer column index of the date column. This column should have class 'Date' or 'POSIXt'.}

\item{frequency}{Date/time frequency. A string taking the same input as \code{base::seq.Date(..., by = "frequency")}
or \code{base::seq.POSIXt(..., by = "frequency")}, e.g., '1 hour', '1 month', '7 days', '10 years', etc.
The highest frequency supported at present is '1 sec'.}

\item{groups}{Optional. A character vector of column names that identify the unique time series (i.e., groups/hierarchies)
when multiple time series are present.}

\item{static_features}{Optional. For grouped time series only. A character vector of column names that identify features that do not change through time.
These columns are expected to be used as model features but are not lagged (e.g., a ZIP code column). When new rows are
added to the dataset, the most recent value of each static feature within each group is used to fill in the
static feature columns for those rows.}
}
\value{
An object of class 'data.frame': The returned data.frame has the same number of columns and the same column order as the input, but
with additional rows to account for gaps in data collection. For grouped data, any new rows added to the returned data.frame will appear
between the minimum--or oldest--date for that group and the maximum--or most recent--date across all groups. If the user-supplied
forecasting algorithm(s) cannot handle missing outcome values or missing dynamic features, these values should either be
imputed prior to \code{create_lagged_df()} or filtered out in the user-supplied modeling function for \code{\link{train_model}}.
}
\description{
In order to create a modeling dataset with feature lags that are temporally correct, the entry
function in \code{forecastML}, \code{\link{create_lagged_df}}, needs evenly-spaced time series with no
gaps in data collection. \code{fill_gaps()} can help here.
This function takes a \code{data.frame} with (a) dates, (b) the outcome being forecasted, and, optionally,
(c) dynamic features that change through time, (d) group columns for multiple time series modeling,
and (e) static or non-dynamic features for multiple time series modeling and returns a \code{data.frame}
with rows evenly spaced in time. Specifically, this function adds rows to the input dataset
while filling in (a) dates, (b) grouping information, and (c) static features. The outcome and
dynamic features will be \code{NA} in the rows added for missing time periods; these \code{NA} values can be left
as-is, user-imputed, or removed from modeling in the user-supplied modeling wrapper function for \code{\link{train_model}}.
}
\section{Methods and related functions}{


The output of \code{fill_gaps()} is passed into

\itemize{
  \item \code{\link{create_lagged_df}}
}
}

\examples{
# NOAA buoy dataset with gaps in data collection
data("data_buoy_gaps", package = "forecastML")

data_buoy_no_gaps <- fill_gaps(data_buoy_gaps, date_col = 1, frequency = '1 day',
                               groups = 'buoy_id', static_features = c('lat', 'lon'))

# The returned data.frame has the same number of columns but the time series
# are now evenly spaced at 1 day apart. Additionally, the unchanging grouping
# columns and static feature columns have been filled in for the newly created dataset rows.
dim(data_buoy_gaps)
dim(data_buoy_no_gaps)
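
# A quick sanity check, offered as an illustrative sketch rather than as part of
# the forecastML API: within each buoy, consecutive dates should now differ by a
# single, constant step. This assumes the date column (column 1, as passed to
# date_col above) has class Date, so that as.numeric() returns days.
date_steps <- tapply(as.numeric(data_buoy_no_gaps[[1]]), data_buoy_no_gaps$buoy_id,
                     function(x) unique(diff(sort(x))))
date_steps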

# Running create_lagged_df() is the next step in the forecastML forecasting
# process. If there are long gaps in data collection, like in this buoy dataset,
# and the user-supplied modeling algorithm cannot handle missing outcome values,
# the best option is to filter these rows out in the user-supplied modeling function
# for train_model()
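
# A minimal sketch of that filtering step. drop_missing_outcomes() is a
# hypothetical helper, not a forecastML function, and outcome_col is an assumed
# argument name; the idea is to call something like it at the top of the
# user-supplied modeling function before fitting.
drop_missing_outcomes <- function(data, outcome_col = 1) {
  data[!is.na(data[[outcome_col]]), , drop = FALSE]
}

# Toy data standing in for a training dataset with a missing outcome.
toy_data <- data.frame(outcome = c(1, NA, 3), feature = c(10, 20, 30))
drop_missing_outcomes(toy_data)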
}
