The futurize package makes it easy to turn sequential code into
parallel code: just pipe the sequential function call to the
futurize() function. Easy!
library(futurize)
plan(multisession)
library(caret)
ctrl <- trainControl(method = "cv", number = 10)
model <- train(Species ~ ., data = iris, method = "rf", trControl = ctrl) |> futurize()
This vignette demonstrates how to use this approach to parallelize caret
functions such as train().
The caret package provides a rich set of machine-learning tools
with a unified API. The train() function fits models using
cross-validation or bootstrap resampling, making it an excellent
candidate for parallelization.
The train() function fits models across multiple resampling
iterations:
library(caret)
## Set up 10-fold cross-validation
ctrl <- trainControl(method = "cv", number = 10)
## Train a random forest model
model <- train(Species ~ ., data = iris, method = "rf", trControl = ctrl)
Here, train() evaluates sequentially, but we can make it evaluate
in parallel simply by piping the call to futurize():
library(futurize)
library(caret)
ctrl <- trainControl(method = "cv", number = 10)
model <- train(Species ~ ., data = iris, method = "rf", trControl = ctrl) |> futurize()
This distributes the cross-validation folds across the available parallel workers, provided that parallel workers have been set up, e.g.
plan(multisession)
The built-in multisession backend parallelizes on your local
computer and works on all operating systems. There are [other
parallel backends] to choose from, including alternatives that
parallelize locally as well as backends that distribute the work
across remote machines, e.g.
plan(future.mirai::mirai_multisession)
and
plan(future.batchtools::batchtools_slurm)
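It can be useful to inspect the active backend before running parallel code and to reset it afterwards. A minimal sketch using the future package's standard nbrOfWorkers() and plan(sequential) (core future API, not specific to futurize):

```r
library(future)

## Set up two local background workers
plan(multisession, workers = 2)
nbrOfWorkers()
#> [1] 2

## Revert to sequential processing when done
plan(sequential)
```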
The following caret functions are supported by futurize():
- bag()
- gafs()
- nearZeroVar()
- rfe()
- safs()
- sbf()
- train()
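For example, recursive feature elimination via rfe() can be parallelized in the same way as train(). A sketch, assuming the standard caret rfe()/rfeControl() API with its built-in random-forest helper functions (rfFuncs):

```r
library(futurize)
library(caret)
plan(multisession)

## 10-fold cross-validated recursive feature elimination,
## evaluating candidate subsets of 1-3 predictors
rctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 10)
res <- rfe(iris[, 1:4], iris$Species, sizes = 1:3, rfeControl = rctrl) |> futurize()
```

As with train(), the resampling iterations are distributed across the workers set up by plan().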