Quick Start with bifrost

Overview

bifrost performs branch-level inference of multi-regime, multivariate trait evolution on a phylogeny using penalized-likelihood multivariate GLS fits. The current implementation searches for evolutionary model shifts under a multi-rate Brownian Motion (BMM) model with proportional regime variance-covariance (VCV) scaling. It operates directly in trait space, works with fossil tip-dated trees, and is designed for high-dimensional datasets (p > n) and large trees (> 1000 tips).

If you have not installed the package yet, see the installation instructions on the package README.

In practical terms, bifrost asks:

Minimal worked example

The smallest end-to-end workflow is a simulated single-regime example. Running it is mainly a quick way to confirm that bifrost and its dependencies are installed correctly and that the core search routine runs on your machine. In this case, we typically expect bifrost to recover no shifts.

library(bifrost)
library(ape)
library(phytools)
library(mvMORPH)

set.seed(1)

# Simulate a tree
tr <- pbtree(n = 50, scale = 1)

# Simulate multivariate traits under a single-regime BM1 model (no shifts)
sigma <- diag(0.1, 2)  # 2 x 2 variance-covariance matrix for two traits
theta <- c(0, 0)       # ancestral means for the two traits

sim <- mvSIM(
  tree  = tr,
  nsim  = 1,
  model = "BM1",
  param = list(
    ntraits = 2,
    sigma   = sigma,
    theta   = theta
  )
)

# mvSIM returns either a matrix or a list of matrices depending on mvMORPH version
X <- if (is.list(sim)) sim[[1]] else sim
rownames(X) <- tr$tip.label

# Run bifrost's greedy search for shifts
res <- searchOptimalConfiguration(
  baseline_tree              = tr,
  trait_data                 = X,
  formula                    = "trait_data ~ 1",
  min_descendant_tips        = 10,
  num_cores                  = 1,
  shift_acceptance_threshold = 20,  # conservative GIC threshold
  IC                         = "GIC",
  plot                       = FALSE,
  store_model_fit_history    = FALSE,
  verbose                    = FALSE
)

# For this single-regime BM1 simulation, we typically expect no inferred shifts
res$shift_nodes_no_uncertainty
res$optimal_ic - res$baseline_ic
str(res$VCVs)

What to expect from this example:

Before you run on real data

At minimum, provide:

  1. A rooted phylogeny with branch lengths interpreted in units of time.
  2. A multivariate trait matrix or data frame with taxa as row names.

Important checks before fitting:

The arguments you will tune most often are:

Practical starting points:

The ic_uncertainty_threshold argument is currently reserved for future pruning and uncertainty workflows and can typically be left at its default.

How the search works

searchOptimalConfiguration() performs a greedy, step-wise search:

  1. Fit a baseline single-regime model.
  2. Generate one-shift candidate trees for internal nodes satisfying min_descendant_tips.
  3. Score candidate models, optionally in parallel.
  4. Iteratively accept shifts that improve the chosen information criterion by at least shift_acceptance_threshold.
  5. Optionally compute per-shift IC support by removing each accepted shift in turn and refitting.

The accepted shift set defines the optimal no-uncertainty configuration under the selected criterion ("GIC" or "BIC").

High-level workflow

The figure below summarizes the main stages of the greedy search and post-processing workflow implemented in bifrost.

Workflow diagram showing setup, baseline scoring, greedy search, post-processing, and output steps in bifrost.

Because the search is greedy, it can converge to a local optimum. In practice, it is useful to repeat analyses with different min_descendant_tips, thresholds, and IC choices to assess robustness.

Core functions

Internally, the search also relies on unexported helpers such as generatePaintedTrees(), fitMvglsAndExtractGIC(), fitMvglsAndExtractBIC(), calculateAllDeltaGIC(), paintSubTree_mod(), addShiftToModel(), removeShiftFromTree(), paintSubTree_removeShift(), whichShifts(), and extractRegimeVCVs(). Most users will not need to call these directly.

Reading the result

searchOptimalConfiguration() returns a named list that typically includes:

The two returned trees distinguish transformed versus original branch lengths, and ic_weights stores per-shift support summaries such as IC weights and evidence ratios when those calculations are requested.

Interpretation guidance:

Practical guidance

When moving from this smoke test to a real dataset:

Once this smoke test runs cleanly, the next step is usually to move to a real dataset and inspect the returned shifts, IC support values, and regime-specific VCVs in more detail. For a full empirical workflow, continue to the jaw-shape vignette; for broader conceptual background on the package website, see Brownian Motion and Multivariate Shifts and Whole-Tree Models, PCA, and bifrost.