LeaveOutKSS Overview

Overview

LeaveOutKSS is an ‘R’ translation of the leave-out variance-component workflow for two-way fixed effects models developed by Kline, Saggio, and Solvsten (2020). The package follows the same broad logic as the repository README and the original ‘MATLAB’ vignette:

  1. start from worker identifiers, firm identifiers, and an outcome;
  2. restrict the sample to a connected mobility graph;
  3. prune further to a leave-one-worker-out connected set;
  4. optionally partial out controls;
  5. compute leverage-based bias adjustments exactly or by Johnson-Lindenstrauss approximation (JLA);
  6. report plug-in and bias-corrected variance components.
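
Step 2 above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the toy identifiers and the helper largest_connected_set() are hypothetical, and "largest" is taken here to mean the component with the most person-year rows, which is an assumption of this example. The leave-one-worker-out pruning of step 3 is not covered.

```r
# Hypothetical toy panel: worker ids and firm ids, one row per person-year.
id     <- c(1, 1, 2, 2, 3, 3, 4, 4)
firmid <- c(1, 2, 2, 3, 3, 3, 4, 4)

# Label connected components of the bipartite worker-firm graph by
# repeatedly propagating the smallest label across workers and firms.
# (Illustrative helper, not part of LeaveOutKSS.)
largest_connected_set <- function(id, firmid) {
  label <- as.integer(factor(firmid))               # start: one label per firm
  repeat {
    by_worker <- ave(label, id, FUN = min)          # share labels within worker
    by_firm   <- ave(by_worker, firmid, FUN = min)  # share labels within firm
    if (all(by_firm == label)) break                # stop when nothing changes
    label <- by_firm
  }
  # Keep the component with the most rows (an assumption of this sketch).
  keep_label <- as.integer(names(which.max(table(label))))
  label == keep_label
}

in_set <- largest_connected_set(id, firmid)
data.frame(id, firmid, in_set)
```

Worker 4 never leaves firm 4, and no other worker visits that firm, so those rows fall outside the connected set while the rest of the panel is linked through movers.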

The examples in this package currently rely on the small bundled panel used by the repository’s 01_basic_no_controls.R example.

Abowd, Kramarz, and Margolis (1999; AKM) Setup

The target application is the familiar AKM-style model

\[ y_{gt} = \alpha_g + \psi_{j(g,t)} + w'_{gt}\delta + \varepsilon_{gt}, \]

where g indexes workers (the id argument), j(g,t) is the firm employing worker g in period t (the firmid argument), and w_{gt} collects observed covariates such as year effects (the controls argument). The main quantities of interest are the variance of firm effects, the covariance of worker and firm effects, and the variance of worker effects.

The Kline, Saggio, and Solvsten (KSS) correction matters because plug-in variance decompositions treat estimated fixed effects as if they were measured without error. The leave-out approach instead uses observation-specific leverage adjustments to remove the leading bias from these variance-component estimates.
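
The bias described above can be seen in a small simulation. The sketch below is an illustration under stated assumptions, not the package's code: it uses a hypothetical design with group effects only, targets the quadratic form theta = beta' A beta, and applies the leave-one-out variance estimate y_i (y_i - x_i' beta_{-i}), computed as y_i e_i / (1 - P_ii). All object names are invented for the example.

```r
set.seed(1)

# Hypothetical design: 40 groups with 3 observations each, group effects as
# the only regressors, and heteroskedastic noise.
n_groups <- 40
g     <- rep(seq_len(n_groups), each = 3)
alpha <- rnorm(n_groups, sd = 0.5)
X     <- model.matrix(~ factor(g) - 1)
n     <- nrow(X)
sigma <- runif(n, 0.5, 1.5)

# Target: uncentered second moment of the group effects, theta = beta' A beta.
A     <- diag(n_groups) / n_groups
theta <- drop(t(alpha) %*% A %*% alpha)

S_inv <- solve(crossprod(X))
P_ii  <- rowSums((X %*% S_inv) * X)                  # statistical leverages
B_ii  <- rowSums((X %*% S_inv %*% A %*% S_inv) * X)  # weights on each variance

one_draw <- function() {
  y    <- drop(X %*% alpha) + rnorm(n, sd = sigma)
  beta <- drop(S_inv %*% crossprod(X, y))
  e    <- y - drop(X %*% beta)
  sigma2_loo <- y * e / (1 - P_ii)   # leave-one-out estimate of each variance
  plug_in <- drop(t(beta) %*% A %*% beta)
  c(plug_in = plug_in, corrected = plug_in - sum(B_ii * sigma2_loo))
}

draws <- replicate(2000, one_draw())
bias  <- rowMeans(draws) - theta
round(bias, 3)
```

Across draws, the plug-in estimate should sit well above the true value, while the corrected estimate should be centered close to it, up to simulation noise.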

Bundled Example Data

path <- system.file("extdata", "test.csv", package = "LeaveOutKSS")
dt <- data.table::fread(path, header = FALSE)
data.table::setorder(dt, V1, V3)
dim(dt)
#> [1] 71614    10
head(dt)
#>       V1    V2    V3       V4       V5         V6    V7    V8    V9   V10
#>    <int> <int> <int>    <num>    <num>      <num> <int> <int> <int> <int>
#> 1: 13736  9475  1999 4.913220 0.050625 0.01139062     1     0     1     0
#> 2: 13736  9475  2001 4.905479 0.075625 0.02079688     0     1     0     1
#> 3: 27351 10973  1999 4.972081 0.062500 0.01562500     1     0     1     0
#> 4: 27351 10973  2001 4.952005 0.090000 0.02700000     0     1     0     1
#> 5: 55440  9475  1999 4.984412 0.010000 0.00100000     1     0     1     0
#> 6: 55440  9475  2001 4.985057 0.022500 0.00337500     0     1     0     1

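The bundled file has no header row and follows the layout used in the repository examples. The calls in this vignette use its first four columns:

  1. V1: worker identifier, passed as id;
  2. V2: firm identifier, passed as firmid;
  3. V3: year, used to sort the panel and to build controls;
  4. V4: outcome, passed as y.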

Before calling leave_out_KSS() or leave_out_KSS_fe(), sort the panel by worker identifier and, within worker, from earlier to later time periods.
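
A quick way to verify the required ordering before estimation is sketched below in base R. The helper is_panel_sorted() and the toy data frame are hypothetical and are not part of the package.

```r
# Hypothetical unsorted panel with columns named after the function arguments.
panel <- data.frame(
  id     = c(2, 1, 2, 1),
  firmid = c(10, 10, 11, 12),
  year   = c(2001, 1999, 1999, 2001),
  y      = c(4.9, 5.0, 4.8, 5.1)
)

# TRUE when rows are ordered by worker and, within worker, by time.
# (Illustrative helper, not part of LeaveOutKSS.)
is_panel_sorted <- function(id, time) {
  identical(order(id, time), seq_along(id))
}

sorted_before <- is_panel_sorted(panel$id, panel$year)
panel <- panel[order(panel$id, panel$year), ]
sorted_after <- is_panel_sorted(panel$id, panel$year)
c(before = sorted_before, after = sorted_after)
```

The same check applies to the bundled data by passing columns 1 and 3 of dt.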

Main Workflow

The basic decomposition is performed by leave_out_KSS().

res <- leave_out_KSS(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  leave_out_level = "matches",
  type_algorithm = "JLA",
  simulations_JLA = 5,
  paral = FALSE,
  progress = FALSE
)

print(res)
res$estimates$table

The routine returns a result object. As the calls above show, it can be printed directly, and its table of estimates is available as res$estimates$table.

To write the results to disk, pass output paths explicitly:

stem <- tempfile("leaveoutkss_")

leave_out_KSS(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  simulations_JLA = 5,
  paral = FALSE,
  csv_file = paste0(stem, ".csv"),
  txt_file = paste0(stem, ".txt"),
  progress = FALSE
)

unlink(paste0(stem, c(".csv", ".txt")))

Controls

The original vignette emphasizes that controls are handled by partialling them out in the leave-out connected set and then running the decomposition on the residualized outcome. In R, one way to do this is to pass a control matrix directly.

controls <- model.matrix(~ factor(dt[[3]]) - 1)
controls <- controls[, -ncol(controls), drop = FALSE]

leave_out_KSS(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  controls = controls,
  simulations_JLA = 5,
  paral = FALSE,
  progress = FALSE
)
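
The partialling-out idea described in this section can be sketched with ordinary least squares on a toy panel. This is an illustration under stated assumptions, not the package's internal routine: the simulated data, the control matrix W, and all object names are hypothetical, and lm() stands in for whatever solver the package uses.

```r
set.seed(2)

# Hypothetical toy panel: 30 workers observed in 3 years, 5 firms.
n_workers <- 30
id     <- rep(seq_len(n_workers), each = 3)
year   <- rep(1:3, times = n_workers)
firmid <- sample(1:5, length(id), replace = TRUE)
delta  <- c(0.10, 0.25)                      # true effects of years 2 and 3
W      <- cbind(year2 = as.numeric(year == 2), year3 = as.numeric(year == 3))
y      <- rnorm(n_workers)[id] + rnorm(5)[firmid] + drop(W %*% delta) +
  rnorm(length(id), sd = 0.1)

# Step 1: estimate the full model, including worker and firm effects.
fit <- lm(y ~ factor(id) + factor(firmid) + W)
delta_hat <- coef(fit)[c("Wyear2", "Wyear3")]

# Step 2: remove the estimated contribution of the controls from the outcome.
y_resid <- y - drop(W %*% delta_hat)

round(delta_hat, 3)
```

The residualized outcome y_resid then plays the role of y in a decomposition without controls.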

If a control is more naturally supplied as a coded categorical variable, leave_out_KSS_fe() can expand selected columns internally:

leave_out_KSS_fe(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  controls = cbind(year = dt[[3]]),
  absorb_col = 1,
  simulations_JLA = 5,
  paral = FALSE,
  progress = FALSE
)

Leaving Out Matches or Observations

The default leave_out_level = "matches" follows the discussion in the original vignette: it is intended to be robust to unrestricted heteroskedasticity and serial correlation within worker-firm matches. Setting leave_out_level = "obs" switches the correction to leaving out single person-year observations instead.

leave_out_KSS(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  leave_out_level = "obs",
  simulations_JLA = 5,
  paral = FALSE,
  progress = FALSE
)
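
To make the two leave-out units concrete, the sketch below counts observations and worker-firm matches in a toy panel. The data and the name match_id are hypothetical; how the package builds its match identifiers internally is not shown here.

```r
# Hypothetical toy panel: worker 1 stays at firm 1 for two periods, then moves.
id     <- c(1, 1, 1, 2, 2, 3, 3)
firmid <- c(1, 1, 2, 2, 2, 3, 1)

# A match is a distinct worker-firm pair; repeated observations of the same
# pair share one match identifier.
match_id <- as.integer(interaction(id, firmid, drop = TRUE))

n_obs     <- length(id)
n_matches <- length(unique(match_id))
c(observations = n_obs, matches = n_matches)

# Number of person-year rows in each match.
table(match_id)
```

Leaving out a match drops all rows that share a match identifier at once, whereas leaving out an observation drops a single row.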

Regressing Firm Effects on Observables

The ‘MATLAB’ vignette also discusses linear projections of estimated firm effects on observables. In this package, that workflow is exposed through the lincom_do, Z_lincom, and labels_lincom arguments of leave_out_KSS(), which call lincom_KSS() internally.

# Indicator for observations at or before the median year
early_year_dummy <- as.numeric(dt[[3]] <= median(dt[[3]], na.rm = TRUE))

leave_out_KSS(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  simulations_JLA = 5,
  paral = FALSE,
  lincom_do = 1,
  Z_lincom = early_year_dummy,
  labels_lincom = list("Early-Year Indicator"),
  progress = FALSE
)

R-Squared Companion

rsquared_comp() compares the fit of the standard two-way fixed effects design with a saturated worker-firm interaction model.

rsquared_comp(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  progress = FALSE
)
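
The comparison can be mimicked on simulated data with lm(). This is a rough sketch, not the package's routine: the toy panel and object names are hypothetical, and whether rsquared_comp() reports plain or adjusted R-squared is not stated here, so the example shows both.

```r
set.seed(3)

# Hypothetical toy panel: 40 workers, 4 periods, one possible firm change.
n_workers <- 40
id     <- rep(seq_len(n_workers), each = 4)
first  <- sample(1:6, n_workers, replace = TRUE)
second <- sample(1:6, n_workers, replace = TRUE)
firmid <- as.vector(rbind(first, first, second, second))
y      <- rnorm(n_workers)[id] + rnorm(6)[firmid] +
  rnorm(length(id), sd = 0.3)

# Additive worker and firm effects versus one effect per worker-firm match.
fit_akm <- lm(y ~ factor(id) + factor(firmid))
fit_sat <- lm(y ~ interaction(id, firmid, drop = TRUE))

r2 <- c(akm = summary(fit_akm)$r.squared,
        saturated = summary(fit_sat)$r.squared)
adj_r2 <- c(akm = summary(fit_akm)$adj.r.squared,
            saturated = summary(fit_sat)$adj.r.squared)
round(rbind(r2, adj_r2), 3)
```

Because the match-effects model nests the additive one, its plain R-squared can never be lower; a small gap suggests the additive specification loses little.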

Notes on Current Scope

At this stage, the package documentation and examples intentionally rely on the small bundled dataset rather than the large-data workflow from the repository’s 04_large_no_controls.R. The computational shortcuts for large datasets remain available in the API, notably the JLA-based leverage routines, but the documentation examples focus on the small reproducible panel.
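
For readers curious about the idea behind the JLA shortcut, the sketch below approximates the leverages of a generic regression design with random sign vectors and compares them with the exact values. It is a generic illustration, not the package's routine: the design matrix, the number of draws p (which presumably plays the role of simulations_JLA), and the dense solve() call are all assumptions of the example.

```r
set.seed(4)

# Hypothetical dense design; the real application uses sparse worker and
# firm indicators instead.
n <- 300
k <- 20
X <- cbind(1, matrix(rnorm(n * (k - 1)), n, k - 1))

# Exact leverages: diagonal of the projection matrix X (X'X)^{-1} X'.
P_exact <- rowSums((X %*% solve(crossprod(X))) * X)

# Approximation: project p random sign vectors onto the column space of X
# and average the squared fitted values row by row.
p <- 500
R <- matrix(sample(c(-1, 1), n * p, replace = TRUE), n, p)
Z <- X %*% solve(crossprod(X), crossprod(X, R))   # fitted values, n x p
P_jla <- rowSums(Z^2) / p

c(correlation = cor(P_exact, P_jla),
  sum_exact   = sum(P_exact),
  sum_approx  = sum(P_jla))
```

Each random projection costs one least-squares solve, so the approximation trades a small amount of simulation noise for avoiding one solve per observation.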

References

Abowd, J. M., Kramarz, F., and Margolis, D. N. (1999). High wage workers and high wage firms. Econometrica, 67(2), 251-333.

Kline, P., Saggio, R., and Solvsten, M. (2020). Leave-out estimation of variance components. Econometrica, 88(5), 1859-1898.