LeaveOutKSS Overview
Overview
LeaveOutKSS is an ‘R’ translation of the leave-out
variance-component workflow for two-way fixed effects models associated
with Kline, Saggio, and Solvsten (2020). The package follows the same
broad logic described in the repository README and in the original
‘MATLAB’ vignette:
- start from worker identifiers, firm identifiers, and an
outcome;
- restrict the sample to a connected mobility graph;
- prune further to a leave-one-worker-out connected set;
- optionally partial out controls;
- compute leverage-based bias adjustments exactly or by
Johnson-Lindenstrauss approximation (JLA);
- report plug-in and bias-corrected variance components.
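The first restriction in the list above, keeping the largest connected set of the worker-firm mobility graph, can be sketched in base R. This is a minimal illustration on a made-up panel and not the package's own routine: largest_connected_set() is a hypothetical helper, and the later leave-one-worker-out pruning step is not shown.

```r
# Hypothetical helper: flag observations in the largest connected set of
# the worker-firm graph (firms are linked when a worker is seen at both).
largest_connected_set <- function(id, firmid) {
  label <- as.integer(factor(firmid))  # start with one label per firm
  repeat {
    # Each worker takes the smallest label among the firms they visit ...
    worker_min <- ave(label, id, FUN = min)
    # ... and each firm takes the smallest label among its workers.
    new_label <- ave(worker_min, firmid, FUN = min)
    if (all(new_label == label)) break
    label <- new_label
  }
  sizes <- table(label)  # observations per connected component
  label == as.integer(names(sizes)[which.max(sizes)])
}

# Made-up panel: firms A and B are linked by worker 1, C and D by worker 5.
toy <- data.frame(
  id     = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  firmid = c("A", "B", "A", "A", "B", "B", "C", "C", "C", "D")
)
keep <- largest_connected_set(toy$id, toy$firmid)
toy[keep, ]  # observations at firms A and B form the largest set
```

The label-propagation loop is chosen here only because it needs no extra packages; a graph library would be a natural alternative for large panels.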
The examples in this package currently rely on the small bundled
panel used by the repository’s 01_basic_no_controls.R
example.
Abowd, Kramarz, and Margolis (1999; AKM) Setup
The target application is the familiar AKM-style model of Abowd,
Kramarz, and Margolis (1999):
\[
y_{gt} = \alpha_g + \psi_{j(g,t)} + w'_{gt}\delta +
\varepsilon_{gt},
\]
where \(g\) indexes workers (the id argument), \(j(g,t)\) is the firm
that employs worker \(g\) in period \(t\) (the firmid argument), and
\(w_{gt}\) collects observed covariates such as year effects (the
controls argument). The main quantities of interest are the variance
of firm effects, the covariance of worker and firm effects, and the
variance of worker effects.
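As a point of reference, the plug-in (uncorrected) versions of these three quantities can be computed on simulated data with base R. The sketch below is illustrative only: the panel, the variable names worker and firm, and the parameter values are all assumptions, and lm() with dummy variables stands in for the sparse solvers a real application would need.

```r
set.seed(1)
n_workers <- 60
n_firms <- 6

# Simulated two-period panel with random firm assignments (assumed data).
panel <- data.frame(
  worker = factor(rep(seq_len(n_workers), each = 2)),
  firm   = factor(sample(n_firms, 2 * n_workers, replace = TRUE))
)
alpha <- rnorm(n_workers)           # true worker effects
psi   <- rnorm(n_firms, sd = 0.5)   # true firm effects
panel$y <- alpha[as.integer(panel$worker)] +
  psi[as.integer(panel$firm)] +
  rnorm(nrow(panel), sd = 0.3)

# Two-way fixed effects fit; type = "terms" returns the centered
# contribution of each set of dummies for every observation.
fit <- lm(y ~ worker + firm, data = panel)
terms_hat <- predict(fit, type = "terms")
worker_hat <- terms_hat[, "worker"]
firm_hat   <- terms_hat[, "firm"]

# Plug-in variance components (these ignore estimation error).
plug_in <- c(
  var_worker      = var(worker_hat),
  var_firm        = var(firm_hat),
  cov_worker_firm = cov(worker_hat, firm_hat)
)
plug_in
```

By construction, the two variances, twice the covariance, and the residual variance add up to the sample variance of y.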
The Kline, Saggio, and Solvsten (KSS) correction matters because
plug-in variance decompositions treat estimated fixed effects as if they
were measured without error. The leave-out approach instead uses
observation-specific leverage adjustments to remove the leading bias
from these variance-component estimates.
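The idea can be illustrated on a deliberately simple one-way grouped model rather than the full two-way design. The sketch below assumes the quadratic-form setup of Kline, Saggio, and Solvsten (2020): the target is \(\theta = \beta' A \beta\), the plug-in estimator is corrected by \(\sum_i B_{ii}\hat\sigma_i^2\), and the leave-out variance estimate is \(\hat\sigma_i^2 = y_i \hat e_i / (1 - P_{ii})\). All data and object names are made up for illustration.

```r
set.seed(1)
G <- 50     # number of groups
n_g <- 3    # observations per group (at least 2, so that P_ii < 1)
g <- rep(seq_len(G), each = n_g)
n <- length(g)

alpha <- rnorm(G, sd = 0.5)       # true group effects
sigma <- runif(n, 0.5, 1.5)       # heteroskedastic error scale
y <- alpha[g] + rnorm(n, sd = sigma)

X <- model.matrix(~ factor(g) - 1)            # group dummies
Sxx_inv <- solve(crossprod(X))
beta_hat <- Sxx_inv %*% crossprod(X, y)

# A is chosen so that beta' A beta is the observation-weighted variance
# of the group effects (1/n denominator).
Xc <- scale(X, center = TRUE, scale = FALSE)
A <- crossprod(Xc) / n

theta_pi <- drop(t(beta_hat) %*% A %*% beta_hat)   # plug-in estimate

# Leverages P_ii and bias weights B_ii for each observation.
P_ii <- rowSums((X %*% Sxx_inv) * X)
B_ii <- rowSums((X %*% Sxx_inv %*% A %*% Sxx_inv) * X)

# Leave-out variance estimates and the bias-corrected estimate.
e_hat <- drop(y - X %*% beta_hat)
sigma2_hat <- y * e_hat / (1 - P_ii)
theta_kss <- theta_pi - sum(B_ii * sigma2_hat)

theta_true <- mean((alpha[g] - mean(alpha[g]))^2)
c(true = theta_true, plug_in = theta_pi, corrected = theta_kss)
```

In this balanced design the correction is always positive, so the corrected estimate lies below the plug-in one.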
Bundled Example Data
path <- system.file("extdata", "test.csv", package = "LeaveOutKSS")
dt <- data.table::fread(path, header = FALSE)
data.table::setorder(dt, V1, V3)
dim(dt)
#> [1] 71614 10
head(dt)
#> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
#> <int> <int> <int> <num> <num> <num> <int> <int> <int> <int>
#> 1: 13736 9475 1999 4.913220 0.050625 0.01139062 1 0 1 0
#> 2: 13736 9475 2001 4.905479 0.075625 0.02079688 0 1 0 1
#> 3: 27351 10973 1999 4.972081 0.062500 0.01562500 1 0 1 0
#> 4: 27351 10973 2001 4.952005 0.090000 0.02700000 0 1 0 1
#> 5: 55440 9475 1999 4.984412 0.010000 0.00100000 1 0 1 0
#> 6: 55440 9475 2001 4.985057 0.022500 0.00337500 0 1 0 1
The bundled file follows the layout used in the repository
examples:
- column 1: worker identifier
- column 2: firm identifier
- column 3: year
- column 4: outcome
Before calling leave_out_KSS() or
leave_out_KSS_fe(), sort the panel by worker identifier
and, within worker, from earlier to later time periods.
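A quick way to check this ordering before estimation is sketched below in base R. The helper is_sorted_panel() is hypothetical (it is not part of the package), and the toy data frame is made up for illustration.

```r
# Hypothetical helper: TRUE if rows are ordered by worker and, within
# worker, by time period.
is_sorted_panel <- function(id, time) {
  identical(order(id, time), seq_along(id))
}

# Made-up unsorted panel.
toy <- data.frame(
  id   = c(2, 1, 1, 2),
  year = c(2001, 2001, 1999, 1999),
  y    = c(4.9, 5.1, 5.0, 4.8)
)
is_sorted_panel(toy$id, toy$year)

# Sort by worker, then by year, and check again.
sorted <- toy[order(toy$id, toy$year), ]
is_sorted_panel(sorted$id, sorted$year)
```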
Main Workflow
The basic decomposition is performed by
leave_out_KSS().
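# Columns follow the bundled layout: 1 = worker, 2 = firm, 4 = outcome.
res <- leave_out_KSS(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  leave_out_level = "matches",  # leave out whole worker-firm matches
  type_algorithm = "JLA",       # approximate the leverages via JLA
  simulations_JLA = 5,
  paral = FALSE,
  progress = FALSE
)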
print(res)
res$estimates$table
The routine returns an object whose main elements are:
- res$estimates$table: biased and bias-corrected
decomposition estimates
- res$effects: estimated worker and firm effects in the
original identifier space
To write the results to disk, pass the output paths explicitly:
stem <- tempfile("leaveoutkss_")
leave_out_KSS(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  simulations_JLA = 5,
  paral = FALSE,
  csv_file = paste0(stem, ".csv"),
  txt_file = paste0(stem, ".txt"),
  progress = FALSE
)
# Remove the temporary output files again.
unlink(paste0(stem, c(".csv", ".txt")))
Controls
The original vignette emphasizes that controls are handled by
partialling them out in the leave-out connected set and then running the
decomposition on the residualized outcome. In R, one way to do this is
to pass a control matrix directly.
# One dummy per year, then drop the last column as the omitted category.
controls <- model.matrix(~ factor(dt[[3]]) - 1)
controls <- controls[, -ncol(controls), drop = FALSE]
leave_out_KSS(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  controls = controls,
  simulations_JLA = 5,
  paral = FALSE,
  progress = FALSE
)
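The partialling-out logic described above can be made concrete with a small base R sketch. It is an illustration under assumed, simulated data and not the package's internal code: the control coefficient is estimated jointly with the fixed effects, the outcome is residualized with respect to the control only, and the two-way model is then refit on the residualized outcome.

```r
set.seed(42)
n_workers <- 40
n_firms <- 5

# Simulated two-period panel (assumed data).
panel <- data.frame(
  id   = rep(seq_len(n_workers), each = 2),
  year = rep(c(1999, 2001), times = n_workers)
)
panel$firmid <- sample(n_firms, nrow(panel), replace = TRUE)
panel$w <- as.numeric(panel$year == 2001)   # single year dummy as control
panel$y <- rnorm(n_workers)[panel$id] +
  rnorm(n_firms)[panel$firmid] +
  0.3 * panel$w +
  rnorm(nrow(panel), sd = 0.2)

# Step 1: estimate the control coefficient jointly with the fixed effects.
fit_full <- lm(y ~ factor(id) + factor(firmid) + w, data = panel)
delta_hat <- coef(fit_full)[["w"]]

# Step 2: residualize the outcome with respect to the control only.
panel$y_resid <- panel$y - panel$w * delta_hat

# Step 3: refit the two-way model on the residualized outcome.
fit_resid <- lm(y_resid ~ factor(id) + factor(firmid), data = panel)

# The fixed-effect part of the full fit is reproduced by the second fit.
fe_part_full <- unname(fitted(fit_full) - panel$w * delta_hat)
all.equal(unname(fitted(fit_resid)), fe_part_full)
```

Because the residuals of the full fit are orthogonal to the worker and firm dummies, the second regression recovers the same fixed-effect component, which is why the decomposition can be run on the residualized outcome.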
If a control is more naturally supplied as a coded categorical
variable, leave_out_KSS_fe() can expand selected columns
internally:
leave_out_KSS_fe(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  controls = cbind(year = dt[[3]]),
  absorb_col = 1,
  simulations_JLA = 5,
  paral = FALSE,
  progress = FALSE
)
Leaving Out Matches or Observations
The default leave_out_level = "matches" follows the
discussion in the original vignette: it is intended to be robust to
unrestricted heteroskedasticity and serial correlation within
worker-firm matches. Setting leave_out_level = "obs"
switches the correction to leaving out single person-year observations
instead.
leave_out_KSS(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  leave_out_level = "obs",
  simulations_JLA = 5,
  paral = FALSE,
  progress = FALSE
)
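To make the distinction between the two levels concrete, the following base R sketch groups person-year rows into worker-firm matches on a made-up panel. The column name match_id is an assumption for illustration; the package builds its own grouping internally.

```r
# Made-up panel: worker 1 moves from firm 10 to firm 20, worker 2 stays
# at firm 10, and worker 3 moves from firm 20 to firm 30.
toy <- data.frame(
  id     = c(1, 1, 1, 2, 2, 3, 3),
  firmid = c(10, 10, 20, 10, 10, 20, 30)
)

# A match is a unique worker-firm pair; number the pairs in order of
# first appearance.
match_key <- paste(toy$id, toy$firmid, sep = "_")
toy$match_id <- match(match_key, unique(match_key))

n_obs <- nrow(toy)                            # units left out under "obs"
n_matches <- length(unique(toy$match_id))     # units left out under "matches"
table(toy$match_id)                           # person-years per match
```

Under "matches", all rows sharing a match_id are left out together; under "obs", each row is left out on its own.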
Regressing Firm Effects on Observables
The ‘MATLAB’ vignette also discusses linear projections of estimated
firm effects on observables. In this package, that workflow is exposed
through the lincom_do, Z_lincom, and
labels_lincom arguments of leave_out_KSS(),
which call lincom_KSS() internally.
# Illustrative regressor: 1 for observations in or before the median year.
early_year_dummy <- as.numeric(dt[[3]] <= median(dt[[3]], na.rm = TRUE))
leave_out_KSS(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  simulations_JLA = 5,
  paral = FALSE,
  lincom_do = 1,
  Z_lincom = early_year_dummy,
  labels_lincom = list("Early-Year Indicator"),
  progress = FALSE
)
R-Squared Companion
rsquared_comp() compares the fit of the standard two-way
fixed effects design with a saturated worker-firm interaction model.
rsquared_comp(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  progress = FALSE
)
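The kind of comparison involved can be sketched with lm() on simulated data. This is an assumed, simplified illustration and not a description of what rsquared_comp() computes internally: it fits the additive worker and firm model and a model with one dummy per worker-firm match, then reports both R-squared and adjusted R-squared.

```r
set.seed(7)
n_workers <- 30
n_firms <- 4

# Each worker spends two periods at each of two (possibly identical) firms.
first_firm  <- sample(n_firms, n_workers, replace = TRUE)
second_firm <- sample(n_firms, n_workers, replace = TRUE)
panel <- data.frame(id = rep(seq_len(n_workers), each = 4))
panel$firmid <- as.vector(rbind(first_firm, first_firm,
                                second_firm, second_firm))

# One factor level per worker-firm match.
panel$match <- interaction(panel$id, panel$firmid, drop = TRUE)

# Outcome with worker, firm, and match components plus noise.
match_fe <- rnorm(nlevels(panel$match), sd = 0.3)
panel$y <- rnorm(n_workers)[panel$id] +
  rnorm(n_firms)[panel$firmid] +
  match_fe[as.integer(panel$match)] +
  rnorm(nrow(panel), sd = 0.5)

fit_twfe <- lm(y ~ factor(id) + factor(firmid), data = panel)  # additive
fit_sat  <- lm(y ~ match, data = panel)                        # saturated

fit_stats <- rbind(
  r_squared     = c(twfe = summary(fit_twfe)$r.squared,
                    saturated = summary(fit_sat)$r.squared),
  adj_r_squared = c(twfe = summary(fit_twfe)$adj.r.squared,
                    saturated = summary(fit_sat)$adj.r.squared)
)
fit_stats
```

The saturated model nests the additive one, so its unadjusted R-squared can never be lower; the adjusted figures are the more informative comparison.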
Notes on Current Scope
At this stage, package documentation and examples intentionally rely
on the small bundled dataset rather than the large-data workflow from
the repository’s 04_large_no_controls.R. The computational
shortcuts for large datasets are still reflected in the application
programming interface (API), especially the Johnson-Lindenstrauss
approximation (JLA)-based leverage routines, but the documentation
examples focus on the small reproducible panel.
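For readers unfamiliar with the idea behind JLA-based leverage routines, the sketch below shows a generic random-projection approximation of regression leverages in base R. It is an illustration under assumptions and not the package's implementation: jla_leverage() is a hypothetical helper, the design matrix is simulated, and n_sim plays a role analogous to the number of simulation draws.

```r
set.seed(123)
n <- 200
k <- 5
X <- cbind(1, matrix(rnorm(n * (k - 1)), n, k - 1))  # simulated design

# Exact leverages: diagonal of the projection matrix, via the QR factor.
P_exact <- rowSums(qr.Q(qr(X))^2)

# Hypothetical helper: approximate leverages with random projections.
# For a Rademacher vector r, E[(P r)_i^2] = P_ii, so averaging squared
# fitted values over many draws approximates the leverages without
# forming the full projection matrix.
jla_leverage <- function(X, n_sim) {
  n <- nrow(X)
  R <- matrix(sample(c(-1, 1), n * n_sim, replace = TRUE), n, n_sim)
  PR <- qr.fitted(qr(X), R)   # each column is P %*% r_k
  rowMeans(PR^2)
}

P_jla <- jla_leverage(X, n_sim = 2000)

# Compare the approximation with the exact values.
summary(abs(P_jla - P_exact))
cor(P_jla, P_exact)
```

Each draw costs one regression on the design matrix, which is what makes this approach attractive when the exact leverages are too expensive to compute observation by observation.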
References
Abowd, J. M., Kramarz, F., and Margolis, D. N. (1999). High wage
workers and high wage firms. Econometrica, 67(2), 251-333.
Kline, P., Saggio, R., and Solvsten, M. (2020). Leave-out estimation
of variance components. Econometrica, 88(5), 1859-1898.