---
title: "validation_tests"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{validation-tests}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  fig.width = 8,
  fig.height = 8,
  message = FALSE,
  warning = FALSE
)
library(ggplot2); library(dplyr); library(tidyr)
data.table::setDTthreads(1L)
options(dplyr.summarise.inform = FALSE, scipen = 999, digits = 5)
theme_set(theme_bw(base_size = 12))
```

> **Notation.** For symbol definitions, see the [notation vignette](notation.html).

## Overview

The aim of this vignette is to show how to perform correct validation tests with closest not-yet-treated control groups using `childpen`.

## Simulate data

The package has a built-in simulation function, `simulate_data()`, to draw data resembling child penalty studies.

```{r draw_data}
library(childpen)

data <- simulate_data(n_individuals = 5000, treatment_groups = 24:28)
data |> tibble()
```

## The correct validation tests

See the [estimation vignette](estimation.html) for an explainer on the $2\times2$ comparisons in `childpen`. Recall that $d$ is the treatment group, $a$ is the target age, and $d^\prime = a + 1$ is the closest not-yet-treated control group, i.e., the cohort whose first birth occurs one year after the target age.

Assume that, when presenting post-treatment results, you report estimates for event times $e = 0, \dots, 2$, where $e = a - d$. Since $d^\prime = a + 1 = d + e + 1$, each event time uses a different control group, so for each treatment group $d$ you use 3 different control groups ($d + 1$, $d + 2$, and $d + 3$) in post-treatment estimation. As the identification assumptions (e.g., parallel trends for DID) must hold for each point estimate separately, they must hold within each treatment-control pair.
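
To make this mapping concrete, the chunk below is a minimal base-R sketch that does not call `childpen`. For the illustrative treatment groups $d \in \{24, 25\}$ and event times $e = 0, \dots, 2$, it lists the target age $a = d + e$ and the closest not-yet-treated control group $d^\prime = a + 1$. The relation $a = d + e$ is our reading of event time; see the [notation vignette](notation.html) for the package's definitions.

```{r sketch_post_pairs}
# Minimal sketch, not part of the childpen API.
# Assumes event time e = a - d, so the target age is a = d + e.
post_pairs <- expand.grid(d = 24:25, e = 0:2)
post_pairs$a  <- post_pairs$d + post_pairs$e  # target age
post_pairs$dp <- post_pairs$a + 1             # closest not-yet-treated control group
post_pairs[order(post_pairs$d, post_pairs$e), ]
```

Each treatment group appears with three distinct control groups, $d + 1$, $d + 2$, and $d + 3$, one per event time.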

The above argument means that the validation tests should be done separately by treatment-control combinations. Returning to the above example, if you want to show results for $e=0,...,2$ then you need to conduct pre-trend analysis for 3 different control groups. This is done automatically in the `childpen` package, as we show below.
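
The sketch below illustrates the bookkeeping this implies: one set of pre-period tests for every treatment-control pair used post-treatment. It is plain base R and not the `childpen` implementation. The values of `treatment_groups` and `periods_post` mirror the estimation call used later in this vignette, while `pre_event_times <- -3:-2` is an assumption chosen to match the x-axis of the plots below, not a value read from the package.

```{r sketch_validation_grid}
# Minimal sketch of the bookkeeping, not the childpen implementation.
treatment_groups <- 24:25
periods_post     <- 2      # post-treatment event times e = 0, ..., periods_post
pre_event_times  <- -3:-2  # assumed pre-period event times (hypothetical)

validation_grid <- do.call(rbind, lapply(treatment_groups, function(d) {
  # control groups used post-treatment: d' = d + e + 1 for e = 0, ..., periods_post
  dp <- d + 0:periods_post + 1
  expand.grid(d = d, dp = dp, event_time = pre_event_times)
}))
validation_grid$control_offset <- validation_grid$dp - validation_grid$d

# number of pre-period tests per treatment group (rows) and control offset (columns)
table(validation_grid$d, validation_grid$control_offset)
```

With `periods_post = 2`, each treatment group is tested against `periods_post + 1 = 3` control groups at every pre-period event time.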

For completeness, the validation tests are:

1. Difference-in-differences (DID) estimates the average treatment effect (ATE) in pre-periods
2. Triple differences (TD) estimates the gender gap in the ATE in pre-periods
3. Normalized triple differences (NTD) estimates the gender gap in normalized effects in pre-periods
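
As a reminder of why these pre-period tests are informative, consider a generic $2\times2$ DID for treatment group $d$, control group $d^\prime$, a pre-period target age $a < d$, and a baseline age $b$. The notation $\bar{Y}_{g,t}$ (mean outcome of cohort $g$ at age $t$) and the baseline $b = d - 1$ for `pre = 1` are assumptions made for this sketch; the [estimation vignette](estimation.html) gives the package's exact definitions.

$$
\widehat{\mathrm{DID}}(d, d^\prime, a) = \left(\bar{Y}_{d,a} - \bar{Y}_{d,b}\right) - \left(\bar{Y}_{d^\prime,a} - \bar{Y}_{d^\prime,b}\right)
$$

Neither cohort $d$ nor cohort $d^\prime > d$ has had a first birth at ages $a$ or $b$, so absent anticipation the pre-period treatment effect is zero, and under parallel trends the estimate should be close to zero. TD and NTD apply the same logic to the gender gap.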

## Multiple treatment group analysis

We now do the heavy lifting by running the main estimation function, `multiple_treatment_group_analysis()`. Set `periods_pre` to the number of pre-treatment periods for which you want to conduct validation tests; as an example, we examine two pre-treatment periods. Because we also set `periods_post = 2`, so that post-treatment results cover event times $e = 0, \dots, 2$, the validation tests are reported separately for 3 control groups, as discussed above.

```{r estimation}
res <- multiple_treatment_group_analysis(
  data = data,
  treatment_groups = 24:25, # treatment groups to include in the analysis
  periods_post = 2,         # estimate post-treatment results for event times 0:2
  periods_pre = 2,          # pre-trend diagnostics; set to NULL to omit them
  max_age = 40,             # don't estimate results for ages above 40
  min_age = 20,             # don't estimate results for ages below 20
  pre = 1,                  # use 1 period before treatment; move further back if anticipation is a concern
  verbose = FALSE           # set to TRUE to print progress messages
)
```

## Examining results of validation tests

As a first pass, let's look at the results.

```{r results_first}
res |> tibble()
```

Focusing on $d=24$, let's examine pre-trends, starting with the DID estimates for women. Generally, validation tests consistent with the identification assumptions show point estimates near zero with confidence intervals that include zero, no obvious trend across pre-treatment event times, and no systematic differences between control groups.

Note that in the plots below I define `control_offset` as $d^\prime - d$, the difference between the control group and the treatment group. E.g., for $d=24$ and $d^\prime=25$, the closest not-yet-treated control group at event time $e=0$, the control offset is 1.

Ribbons show 95\% confidence intervals based on standard errors clustered at the individual level.
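
The visual criteria above can be complemented by a simple numerical check. The sketch below assumes that `ci_l` and `ci_h` are symmetric normal-approximation intervals, $\hat\beta \pm z_{0.975}\,\widehat{\mathrm{se}}$ (an assumption not confirmed here), backs out the standard error from the interval width, and flags estimates whose interval excludes zero. The helper `flag_pre_trends()` is hypothetical, and the input rows are made-up numbers, not `childpen` output.

```{r sketch_ci_check}
# Hypothetical helper, not part of childpen.
# Assumes ci_l/ci_h are symmetric normal intervals: est -/+ z_crit * se.
flag_pre_trends <- function(df, level = 0.95) {
  z_crit <- qnorm(1 - (1 - level) / 2)
  df$se <- (df$ci_h - df$ci_l) / (2 * z_crit)    # back out the standard error
  df$z  <- df$est / df$se                         # z-statistic for H0: effect = 0
  df$excludes_zero <- df$ci_l > 0 | df$ci_h < 0   # interval does not cover zero
  df
}

# Made-up numbers for illustration only (not estimates from this vignette)
toy <- data.frame(control_offset = c(1, 1, 2, 2),
                  event_time     = c(-3, -2, -3, -2),
                  est            = c(0.01, -0.02, 0.05, 0.06),
                  ci_l           = c(-0.03, -0.06, 0.01, 0.02),
                  ci_h           = c(0.05, 0.02, 0.09, 0.10))
flag_pre_trends(toy)
```

If `res` stores its intervals as assumed, the same helper could be applied to its pre-period rows, e.g. `flag_pre_trends(filter(res, a < d))`. With several pre-periods and control groups, a few rejections can occur by chance, so these flags are best read alongside the plots.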

```{r}
res |>
  filter(d == 24,
         a < d,
         estimand == "ATE",
         method == "DID_Female") |>
  mutate(control_offset = dp - d,
         control_offset = factor(control_offset)) |>
  ggplot(aes(x = event_time, y = est, ymin = ci_l, ymax = ci_h, color = control_offset, fill = control_offset)) +
  geom_ribbon(alpha = .15, color = NA) + geom_point() + geom_line() +
  scale_x_continuous(breaks = -3:-2) +
  facet_grid(cols = vars(control_offset))
```

Although this can be harder to read, we can also put all control offsets on the same plot.


```{r}
res |>
  filter(d == 24,
         a < d,
         estimand == "ATE",
         method == "DID_Female") |>
  mutate(control_offset = dp - d, 
         control_offset = factor(control_offset)) |> 
  ggplot(aes(x = event_time, y = est, ymin = ci_l, ymax = ci_h, color = control_offset, fill = control_offset)) +
  geom_ribbon(alpha = .15, color = NA) + geom_point() + geom_line() + 
  scale_x_continuous(breaks = -3:-2)
```


We can do this for multiple treatment groups at the same time by faceting on $d$.

```{r}
res |> 
  filter(a < d, 
         estimand == "ATE",
         method == "DID_Female") |> 
  mutate(control_offset = dp - d, 
         control_offset = factor(control_offset)) |> 
  ggplot(aes(x = event_time, y = est, ymin = ci_l, ymax = ci_h, color = control_offset, fill = control_offset)) +
  geom_ribbon(alpha = .15, color = NA) + geom_point() + geom_line() + 
  scale_x_continuous(breaks = -3:-2) + 
  facet_grid(cols = vars(d))
```

Finally, we can do this for all methods: DID for women and men and TD (estimand `ATE`), and NTD (estimand `theta`, method `NTD_Conv`).

```{r}
res |> 
  filter(a < d,
         (estimand == "ATE" & method %in% c("DID_Female", "DID_Male", "TD")) |
           (estimand == "theta" & method == "NTD_Conv")) |>
  mutate(control_offset = dp - d, 
         control_offset = factor(control_offset)) |> 
  ggplot(aes(x = event_time, y = est, ymin = ci_l, ymax = ci_h, color = control_offset, fill = control_offset)) +
  geom_ribbon(alpha = .15, color = NA) + geom_point() + geom_line() + 
  scale_x_continuous(breaks = -3:-2) + 
  facet_grid(cols = vars(d), rows = vars(method), scales = "free")
```