<!-- README.md is generated from README.Rmd. Please edit that file -->

# ubair <img src="inst/sticker/stickers-ubair-1.png" align="right" width="20%"/>

**ubair** is an R package for the Statistical Investigation of the Impact of External Conditions on Air Quality. It analyzes and visualizes how external factors, such as traffic restrictions, hazards and political measures, affect air quality. It aims to give experts a transparent comparison of modeling approaches and to support data-driven evaluations for policy advisory purposes.

## Installation

-   Download the zip archive from GitLab
-   Unzip it in SINA
-   Double-click the R project “ubair” (open it with RStudio)
-   Run the following in the console:

``` r
install.packages("remotes")
remotes::install_local()
```

#### Using remote package

Git must be installed. The SSH variant additionally requires an SSH key registered with your GitLab account.

``` r
install.packages("remotes")
remotes::install_git("git@gitlab.opencode.de:uba-ki-lab/ubair.git")
# alternative via https
remotes::install_git("https://gitlab.opencode.de/uba-ki-lab/ubair.git")
```

## Sample Usage

For a more detailed explanation of the package, see the vignettes:

-   View the user_sample source code directly in the [vignettes/](vignettes/) folder.
-   Open a vignette with `vignette("user_sample_1", package = "ubair")` if the package was installed with vignettes (e.g. `remotes::install_local(build_vignettes = TRUE)`).

``` r
library(ubair)
params <- load_params()
env_data <- sample_data_DESN025
```

``` r
# Plot meteo data
plot_station_measurements(env_data, params$meteo_variables)
```

<img src="man/figures/README-plot-meteo-data-1.png" width="100%"/>

Split the data into training, reference and effect time intervals:

<img src="man/figures/time_split_overview.png" width="100%"/>

``` r
application_start <- lubridate::ymd("20191201") # This coincides with the start of the reference window
date_effect_start <- lubridate::ymd_hm("20200323 00:00") # This splits the forecast into reference and effect
application_end <- lubridate::ymd("20200504") # This coincides with the end of the effect window

buffer <- 24 * 14 # 14-day buffer, expressed in hours

dt_prepared <- prepare_data_for_modelling(env_data, params)
# drop rows with missing values before splitting
dt_prepared <- dt_prepared[complete.cases(dt_prepared)]
split_data <- split_data_counterfactual(
  dt_prepared, application_start,
  application_end
)
res <- run_counterfactual(split_data,
  params,
  detrending_function = "linear",
  model_type = "lightgbm",
  alpha = 0.9,
  log_transform = TRUE,
  calc_shaps = TRUE
)
```

```         
#> [LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.023641 seconds.
#> You can set `force_col_wise=true` to remove the overhead.
#> [LightGBM] [Info] Total Bins 1557
#> [LightGBM] [Info] Number of data points in the train set: 104486, number of used features: 9
#> [LightGBM] [Info] Start training from score -0.000000
```
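The three time windows can be illustrated by labelling timestamps. The sketch below uses a hypothetical base R helper, `label_time_window()`, which is not part of ubair. It assumes that the reference window runs from `application_start` up to (but excluding) `date_effect_start`, that the effect window runs from `date_effect_start` through `application_end`, and that every other timestamp is training data; `split_data_counterfactual()` may treat the boundaries and the training period differently.

```r
# Hypothetical helper (not part of ubair): label each timestamp with its window.
# Assumption: everything outside [application_start, application_end] is
# training data; the package's own split rule may differ.
label_time_window <- function(date, application_start, date_effect_start,
                              application_end) {
  window <- rep("training", length(date))
  in_reference <- date >= application_start & date < date_effect_start
  in_effect <- date >= date_effect_start & date <= application_end
  window[in_reference] <- "reference"
  window[in_effect] <- "effect"
  window
}

# Timestamps around the window boundaries used above (UTC)
dates <- as.POSIXct(c(
  "2019-11-30", "2019-12-01", "2020-03-22",
  "2020-03-23", "2020-05-04", "2020-05-05"
), tz = "UTC")

label_time_window(
  dates,
  application_start = as.POSIXct("2019-12-01", tz = "UTC"),
  date_effect_start = as.POSIXct("2020-03-23", tz = "UTC"),
  application_end = as.POSIXct("2020-05-04", tz = "UTC")
)
# returns: training, reference, reference, effect, effect, training
```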

``` r
predictions <- res$prediction

plot_counterfactual(predictions, params,
  window_size = 14,
  date_effect_start,
  buffer = buffer,
  plot_pred_interval = TRUE
)
```

<img src="man/figures/README-counterfactual-scenario-1.png" width="100%"/>

``` r
round(calc_performance_metrics(predictions, date_effect_start, buffer = buffer), 2)
```

```         
#>           RMSE            MSE            MAE           MAPE           Bias 
#>           7.38          54.48           5.38           0.18          -2.73 
#>             R2 Coverage lower Coverage upper       Coverage    Correlation 
#>           0.74           0.97           0.95           0.92           0.89 
#>            MFB            FGE 
#>          -0.05           0.19
```
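To clarify the abbreviations, the sketch below computes the metrics with common textbook definitions in base R. It is an illustration, not the package's implementation: the helper `performance_metrics()` and its vector arguments (`obs`, `pred`, and the prediction interval bounds `lower`, `upper`) are hypothetical, and details such as the MAPE scaling and the coverage definitions are assumptions. R2 is taken as the coefficient of determination `1 - SS_res / SS_tot` rather than the squared correlation, since 0.89² ≈ 0.79 would not match the 0.74 reported above.

```r
# Hypothetical helper: common textbook definitions of the metrics printed by
# calc_performance_metrics(); the package's own implementation may differ.
performance_metrics <- function(obs, pred, lower, upper) {
  err <- pred - obs # positive error = over-prediction
  c(
    RMSE = sqrt(mean(err^2)),
    MSE = mean(err^2),
    MAE = mean(abs(err)),
    MAPE = mean(abs(err / obs)), # assumed to be a fraction, not a percentage
    Bias = mean(err),
    R2 = 1 - sum(err^2) / sum((obs - mean(obs))^2),
    `Coverage lower` = mean(obs >= lower), # share at or above the lower bound
    `Coverage upper` = mean(obs <= upper), # share at or below the upper bound
    Coverage = mean(obs >= lower & obs <= upper),
    Correlation = stats::cor(obs, pred),
    MFB = 2 * mean(err / (pred + obs)), # mean fractional bias
    FGE = 2 * mean(abs(err) / (pred + obs)) # fractional gross error
  )
}

# Toy data with a +/- 5 prediction interval around the predictions
obs <- c(10, 20, 30, 40)
pred <- c(12, 18, 33, 37)
m <- performance_metrics(obs, pred, lower = pred - 5, upper = pred + 5)
round(m, 2)
```

With `err = pred - obs`, a negative Bias and MFB mean that the model under-predicts on average.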

``` r
round(calc_summary_statistics(predictions, date_effect_start, buffer = buffer), 2)
```

::: kable-table
|                      |   true | prediction |
|:---------------------|-------:|-----------:|
| min                  |   3.36 |       5.58 |
| max                  | 111.90 |      59.71 |
| var                  | 212.96 |     128.16 |
| mean                 |  30.80 |      28.07 |
| 5-percentile         |   9.29 |      10.73 |
| 25-percentile        |  19.85 |      19.40 |
| median/50-percentile |  29.60 |      27.09 |
| 75-percentile        |  40.54 |      36.27 |
| 95-percentile        |  56.80 |      47.69 |
:::
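The rows of this table are standard descriptive statistics. The following sketch reproduces them for two toy vectors with a hypothetical helper, `summary_statistics()`, which is not part of ubair; it uses R's default quantile definition (type 7) and the sample variance, either of which may differ from the package's choice.

```r
# Hypothetical helper mirroring the rows of the table above.
# Quantiles use R's default type 7; var() is the sample variance (n - 1).
summary_statistics <- function(x) {
  q <- stats::quantile(x, probs = c(0.05, 0.25, 0.5, 0.75, 0.95), names = FALSE)
  c(
    min = min(x), max = max(x), var = stats::var(x), mean = mean(x),
    `5-percentile` = q[1], `25-percentile` = q[2],
    `median/50-percentile` = q[3], `75-percentile` = q[4],
    `95-percentile` = q[5]
  )
}

# Toy observed and predicted series
obs <- c(12, 25, 31, 44, 58)
pred <- c(15, 24, 29, 39, 50)
round(cbind(true = summary_statistics(obs), prediction = summary_statistics(pred)), 2)
```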

``` r
estimate_effect_size(predictions, date_effect_start, buffer = buffer, verbose = TRUE)
```

```         
#> The external effect changed the target value on average by -6.294 compared to the reference time window. This is a -26.37% relative change.

#> $absolute_effect
#> [1] -6.294028
#> 
#> $relative_effect
#> [1] -0.2637
```
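The sketch below shows only the simplest reading of these two numbers: the absolute effect as the mean difference between observed and counterfactual (predicted) values in the effect window, and the relative effect as that difference divided by the mean counterfactual value. This is an assumption; the helper `effect_size_sketch()` is hypothetical, it ignores the buffer and any reference-window correction, and the exact definition should be taken from `?estimate_effect_size`.

```r
# Hypothetical sketch (not ubair's implementation): absolute and relative
# effect in the effect window, ignoring the buffer and the reference window.
effect_size_sketch <- function(obs, pred) {
  absolute_effect <- mean(obs - pred)
  list(
    absolute_effect = absolute_effect,
    relative_effect = absolute_effect / mean(pred)
  )
}

# Toy effect window: observed values sit 5 units below the counterfactual
obs <- c(20, 22, 18, 20)
pred <- c(25, 27, 23, 25)
effect_size_sketch(obs, pred)
# absolute_effect = -5, relative_effect = -0.2 (i.e. a -20% relative change)
```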

### SHAP feature importances

``` r
shapviz::sv_importance(res$importance, kind = "bee")
```

<img src="man/figures/README-feature_importance-1.png" width="100%"/>

``` r
xvars <- c("TMP", "WIG", "GLO", "WIR")
shapviz::sv_dependence(res$importance, v = xvars)
```

<img src="man/figures/README-feature_importance-2.png" width="100%"/>

## Development

### Prerequisites

1.  **R**: Make sure you have R installed (recommended version 4.4.1). You can download it from [CRAN](https://cran.r-project.org/).
2.  **RStudio** (optional but recommended): Download from [RStudio](https://posit.co/).

### Setting Up the Environment

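Restore the project library from `renv.lock`, then build and load the development version of ubair:

``` r
install.packages("renv")
# install the package versions pinned in renv.lock
renv::restore()
# build a source package tarball
devtools::build()
# load the package from source for interactive development
devtools::load_all()
```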

### Development Workflow

#### Install the pre-commit hook (required to enforce tidyverse code formatting)

```         
# install the pre-commit tool
pip install pre-commit
# register the git hook in this repository
pre-commit install
```

#### Add new requirements

If you add new dependencies to the *ubair* package, declare them in `DESCRIPTION` (for example with `usethis::use_package()`) and update the `renv.lock` file:

``` r
renv::snapshot()
```

#### Style and documentation

Before committing, update the documentation, make sure the code follows the tidyverse style guide, and check that all tests pass:

``` r
# update documentation and check package integrity
devtools::check()
# apply tidyverse style (also applied as precommit hook)
usethis::use_tidy_style()
# you can check for existing lintr warnings by
devtools::lint()
# run tests
devtools::test()
# build README.md if any changes have been made to README.Rmd
devtools::build_readme()
```

#### Pre-commit hook

The pre-commit rules are defined in `.pre-commit-config.yaml` and are applied before each commit. They:

-   run styler to format the code in tidyverse style
-   run roxygen to update the documentation
-   check that the README is up to date
-   run lintr as a final code style check

If the pre-commit hook fails, review the automatically applied changes, stage them and commit again.

#### Test Coverage

Install the covr package to run the following coverage checks:

``` r
cov <- covr::package_coverage(type = "all")
cov_list <- covr::coverage_to_list(cov)
data.table::data.table(
  part = c("Total", names(cov_list$filecoverage)),
  coverage = c(cov_list$totalcoverage, as.vector(cov_list$filecoverage))
)
```

``` r
covr::report(cov)
```

## Contacts

**Jore Noa Averbeck** [JoreNoa.Averbeck@uba.de](mailto:JoreNoa.Averbeck@uba.de)

**Raphael Franke** [Raphael.Franke@uba.de](mailto:Raphael.Franke@uba.de)

**Imke Voß** [imke.voss@uba.de](mailto:imke.voss@uba.de)
