---
title: "Quick start guide to pixieweb"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Quick start guide to pixieweb}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

pixieweb makes it easy to download open statistical data from PX-Web APIs
— the platform used by Statistics Sweden (SCB), Statistics Norway (SSB),
Statistics Finland, and many others. This vignette walks you from zero
to a tidy tibble in five steps.

## Step 1: Connect to an API

```{r}
library(pixieweb)

scb <- px_api("scb", lang = "en")
scb
```

`px_api()` accepts a short alias (`"scb"`, `"ssb"`, `"statfi"`) or a
full URL. Use `px_api_catalogue()` to list known instances.

## Step 2: Find a table

PX-Web organises data into **tables**. Each table holds a data cube
with one or more dimensions (called **variables**). Use `get_tables()`
to search:

```{r}
tables <- get_tables(scb, query = "population")
tables
```

The result is a tibble. You can narrow it further on the client side
with `table_search()`, and inspect tables with `table_describe()`:

```{r}
tables |>
  table_search("municipal") |>
  table_describe(max_n = 3, format = "md")
```

`table_describe()` shows the subject path, time period range, and data
source alongside the title, which makes it easier to pick the right
table.

## Step 3: Explore variables

Once you have a table ID, inspect what variables (dimensions) it has:

```{r}
vars <- get_variables(scb, "TAB638")
vars |> variable_describe()
```

Each variable has a set of available **values** (codes). Look at a
specific variable's values:

```{r}
vars |> variable_values("Region")
```

## Step 4: Fetch data

Now you know which variables the table has and what values are
available. Pass your selections to `get_data()`:

- **ContentsCode** tells the API *what* to measure (population, deaths,
  etc.). `"*"` means "all measures in this table".
- Variables you **omit** are *eliminated*: the API returns a
  pre-computed aggregate (e.g., omitting `Kon` gives totals for both
  sexes). Not all variables allow this; see
  `vignette("introduction-to-pixieweb")` for mandatory versus eliminable
  variables.

```{r}
pop <- get_data(scb, "TAB638",
  Region = c("0180", "1480"),
  ContentsCode = "*",
  Tid = px_top(5)
)
pop
```
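
To make elimination concrete, here is a small base-R sketch that does not call the API. The `toy` data frame and its counts are made up for illustration. Summing over `Kon` only mimics the *shape* of the result you get when `Kon` is omitted; the API itself returns a pre-computed aggregate rather than summing on the fly.

```{r}
# Toy data with made-up counts; these are not real SCB figures.
toy <- data.frame(
  Region = c("0180", "0180", "1480", "1480"),
  Kon    = c("1", "2", "1", "2"),
  value  = c(10, 12, 5, 6)
)

# Collapse over Kon: one total per region, with no Kon column left.
totals <- aggregate(value ~ Region, data = toy, FUN = sum)
totals
```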

Selection helpers like `px_top()`, `px_from()`, and `px_range()` let
you select values without knowing exact codes. Use them when you want
"the latest N periods" or "everything from 2020 onward" rather than
typing out specific year codes.
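
As a rough mental model, the three kinds of selection can be sketched in plain R on a hypothetical vector of yearly period codes. This does not call pixieweb, and details such as whether range endpoints are inclusive are assumptions here; check each helper's help page for its actual behaviour.

```{r}
# Hypothetical yearly period codes, oldest first.
periods <- as.character(2015:2024)
years <- as.integer(periods)

# "The latest 5 periods"
latest_five <- tail(periods, 5)

# "Everything from 2020 onward"
from_2020 <- periods[years >= 2020]

# "Everything between 2018 and 2021" (endpoints assumed inclusive)
range_2018_2021 <- periods[years >= 2018 & years <= 2021]
```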

### Optional shortcut: `prepare_query()`

You can skip this section if you prefer the direct approach above.
`prepare_query()` inspects the table and fills in sensible defaults —
handy when you don't want to specify every variable:

```{r}
q <- prepare_query(scb, "TAB638", Region = c("0180", "1480"))
```

It prints a summary of what was chosen and why. When you're happy,
pass the query to `get_data()`:

```{r}
pop <- get_data(scb, query = q)
```

Set `maximize_selection = TRUE` to automatically include as many
variables as the API's cell limit allows:

```{r}
q <- prepare_query(scb, "TAB638",
  Region = "0180",
  maximize_selection = TRUE
)
```
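
The cell limit applies to the size of the requested data cube, which is the product of the number of values selected for each variable. The sketch below works through that arithmetic. The count of three measures and the limit of 100000 are hypothetical, since the real figures depend on the table and the API instance.

```{r}
# Number of selected values per variable.
# Region and Tid match the get_data() call above; ContentsCode = 3 is made up.
n_selected <- c(Region = 2, ContentsCode = 3, Tid = 5)

# Cells requested = product over all variables: 2 * 3 * 5
n_cells <- prod(n_selected)

# Hypothetical limit; real limits vary between API instances.
cell_limit <- 100000
fits <- n_cells <= cell_limit
```

Adding another variable multiplies the total, so requests grow quickly as more variables and values are included.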

## Step 5: Work with the result

The result is a standard tibble. Use your favourite tidyverse tools:

```{r}
library(ggplot2)

pop |>
  ggplot(aes(x = Tid, y = value, colour = Region_text)) +
  # One line per region
  geom_line(aes(group = Region_text)) +
  # Separate panel for each measure (Population, Deaths, etc.)
  facet_wrap(~ ContentsCode_text, scales = "free_y") +
  # Rotate x-axis labels to avoid overlap
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
  labs(
    title = "Population over time",
    caption = px_cite(pop)  # Auto-generated data citation
  )
```

Notice the `_text` suffix: `get_data()` returns both raw code columns
(`Region = "0180"`) and human-readable label columns
(`Region_text = "Stockholm"`). Use `_text` columns for display and
plotting; use the raw codes for filtering and joining.
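
A minimal sketch of that division of labour, using a toy data frame in the same shape rather than a real download. The values are placeholders, and the label for `"1480"` is illustrative only.

```{r}
# Toy result with a code column and its matching label column.
toy <- data.frame(
  Region      = c("0180", "1480"),
  Region_text = c("Stockholm", "Gothenburg"),  # second label is illustrative
  value       = c(1, 2)                        # placeholder values
)

# Filter on the stable raw code ...
sthlm <- subset(toy, Region == "0180")

# ... and use the label only when presenting the result.
label <- paste0(sthlm$Region_text, ": ", sthlm$value)
label
```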

Other useful helpers:

- `data_minimize()` — remove columns where all values are identical
- `data_legend()` — generate a caption string from variable metadata
- `px_cite()` — create a citation for the downloaded data
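
To illustrate what "columns where all values are identical" means, here is a base-R sketch of the same idea on a toy data frame. It is not the implementation of `data_minimize()`, only a picture of the effect described above.

```{r}
# Toy data: Kon is constant, so it carries no information within this result.
toy <- data.frame(
  Region = c("0180", "1480"),
  Kon    = c("1", "1"),
  value  = c(1, 2)
)

# Keep only columns that have more than one distinct value.
varies <- vapply(toy, function(col) length(unique(col)) > 1, logical(1))
minimized <- toy[, varies, drop = FALSE]
minimized
```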

## Next steps

- **Concepts & advanced features** — `vignette("introduction-to-pixieweb")`
  covers the data model, codelists, saved queries, and query
  composition.
- **Multiple countries** — `vignette("multi-api")` shows how to compare
  data across national statistics agencies.
- **ggplot2 reference** — <https://ggplot2-book.org/> for more on
  visualisation.
