% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/vigitel.R
\name{vigitel_data}
\alias{vigitel_data}
\title{Download VIGITEL microdata}
\usage{
vigitel_data(
  year = NULL,
  format = c("dta", "csv"),
  vars = NULL,
  cache_dir = NULL,
  force = FALSE,
  lazy = FALSE,
  backend = c("arrow", "duckdb")
)
}
\arguments{
\item{year}{Integer or vector of integers. Years to return (2006-2024).
Use NULL to return all years. Default is NULL.}

\item{format}{Character. File format to download: "dta" (Stata, default)
or "csv". Stata format preserves variable labels.}

\item{vars}{Character vector. Variables to select. Use NULL for all variables.
Default is NULL.}

\item{cache_dir}{Character. Directory for caching downloaded files.
Default uses \code{tools::R_user_dir("healthbR", "cache")}.}

\item{force}{Logical. If TRUE, re-download even if file exists in cache.
Default is FALSE.}

\item{lazy}{Logical. If TRUE, returns a lazy query object instead of a
tibble. Requires the \pkg{arrow} package. The lazy object supports
dplyr verbs (filter, select, mutate, etc.) which are pushed down
to the query engine before collecting into memory. Call
\code{dplyr::collect()} to materialize the result. Default is FALSE.}

\item{backend}{Character. Backend for lazy evaluation: \code{"arrow"}
(default) or \code{"duckdb"}. Only used when \code{lazy = TRUE}.
DuckDB backend requires the \pkg{duckdb} package.}
}
\value{
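A tibble with VIGITEL microdata. When \code{lazy = TRUE}, a lazy query
object is returned instead; call \code{dplyr::collect()} to materialize it
as a tibble.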
}
\description{
Downloads and returns VIGITEL survey microdata from the Brazilian Ministry of
Health. Data is cached locally to avoid repeated downloads. When the
\pkg{arrow} package is installed, data is cached in partitioned parquet format
for faster subsequent reads.
}
\details{
The VIGITEL survey (Vigilância de Fatores de Risco e Proteção para Doenças
Crônicas por Inquérito Telefônico) is conducted annually by the Brazilian
Ministry of Health in all state capitals and the Federal District.

Data includes information on:
\itemize{
\item Demographics (age, sex, education, race)
\item Health behaviors (smoking, alcohol, diet, physical activity)
\item Health conditions (hypertension, diabetes, obesity)
\item Healthcare utilization
}

The survey uses post-stratification weights (variable \code{pesorake}) to produce
population estimates. Always use these weights for statistical inference.
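
As a minimal sketch of a weighted point estimate, assuming \code{idade} (age)
is stored as a numeric column and \code{pesorake} has no missing values, the
weights can be passed to \code{stats::weighted.mean()}. This yields a point
estimate only; standard errors and confidence intervals need tools that
account for the survey design.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# assumption: idade is numeric and pesorake has no missing values
df <- vigitel_data(
  year = 2024,
  vars = c("idade", "pesorake"),
  cache_dir = tempdir()
)

# weighted mean age of respondents (point estimate only)
stats::weighted.mean(df$idade, w = df$pesorake, na.rm = TRUE)
}\if{html}{\out{</div>}}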
\subsection{Performance}{

When the \pkg{arrow} package is installed, data is cached in partitioned parquet
format. This allows the function to read only the requested years without
loading the entire dataset into memory. If you work with VIGITEL data
frequently, installing \pkg{arrow} is recommended:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{install.packages("arrow")
}\if{html}{\out{</div>}}
}
}
\section{Data source}{

Data is downloaded from the Ministry of Health website:
\verb{https://svs.aids.gov.br/daent/cgdnt/vigitel/}
}

\examples{
\dontshow{if (interactive()) withAutoprint(\{ # examplesIf}
# download all years (uses tempdir to avoid leaving files)
df <- vigitel_data(cache_dir = tempdir())

# download specific year
df_2024 <- vigitel_data(year = 2024, cache_dir = tempdir())

# download multiple years
df_recent <- vigitel_data(year = 2020:2024, cache_dir = tempdir())

# select specific variables
df_subset <- vigitel_data(
  year = 2024,
  vars = c("ano", "cidade", "sexo", "idade", "pesorake"),
  cache_dir = tempdir()
)
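
# lazy query: a sketch assuming the arrow and dplyr packages are installed
# and that idade (age) is stored as a numeric column
lazy_2024 <- vigitel_data(year = 2024, lazy = TRUE, cache_dir = tempdir())

# the filter is applied by the query engine; collect() returns a tibble
df_60_plus <- dplyr::collect(dplyr::filter(lazy_2024, idade >= 60))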
\dontshow{\}) # examplesIf}
}
