Help for package GWPR.light

Type:

Package

Title:

Geographically Weighted Panel Regression

Version:

1.0.0

Description:

A modern, first implementation of Geographically Weighted Panel Regression (GWPR) for spatial panel data. The package provides a unified public API supporting Gaussian and binomial family models, within/pooling/random panel effects, three bandwidth search strategies (grid, Stochastic Gradient Descent, random), five kernel functions, and optional parallel execution via the 'future' framework. Diagnostic tools include spatial Moran's I, local F-test, Hausman test, and Lagrange Multiplier test.

License:

AGPL (≥ 3)

Encoding:

UTF-8

LazyData:

true

Imports:

fixest, glmmTMB, lmtest, plm, sf, stats, utils

Depends:

R (≥ 3.5.0)

Suggests:

future, future.apply, rmarkdown, knitr, testthat (≥ 3.0.0)

VignetteBuilder:

knitr

URL:

https://github.com/MichaelChaoLi-cpu/GWPR.light

BugReports:

https://github.com/MichaelChaoLi-cpu/GWPR.light/issues

Config/testthat/edition:

Config/roxygen2/version:

8.0.0

NeedsCompilation:

Packaged:

2026-05-22 00:18:34 UTC; lichao

Author:

Chao Li [aut, cre], Shunsuke Managi [aut]

Maintainer:

Chao Li <chaoli0394@gmail.com>

Repository:

CRAN

Date/Publication:

2026-05-29 09:20:08 UTC

Create a placeholder NA history row for epochs after early stopping

Description

Create a placeholder NA history row for epochs after early stopping

Usage

.make_na_history_row(ep, bw, learning_rate, delta, batch_size)

Build a scorer function for bandwidth search

Description

Returns a closure that fits the full GWPR engine for a given bandwidth and returns the score (MSE for linear, log_loss for logistic), together with aggregate metrics.

Usage

.make_scorer(family = "gaussian", threshold = 0.5)

Arguments

family

Character; '"gaussian"' or '"binomial"'.

threshold

Numeric; classification threshold (binomial only).

Value

A function with signature 'scorer(context, bandwidth)'.

Convert a list of history row lists to a data.frame

Description

Convert a list of history row lists to a data.frame

Usage

.rows_to_df(rows)

California (sf)

Description

County boundaries of California.

Usage

data(California)

Format

An sf object with 58 rows (one per county) and two columns:

GEOID: a numeric vector, FIPS IDs of the counties
geometry: sfc_MULTIPOLYGON, county boundary polygons (CRS: NAD83 longlat)

Author(s)

Chao Li <chaoli0394@gmail.com>, Shunsuke Managi <managi.s@gmail.com>

Examples

data(California)
class(California)

Panel Dataset for Testing GWPR

Description

Panel dataset for estimating the relationship between county-level PM2.5 concentration and on-road transportation in California.

Usage

data(TransAirPolCalif)

Format

A data.frame with 928 observations of 23 variables:

GEOID: a numeric vector, FIPS IDs of the counties
year: a numeric vector, year
pm25: a numeric vector, annual average PM2.5 concentration in the counties
co2_mean: a numeric vector, spatially averaged CO2 emissions from on-road transportation in each year, million tons/km2
Developed_Open_Space_perc: a numeric vector, percentage of developed open space of total area in each county
Developed_Low_Intensity_perc: a numeric vector, percentage of low-intensity developed area of total area in each county
Developed_Medium_Intensity_perc: a numeric vector, percentage of medium-intensity developed area of total area in each county
Developed_High_Intensity_perc: a numeric vector, percentage of high-intensity developed area of total area in each county
Open_Water_perc: a numeric vector, percentage of open water of total area in each county
Woody_Wetlands_perc: a numeric vector, percentage of woody wetland of total area in each county
Emergent_Herbaceous_Wetlands_perc: a numeric vector, percentage of emergent herbaceous wetland of total area in each county
Deciduous_Forest_perc: a numeric vector, percentage of deciduous forest of total area in each county
Evergreen_Forest_perc: a numeric vector, percentage of evergreen forest of total area in each county
Mixed_Forest_perc: a numeric vector, percentage of mixed forest of total area in each county
Shrub_perc: a numeric vector, percentage of shrub of total area in each county
Grassland_perc: a numeric vector, percentage of grassland of total area in each county
Pasture_perc: a numeric vector, percentage of pasture of total area in each county
Cultivated_Crops_perc: a numeric vector, percentage of cultivated crops of total area in each county
pop_density: a numeric vector, average population density in each county
summer_tmmx: a numeric vector, average temperature in summer
winter_tmmx: a numeric vector, average temperature in winter
summer_rmax: a numeric vector, average humidity in summer
winter_rmax: a numeric vector, average humidity in winter

Author(s)

Chao Li <chaoli0394@gmail.com>, Shunsuke Managi <managi.s@gmail.com>

Examples


data(TransAirPolCalif)
head(TransAirPolCalif)

Reorder spatial rows to match panel ID order

Description

Given a named integer vector 'id_map' (names = unit IDs, values = row indices in 'spatial') produced by 'build_id_map()', this function reorders the rows of 'spatial' so that they align with the panel data ordering.

Usage

align_spatial_to_panel(spatial, id_map)

Arguments

spatial

An 'sf' object containing at minimum the rows referenced in 'id_map'.

id_map

A named integer vector as produced by 'build_id_map()'. Names are unit IDs (character); values are 1-based row indices into 'spatial'.

Value

An 'sf' object with rows reordered to match 'id_map'.

Public API for GWPR.light 1.0.0

Description

High-level user-facing functions for Geographically Weighted Panel Regression. These four functions form the complete public interface; all internal complexity is hidden behind them.

gwpr – full pipeline (bandwidth search + fitting + optional diagnostics).
select_bandwidth – standalone bandwidth search.
fit_gwpr – fit with a known bandwidth.
diagnose_gwpr – run diagnostics on a fitted model.

Assert that an object is an sf object

Description

Stops with an informative error when 'spatial' does not inherit from '"sf"'. sp objects are not supported.

Usage

assert_sf(spatial)

Arguments

spatial

Any R object.

Value

Invisibly returns 'TRUE' when the check passes.

Grid Search for Bandwidth Selection

Description

Functions implementing an exhaustive grid search over a user-specified range of bandwidth candidates. Each candidate is evaluated by a user-supplied scorer function; the full search history and the best bandwidth are returned as a 'gwpr_bandwidth' object.
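
The following is a minimal base-R sketch of the grid-search idea described above, not the package's implementation. The helper name 'grid_search_sketch', its arguments, and the one-argument toy scorer are all hypothetical; the package's own scorer is documented as taking '(context, bandwidth)'.

```r
# Hypothetical helper (not part of GWPR.light): exhaustive grid search.
grid_search_sketch <- function(scorer, lower, upper, step) {
  # Every candidate on the regular grid is scored exactly once
  candidates <- seq(lower, upper, by = step)
  scores <- vapply(candidates, scorer, numeric(1))
  best <- which.min(scores)
  list(
    best_bandwidth = candidates[best],
    best_score = scores[best],
    history = data.frame(bandwidth = candidates, score = scores)
  )
}

# Toy scorer (assumption): a quadratic with its minimum at bandwidth = 3
toy_scorer <- function(bw) (bw - 3)^2
res <- grid_search_sketch(toy_scorer, lower = 1, upper = 5, step = 0.5)
res$best_bandwidth
```

A grid search costs one full model fit per candidate, so the step size trades resolution against run time.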

Random Bandwidth Optimizer ('bandwidth_random.R')

Description

Implements a bounded random search for bandwidth selection. A user-specified number of candidate bandwidths is drawn uniformly at random from '[lower, upper]', each candidate is scored by a user-supplied scorer function, and the candidate with the lowest score is returned as the best bandwidth.

The search boundaries ('lower', 'upper') **must** be set explicitly by the caller; automatic inference is intentionally not supported.
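
As an illustration only, the bounded random search can be sketched in base R as below. The helper 'random_search_sketch', the argument names 'n_candidates' and 'seed', and the one-argument toy scorer are assumptions made for this example and are not taken from the package.

```r
# Hypothetical helper (not part of GWPR.light): bounded random search.
random_search_sketch <- function(scorer, lower, upper, n_candidates, seed = NULL) {
  # Bounds must be given explicitly, mirroring the rule stated above
  stopifnot(is.numeric(lower), is.numeric(upper), lower < upper)
  if (!is.null(seed)) set.seed(seed)
  candidates <- runif(n_candidates, min = lower, max = upper)
  scores <- vapply(candidates, scorer, numeric(1))
  best <- which.min(scores)
  list(
    best_bandwidth = candidates[best],
    best_score = scores[best],
    history = data.frame(bandwidth = candidates, score = scores)
  )
}

# Toy scorer (assumption): a quadratic with its minimum at bandwidth = 3
toy_scorer <- function(bw) (bw - 3)^2
res <- random_search_sketch(toy_scorer, lower = 1, upper = 5,
                            n_candidates = 50, seed = 1)
res$best_bandwidth
```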

SGD Bandwidth Search ('bandwidth_sgd.R')

Description

Implements a stochastic gradient descent (SGD) based bandwidth search for GWPR models. A one-dimensional finite-difference gradient approximation is used to iteratively update the bandwidth over a fixed number of epochs. Mini-batch sampling and early stopping are supported.

The search does **not** require the user to specify 'lower', 'upper', or 'step'; SGD starts from a single initialised bandwidth and follows the (approximate) gradient. When 'lower' / 'upper' are supplied they are used as hard constraints.
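
A rough sketch of a one-dimensional finite-difference descent is given below, under several assumptions: a central difference is used (the package may use a different scheme), mini-batch sampling is omitted, early stopping is triggered by a small bandwidth change, and the helper name 'sgd_bandwidth_sketch' and its 'tol' argument are hypothetical.

```r
# Hypothetical helper (not part of GWPR.light): finite-difference descent on
# a scalar bandwidth. Mini-batching is omitted for brevity.
sgd_bandwidth_sketch <- function(scorer, init, learning_rate = 0.1, delta = 0.01,
                                 epochs = 100, tol = 1e-6,
                                 lower = NULL, upper = NULL) {
  bw <- init
  history <- vector("list", epochs)
  for (ep in seq_len(epochs)) {
    # Central finite-difference approximation of d(score)/d(bandwidth)
    grad <- (scorer(bw + delta) - scorer(bw - delta)) / (2 * delta)
    new_bw <- bw - learning_rate * grad
    # Optional bounds act as hard constraints
    if (!is.null(lower)) new_bw <- max(new_bw, lower)
    if (!is.null(upper)) new_bw <- min(new_bw, upper)
    history[[ep]] <- data.frame(epoch = ep, bandwidth = new_bw,
                                score = scorer(new_bw))
    converged <- abs(new_bw - bw) < tol
    bw <- new_bw
    if (converged) break  # early stopping (assumed criterion)
  }
  list(best_bandwidth = bw, history = do.call(rbind, history[seq_len(ep)]))
}

# Toy scorer (assumption): a quadratic with its minimum at bandwidth = 3
toy_scorer <- function(bw) (bw - 3)^2
res <- sgd_bandwidth_sketch(toy_scorer, init = 1)
res$best_bandwidth
```

Unlike the grid and random strategies, this approach needs only a starting value, but it can settle in a local minimum when the score surface is not convex.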

Build a distance context object

Description

Wraps a distance matrix together with unit IDs into a list suitable for passing to 'get_local_distances()'. Optionally pre-computes the full distance matrix (recommended for small data) or stores only the coordinate matrix for on-the-fly computation (recommended for large data).

Usage

build_distance_context(coords, ids, longlat = FALSE, cache = TRUE)

Arguments

coords

A numeric matrix with columns 'X' and 'Y'.

ids

Character or numeric vector of unit IDs (length = nrow(coords)).

longlat

Logical. Passed to 'compute_distance()'.

cache

Logical. If 'TRUE' (default), pre-computes and caches the full n x n distance matrix. Set 'FALSE' for very large datasets to avoid memory pressure; 'get_local_distances()' will then compute rows on demand.

Value

A list with class '"gwpr_distance_context"' containing:

'ids': character vector of unit IDs.
'distance_matrix': n x n matrix (or 'NULL' if 'cache = FALSE').
'coords': the original coordinate matrix.
'longlat': logical flag.

Build a mapping from panel unit IDs to spatial row indices

Description

Returns a named integer vector where each name is a panel unit ID (as a character string) and each value is the corresponding row index in 'spatial_data' (1-based).

Usage

build_id_map(panel_data, spatial_data, id)

Arguments

panel_data

A data frame with a column named 'id'.

spatial_data

An 'sf' data frame with a column named 'id'.

id

Character; name of the shared ID column.

Details

Rules:

Every panel ID must have a spatial match; missing IDs cause an error.
Extra spatial rows (not in panel) are silently ignored.

Value

Named integer vector mapping unit ID to spatial row index.
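
The matching rules can be illustrated with a small base-R sketch. The helper 'id_map_sketch' is hypothetical and works on plain ID vectors rather than data frames; ordering the result by first appearance in the panel is an assumption.

```r
# Hypothetical helper (not part of GWPR.light): map panel unit IDs to the
# row positions of the same IDs in the spatial data.
id_map_sketch <- function(panel_ids, spatial_ids) {
  unit_ids <- unique(as.character(panel_ids))
  idx <- match(unit_ids, as.character(spatial_ids))
  missing <- unit_ids[is.na(idx)]
  if (length(missing) > 0) {
    stop("Panel IDs without a spatial match: ", paste(missing, collapse = ", "))
  }
  stats::setNames(idx, unit_ids)
}

panel_ids <- c(10, 10, 30, 30, 20, 20)   # three units, two periods each
spatial_ids <- c(20, 40, 10, 30)         # unit 40 is extra and is ignored
id_map <- id_map_sketch(panel_ids, spatial_ids)
id_map

# Reordering the spatial rows then reduces to indexing by the map
spatial_ids[id_map]
```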

Build the model frame from the formula and panel data

Description

Build the model frame from the formula and panel data

Usage

build_model_frame(context)

Arguments

context

A 'gwpr_context' with 'formula' and 'panel_data' populated.

Value

Updated context with 'model_frame' populated.

Build the design matrix and response vector

Description

Extracts the response variable 'y' and the design matrix 'X' from the model frame. For 'binomial' family, the response is standardised to 0/1 via 'standardize_binary_response()'.

Usage

build_model_matrix(context)

Arguments

context

A 'gwpr_context' with 'model_frame', 'formula', and 'family' populated.

Value

Updated context with 'model_matrix' and 'response' populated.

Build a neighbour structure from an sf object

Description

Returns one of two structures depending on 'type': a coordinate matrix for '"distance"' or a neighbour list for '"contiguity"'.

Usage

build_neighbor_structure(spatial, type = c("distance", "contiguity"))

Arguments

spatial

An 'sf' object.

type

Character scalar: '"distance"' (default) or '"contiguity"'.

Details

'"distance"': a numeric coordinate matrix (columns 'X' and 'Y') suitable for pairwise distance computation.
'"contiguity"': a named list where each element is the integer vector of neighbour row indices (1-based, Queen contiguity via 'sf::st_relate()'). Used for spatial diagnostics such as Moran's I.

Value

For '"distance"': a numeric matrix with columns 'X' and 'Y'.
For '"contiguity"': a named list of integer vectors.

Classify memory risk level

Description

Maps an estimated byte count to a human-readable risk category.

Usage

classify_memory_risk(estimated_bytes)

Arguments

estimated_bytes

Non-negative numeric; total estimated bytes.

Details

Thresholds:

low: < 500 MB
medium: 500 MB – 2 GB
high: > 2 GB

Value

A character string: "low", "medium", or "high".
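
The thresholds can be expressed as a short base-R sketch. The helper 'classify_risk_sketch' is hypothetical, and two details are assumptions: megabytes and gigabytes are taken as binary units (1024^2 and 1024^3 bytes), and a value of exactly 2 GB is classified as "medium".

```r
# Hypothetical helper (not part of GWPR.light): map bytes to a risk label.
classify_risk_sketch <- function(estimated_bytes) {
  stopifnot(is.numeric(estimated_bytes), estimated_bytes >= 0)
  mb <- 1024^2  # assumption: binary megabyte
  gb <- 1024^3  # assumption: binary gigabyte
  if (estimated_bytes < 500 * mb) {
    "low"
  } else if (estimated_bytes <= 2 * gb) {
    "medium"
  } else {
    "high"
  }
}

classify_risk_sketch(100 * 1024^2)
classify_risk_sketch(600 * 1024^2)
classify_risk_sketch(3 * 1024^3)
```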

Compute a full pairwise distance matrix from a coordinate matrix

Description

Compute a full pairwise distance matrix from a coordinate matrix

Usage

compute_distance(coords, longlat = FALSE)

Arguments

coords

A numeric matrix with at least two columns ('X' and 'Y', or the first two columns if names are absent). Each row is one spatial unit.

longlat

Logical. If 'TRUE', great-circle distances (in kilometres) are calculated using the Haversine formula. If 'FALSE' (default), Euclidean distance is used.

Value

An n x n symmetric numeric matrix where element [i, j] is the distance between unit i and unit j. Diagonal is 0.
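
A base-R sketch of both distance modes follows. It is illustrative only: the helper 'distance_sketch' is hypothetical, the first column is assumed to hold longitude and the second latitude (in degrees) when 'longlat = TRUE', and a mean Earth radius of 6371 km is assumed.

```r
# Hypothetical helper (not part of GWPR.light): pairwise distance matrix.
distance_sketch <- function(coords, longlat = FALSE) {
  coords <- as.matrix(coords)
  x <- coords[, 1]
  y <- coords[, 2]
  if (!longlat) {
    # Euclidean distance between every pair of rows
    return(sqrt(outer(x, x, "-")^2 + outer(y, y, "-")^2))
  }
  # Haversine formula; assumes x = longitude, y = latitude, in degrees
  rad <- pi / 180
  lon <- x * rad
  lat <- y * rad
  dlat <- outer(lat, lat, "-")
  dlon <- outer(lon, lon, "-")
  a <- sin(dlat / 2)^2 + outer(cos(lat), cos(lat)) * sin(dlon / 2)^2
  # Clamp to 1 to guard against rounding before asin(); 6371 km is assumed
  2 * 6371 * asin(pmin(sqrt(a), 1))
}

d_euclid <- distance_sketch(rbind(c(0, 0), c(3, 4)))
d_sphere <- distance_sketch(rbind(c(0, 0), c(0, 1)), longlat = TRUE)
d_euclid[1, 2]  # a 3-4-5 triangle
d_sphere[1, 2]  # one degree of latitude, in kilometres
```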

Compute geographically weighted kernel weights

Description

Given a numeric vector of distances from one focal unit to all others, returns a weight for each unit according to the chosen kernel and bandwidth.

Usage

compute_kernel_weights(distance, bandwidth, kernel, adaptive)

Arguments

distance

Numeric vector of distances from the focal unit to all spatial units (length n).

bandwidth

Numeric scalar. For fixed bandwidth: distance scale. For adaptive bandwidth: positive integer number of neighbours.

kernel

Character scalar, one of '"bisquare"', '"gaussian"', '"exponential"', '"tricube"', '"boxcar"'.

adaptive

Logical. 'TRUE' for adaptive (kNN) bandwidth.

Details

For **fixed** bandwidth ('adaptive = FALSE') the bandwidth parameter is a distance threshold or scale parameter used directly in the kernel formula.

For **adaptive** bandwidth ('adaptive = TRUE') the bandwidth parameter is the number of nearest neighbours k. The function first identifies the k-th smallest distance (among all units, including the focal unit itself at distance 0) and uses that distance as the effective bandwidth in the kernel formula.

Kernel formulae (d = distance, bw = effective bandwidth):

'bisquare': '(1 - (d/bw)^2)^2' for 'd <= bw', else 0.
'gaussian': 'exp(-0.5 * (d/bw)^2)'.
'exponential': 'exp(-d/bw)'.
'tricube': '(1 - (d/bw)^3)^3' for 'd <= bw', else 0.
'boxcar': '1' for 'd <= bw', else 0.

Value

A numeric vector of length n with non-negative kernel weights.
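
The kernel formulae and the adaptive rule from Details can be written out as a short base-R sketch. The helper 'kernel_weights_sketch' is hypothetical and does not guard against a zero effective bandwidth (for example, adaptive with k = 1).

```r
# Hypothetical helper (not part of GWPR.light): kernel weights for one
# focal unit, following the formulae listed in Details.
kernel_weights_sketch <- function(distance, bandwidth,
                                  kernel = c("bisquare", "gaussian",
                                             "exponential", "tricube", "boxcar"),
                                  adaptive = FALSE) {
  kernel <- match.arg(kernel)
  # Adaptive: the k-th smallest distance becomes the effective bandwidth
  bw <- if (adaptive) sort(distance)[bandwidth] else bandwidth
  u <- distance / bw
  switch(kernel,
    bisquare    = ifelse(u <= 1, (1 - u^2)^2, 0),
    gaussian    = exp(-0.5 * u^2),
    exponential = exp(-u),
    tricube     = ifelse(u <= 1, (1 - u^3)^3, 0),
    boxcar      = as.numeric(u <= 1)
  )
}

d <- c(0, 1, 2, 3, 4)  # distances from the focal unit, itself included
w_fixed <- kernel_weights_sketch(d, bandwidth = 2, kernel = "bisquare")
w_adapt <- kernel_weights_sketch(d, bandwidth = 3, kernel = "bisquare",
                                 adaptive = TRUE)
w_box <- kernel_weights_sketch(d, bandwidth = 2, kernel = "boxcar")
w_fixed
```

In this example the third-smallest distance is 2, so the adaptive call with k = 3 reproduces the fixed-bandwidth weights.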

Internal Context Object for GWPR.light 1.0.0

Description

These internal functions construct and validate the standardised 'gwpr_context' list that is passed between modules, eliminating repetitive argument passing.

Data Preparation Module for GWPR.light 1.0.0

Description

Internal functions that convert user inputs into the internal data structures required by the model engine: panel indices, spatial alignment, model frame, and model matrix.

Run diagnostic tests on a fitted GWPR model

Description

Top-level interface that dispatches to individual diagnostic sub-functions. Returns a 'gwpr_diagnostics' object containing all requested test results.

Usage

diagnose_gwpr(
  object,
  diagnostics = c("moran", "f_test", "hausman", "lm_test"),
  spatial_weights = NULL,
  panel_index = NULL,
  ...
)

Arguments

object

A 'gwpr_fit' object returned by 'fit_gwpr()' or similar.

diagnostics

Character vector naming the tests to run. Any subset of 'c("moran", "f_test", "hausman", "lm_test")'. Default is all four.

spatial_weights

Required when '"moran"' is in 'diagnostics'. A row-standardised n x n spatial weights matrix.

panel_index

Required when '"moran"' is in 'diagnostics'. A data.frame with columns 'id' and 'time' identifying each element of 'object$residuals'.

...

Additional arguments passed to individual diagnostic functions.

Details

Tests that are not applicable to the fitted model (e.g., Hausman test on a pooling model) return a list with 'status = "not_applicable"' and an explanatory 'message', rather than an error.

Value

A 'gwpr_diagnostics' object (list with class '"gwpr_diagnostics"') whose 'diagnostics' slot contains the result of each requested test.

Examples


library(sf)
pts <- sf::st_as_sf(
  data.frame(id = 1:4, X = c(0,1,0,1), Y = c(0,0,1,1)),
  coords = c("X", "Y"), crs = NA_integer_
)
dat <- data.frame(
  id   = rep(1:4, each = 5),
  time = rep(1:5, 4),
  y    = rnorm(20),
  x1   = rnorm(20)
)
fit <- fit_gwpr(y ~ x1, data = dat, spatial = pts, id = "id",
                time = "time", bandwidth = 2, workers = 1)
diag_result <- diagnose_gwpr(fit, diagnostics = c("f_test", "hausman"))
print(diag_result)

Local Hausman test diagnostic on a gwpr_fit object

Description

Performs a local Hausman test (within vs. random) for each spatial unit using test statistics pre-computed during model fitting or stored in 'local_results'.

Usage

diagnose_hausman(object, ...)

Arguments

object

A 'gwpr_fit' object.

...

Currently ignored.

Details

**Applicable models**: gaussian with 'model = "random"'. For pooling models the Hausman test is not meaningful; the function returns a 'status = "not_applicable"' result. For logistic models the function also returns 'status = "not_applicable"'.

**Panel balance requirement**: No constraint at the unit level.

**Failure conditions**: Returns 'status = "missing_hausman_data"' for any unit where the required statistics are absent from 'local_results'.

**Logistic interpretation limit**: Not applicable.

Value

A named list with elements:

'local_hausman': Data frame with columns 'unit_id', 'statistic', 'p_value', 'df', 'status'.
'n_tested': Number of units tested.
'n_failed': Number of units where the test could not be computed.
'status': Overall status: '"ok"', '"not_applicable"', or '"no_local_results"'.

Examples


library(sf)
pts <- sf::st_as_sf(
  data.frame(id = 1:4, X = c(0,1,0,1), Y = c(0,0,1,1)),
  coords = c("X", "Y"), crs = NA_integer_
)
dat <- data.frame(
  id   = rep(1:4, each = 5),
  time = rep(1:5, 4),
  y    = rnorm(20),
  x1   = rnorm(20)
)
fit <- fit_gwpr(y ~ x1, data = dat, spatial = pts, id = "id",
                time = "time", bandwidth = 2, workers = 1)
diagnose_hausman(fit)

Local Breusch-Pagan LM test diagnostic on a gwpr_fit object

Description

Performs a local Breusch-Pagan Lagrange Multiplier test for random effects for each spatial unit using test statistics stored in 'local_results'.

Usage

diagnose_lm(object, ...)

Arguments

object

A 'gwpr_fit' object.

...

Currently ignored.

Details

**Applicable models**: gaussian with 'model = "pooling"' or 'model = "random"'. For 'within' models the test is not directly applicable (it tests for random effects vs. OLS); the function returns 'status = "not_applicable"'. For logistic models also not applicable.

**Panel balance requirement**: No constraint at the unit level.

**Failure conditions**: Returns 'status = "missing_lm_data"' for units missing the required statistics.

**Logistic interpretation limit**: Not applicable.

Value

A named list with elements:

'local_lm': Data frame with columns 'unit_id', 'statistic', 'p_value', 'df', 'status'.
'n_tested': Number of units tested.
'n_failed': Number of units where the test could not be computed.
'status': Overall status.

Examples


library(sf)
pts <- sf::st_as_sf(
  data.frame(id = 1:4, X = c(0,1,0,1), Y = c(0,0,1,1)),
  coords = c("X", "Y"), crs = NA_integer_
)
dat <- data.frame(
  id   = rep(1:4, each = 5),
  time = rep(1:5, 4),
  y    = rnorm(20),
  x1   = rnorm(20)
)
fit <- fit_gwpr(y ~ x1, data = dat, spatial = pts, id = "id",
                time = "time", bandwidth = 2, workers = 1)
diagnose_lm(fit)

Local F test diagnostic on a gwpr_fit object

Description

Performs a local F test (fixed effects vs. pooling) using per-unit local residuals stored in the fitted model object.

Usage

diagnose_local_f(object, ...)

Arguments

object

A 'gwpr_fit' object.

...

Currently ignored.

Details

**Applicable models**: gaussian (linear). Not applicable to logistic models; returns a 'status = "not_applicable"' result when 'family = "binomial"'.

**Panel balance requirement**: No constraint; the test uses per-unit local residuals already computed during fitting.

**Failure conditions**: If 'local_results' is empty or missing, all units are reported as failed. If a unit's local result does not contain the information needed (within and pooling residuals), that unit is reported as failed with an informative 'status'.

**Logistic interpretation limit**: Not applicable; see above.

Value

A named list with elements:

'local_f': Data frame with columns 'unit_id', 'statistic', 'p_value', 'df1', 'df2', 'status'.
'n_tested': Number of units tested.
'n_failed': Number of units where the test could not be computed.

Examples


library(sf)
pts <- sf::st_as_sf(
  data.frame(id = 1:4, X = c(0,1,0,1), Y = c(0,0,1,1)),
  coords = c("X", "Y"), crs = NA_integer_
)
dat <- data.frame(
  id   = rep(1:4, each = 5),
  time = rep(1:5, 4),
  y    = rnorm(20),
  x1   = rnorm(20)
)
fit <- fit_gwpr(y ~ x1, data = dat, spatial = pts, id = "id",
                time = "time", bandwidth = 2, workers = 1)
diagnose_local_f(fit)

Run Moran's I diagnostic on a gwpr_fit object

Description

Extracts residuals from a fitted GWPR model (Pearson residuals for logistic models, raw residuals for linear models) and computes the panel Moran's I statistic.

Usage

diagnose_moran(object, spatial_weights, panel_index, ...)

Arguments

object

A 'gwpr_fit' object returned by 'fit_gwpr()' or similar.

spatial_weights

A row-standardised n x n spatial weights matrix. 'n' must equal the number of spatial individuals in the fitted model.

panel_index

A data.frame or list with columns/elements 'id' and 'time' that identify each element of 'object$residuals'.

...

Currently ignored.

Details

**Applicable models**: gaussian (linear residuals) and binomial (Pearson residuals).

**Panel balance**: See 'compute_panel_moran()'.

**Failure conditions**: Fails if 'object' is not a 'gwpr_fit', if 'object$residuals' is 'NULL', or if 'spatial_weights' dimensions do not match the number of individuals.

**Logistic interpretation limit**: Moran's I computed on Pearson residuals is exploratory; the asymptotic distribution differs from the linear case.

Value

A named list compatible with the 'diagnostics' slot of a 'gwpr_diagnostics' object. Contains the elements returned by 'compute_panel_moran()' plus 'residual_type'.

Examples


library(sf)
pts <- sf::st_as_sf(
  data.frame(id = 1:4, X = c(0,1,0,1), Y = c(0,0,1,1)),
  coords = c("X", "Y"), crs = NA_integer_
)
dat <- data.frame(
  id   = rep(1:4, each = 5),
  time = rep(1:5, 4),
  y    = rnorm(20),
  x1   = rnorm(20)
)
fit <- fit_gwpr(y ~ x1, data = dat, spatial = pts, id = "id",
                time = "time", bandwidth = 2, workers = 1)
W <- matrix(1/3, nrow = 4, ncol = 4); diag(W) <- 0
idx <- dat[, c("id", "time")]
diagnose_moran(fit, W, idx)

Diagnostics Module for GWPR.light 1.0.0

Description

Unified diagnostic interface for Geographically Weighted Panel Regression models. Provides Moran's I (spatial autocorrelation), local F test (fixed vs. pooling), local Hausman test (fixed vs. random), and local Breusch-Pagan LM test.

Details

**Model applicability**

| Diagnostic | Linear | Logistic | Notes                                    |
|------------|--------|----------|------------------------------------------|
| moran      | yes    | yes      | Logistic uses Pearson residuals          |
| f_test     | yes    | no       | Requires within and pooling models       |
| hausman    | yes    | no       | Only meaningful for random-effect models |
| lm_test    | yes    | no       | Pooling or random-effect models          |

**Panel balance**

'compute_panel_moran()' is fully supported for balanced panels. For unbalanced panels a 'warning()' is issued and the function attempts computation using only the time periods present in every individual; results may be unreliable.

**Logistic interpretation**

Moran's I computed from the Pearson residuals of a logistic model does not follow the same asymptotic distribution as for linear models. Treat the result as an exploratory heuristic, not a formal test.

Build a fixest::feglm formula string from the user-facing effect string

Description

Build a fixest::feglm formula string from the user-facing effect string

Usage

effect_to_feglm_fml(base_formula, effect, id_col, time_col)

Arguments

base_formula

A formula object (without fixed-effect terms).

effect

Character scalar: one of "individual", "time", "two-way", "nested".

id_col

Name of the individual ID column.

time_col

Name of the time column.

Value

A formula suitable for fixest::feglm().

Append random-effect terms to a base formula for glmmTMB

Description

Append random-effect terms to a base formula for glmmTMB

Usage

effect_to_glmmtmb_fml(base_formula, effect, id_col, time_col)

Arguments

base_formula

A formula object (without random-effect terms).

effect

Character scalar: one of "individual", "time", "two-way", "nested".

id_col

Name of the individual ID column.

time_col

Name of the time column.

Value

A formula suitable for glmmTMB::glmmTMB().

Map user-facing effect string to plm effect parameter

Description

Map user-facing effect string to plm effect parameter

Usage

effect_to_plm(effect)

Arguments

effect

Character scalar: one of '"individual"', '"time"', '"two-way"', '"nested"'.

Value

Character scalar accepted by 'plm::plm()' for its 'effect' argument: '"individual"', '"time"', or '"twoways"'. For '"nested"', the function stops with an informative message because 'plm' does not support nested effects without additional data conventions.
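
The mapping described above amounts to a small lookup, sketched here in base R. The helper name 'effect_to_plm_sketch' and the wording of the error messages are hypothetical.

```r
# Hypothetical helper (not part of GWPR.light): translate the user-facing
# effect string into the value documented for the plm 'effect' argument.
effect_to_plm_sketch <- function(effect) {
  switch(effect,
    "individual" = "individual",
    "time"       = "time",
    "two-way"    = "twoways",
    "nested"     = stop("'nested' effects are not supported in this mapping"),
    stop("Unknown effect: ", effect)
  )
}

effect_to_plm_sketch("two-way")
```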

Estimate memory usage for a GWPR run

Description

Calculates an approximate memory requirement (in bytes) based on the data dimensions stored in a 'gwpr_context' object, the number of parallel workers, and whether the full distance matrix will be cached.

Usage

estimate_memory(context, workers = 1, cache_distance = NULL)

Arguments

context

A 'gwpr_context' list containing at least metadata$n_units, metadata$n_time, and metadata$n_vars. If these keys are absent, the function falls back to direct arguments n_units, n_time, and n_vars when supplied via .... Alternatively, pass a plain list with the required scalar fields directly.

workers

Positive integer. Number of parallel workers.

cache_distance

Logical or NULL. When TRUE the full n_units x n_units distance matrix is assumed to be kept in memory. When NULL (default) the function assumes caching is enabled for a conservative estimate.

Details

The two main cost components are:

Distance matrix (when cache_distance = TRUE): n_units^2 x 8 bytes.
Local model working copies (one per worker): n_rows x n_vars x 8 x workers bytes.

Value

A named list with class "gwpr_memory_estimate":

n_units: Number of spatial units.
n_time: Number of time periods.
n_vars: Number of explanatory variables.
n_rows: Total panel rows (n_units * n_time).
workers: Workers used for the estimate.
cache_distance: Whether distance caching was assumed.
distance_bytes: Bytes for the distance matrix (0 if not cached).
model_bytes: Bytes for local model copies across workers.
total_bytes: Total estimated bytes.
risk: Character risk level: "low", "medium", or "high".
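
As a worked example of the two cost components in Details, the arithmetic can be sketched in base R. The helper 'estimate_memory_sketch' is hypothetical and takes the dimensions directly instead of a 'gwpr_context'; the dimensions (58 units, 16 periods, 5 variables, 4 workers) are chosen purely for illustration.

```r
# Hypothetical helper (not part of GWPR.light): the two cost components.
estimate_memory_sketch <- function(n_units, n_time, n_vars,
                                   workers = 1, cache_distance = TRUE) {
  n_rows <- n_units * n_time
  # Full distance matrix of doubles, only when cached
  distance_bytes <- if (cache_distance) n_units^2 * 8 else 0
  # One working copy of the design matrix per worker
  model_bytes <- n_rows * n_vars * 8 * workers
  list(
    n_rows = n_rows,
    distance_bytes = distance_bytes,
    model_bytes = model_bytes,
    total_bytes = distance_bytes + model_bytes
  )
}

est <- estimate_memory_sketch(n_units = 58, n_time = 16, n_vars = 5, workers = 4)
est$total_bytes
```

Here the distance matrix takes 58^2 x 8 = 26,912 bytes and the worker copies take 928 x 5 x 8 x 4 = 148,480 bytes, for a total of 175,392 bytes, far below the 500 MB "low" threshold.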

Extract representative XY coordinates from an sf object

Description

For POINT geometries the point coordinates are returned directly. For all other geometry types (POLYGON, MULTIPOLYGON, etc.) 'sf::st_centroid()' is used to derive a representative point, with warnings suppressed (they are typically non-actionable geographic-CRS notes).

Usage

extract_coordinates(spatial)

Arguments

spatial

An 'sf' object.

Value

A numeric matrix with columns 'X' and 'Y', one row per feature.

Extract representative XY coordinates from an sf object

Description

Uses point coordinates for POINT geometries and centroids for other geometry types (POLYGON, MULTIPOLYGON, etc.).

Usage

extract_coords_from_sf(spatial)

Arguments

spatial

An 'sf' object.

Value

A numeric matrix with columns 'X' and 'Y'.

Extract the geometry column from an sf object

Description

Returns the 'sfc' geometry column of the supplied 'sf' object.

Usage

extract_geometry(spatial)

Arguments

spatial

An 'sf' object.

Value

An 'sfc' geometry column.

Extract coefficients and diagnostics from a local linear model

Description

Extract coefficients and diagnostics from a local linear model

Usage

extract_linear_local_result(local_result)

Arguments

local_result

A list as returned by 'fit_linear_local_model()'.

Value

A named list with elements:

'coefficients': Named numeric vector of local coefficient estimates.
'se': Named numeric vector of standard errors (same names).
'tvalues': Named numeric vector of t-statistics.
'local_r2': Numeric scalar or 'NA_real_'.
'local_aic': Numeric scalar or 'NA_real_'.
'status': '"ok"' or '"failed"'.
'error': 'NULL' or character error message.

Extract coefficients and diagnostics from a local logistic model

Description

Computes predicted probabilities, predicted classes, Pearson residuals, and extracts coefficient estimates. Pearson residuals are defined as (y - p) / sqrt(p * (1 - p)) with p clipped to [eps, 1 - eps] to avoid division by zero.

Usage

extract_logistic_local_result(
  local_result,
  local_data,
  formula,
  threshold = 0.5,
  eps = 1e-15
)

Arguments

local_result

A list as returned by fit_logistic_local_model().

local_data

The data.frame used to fit the model (needed to extract y and to predict).

formula

The model formula (used to extract the response name).

threshold

Numeric scalar; classification threshold (default 0.5).

eps

Numeric scalar; clipping bound for probability (default 1e-15).

Value

A named list with elements:

coefficients: Named numeric vector of local coefficient estimates, or NA_real_ on failure.
prob: Numeric vector of predicted probabilities for the local data rows, or NA_real_ on failure.
class_pred: Integer vector (0/1) of predicted classes, or NA_real_ on failure.
pearson_resid: Numeric vector of Pearson residuals, or NA_real_ on failure.
status: "ok" or "failed".
error: NULL or character error message.
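
The Pearson residual definition in the Description can be checked with a few lines of base R. The helper 'pearson_resid_sketch' is hypothetical and takes the observed response and predicted probabilities directly.

```r
# Hypothetical helper (not part of GWPR.light): Pearson residuals with
# probabilities clipped to [eps, 1 - eps] to avoid division by zero.
pearson_resid_sketch <- function(y, p, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)
  (y - p) / sqrt(p * (1 - p))
}

y <- c(1, 0, 1, 0)
p <- c(0.8, 0.2, 1, 0)  # the last two probabilities sit on the boundary
r <- pearson_resid_sketch(y, p)
r
```

Without the clipping step, the two boundary cases would evaluate to 0 / 0 and return NaN.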

Fit GWPR with a given bandwidth

Description

Validates inputs, prepares data, builds spatial weights, and fits the Geographically Weighted Panel Regression for the specified bandwidth. Returns a gwpr_fit object.

Usage

fit_gwpr(
  formula,
  data,
  spatial,
  id,
  time,
  bandwidth,
  family = c("gaussian", "binomial"),
  model = c("within", "pooling", "random"),
  effect = c("individual", "time", "two-way", "nested"),
  kernel = c("bisquare", "gaussian", "exponential", "tricube", "boxcar"),
  adaptive = FALSE,
  threshold = 0.5,
  workers = 1L,
  seed = NULL,
  ...
)

Arguments

formula

A formula object.

data

A data.frame with panel data.

spatial

An sf object.

id

Character scalar; unit ID column name.

time

Character scalar; time column name.

bandwidth

Numeric scalar. The bandwidth to use (fixed distance or number of neighbours when adaptive = TRUE).

family

"gaussian" (default) or "binomial".

model

"within" (default), "pooling", or "random".

effect

"individual" (default), "time", "two-way", or "nested".

kernel

Kernel function name (default "bisquare").

adaptive

Logical; FALSE (default) for fixed bandwidth.

threshold

Numeric; classification threshold (binomial only, default 0.5).

workers

Positive integer; number of parallel workers (default 1).

seed

Integer random seed, or NULL.

...

Currently unused.

Value

A gwpr_fit object.

Examples


library(sf)
pts <- sf::st_as_sf(
  data.frame(id = 1:4, X = c(0,1,0,1), Y = c(0,0,1,1)),
  coords = c("X", "Y"), crs = NA_integer_
)
dat <- data.frame(
  id   = rep(1:4, each = 5),
  time = rep(1:5, 4),
  y    = rnorm(20),
  x1   = rnorm(20)
)
fit <- fit_gwpr(y ~ x1, data = dat, spatial = pts, id = "id",
                time = "time", bandwidth = 2, workers = 1)
print(fit)

Fit a single local panel linear model

Description

Fits a geographically weighted panel linear model for one focal spatial unit. Errors are caught and returned as a structured failure result rather than propagating to the caller.

Usage

fit_linear_local_model(
  local_data,
  formula,
  model,
  effect,
  weights,
  index,
  random_method = "swar"
)

Arguments

local_data

A 'pdata.frame' or plain 'data.frame' with columns for all formula variables plus the panel indices.

formula

A formula object.

model

Character scalar: '"pooling"', '"within"', or '"random"'.

effect

Character scalar: '"individual"', '"time"', '"two-way"', or '"nested"'.

weights

Numeric vector of kernel weights aligned with the rows of 'local_data'.

index

Character vector of length 2 giving the panel index column names: 'c(id_col, time_col)'.

random_method

Character scalar; estimation method for variance components when 'model = "random"' (default '"swar"').

Value

A list with elements:

'fit': The fitted model object, or 'NULL' on failure.
'status': '"ok"' or '"failed"'.
'error': 'NULL' or character string with the error message.
'metadata': Named list with additional model-fitting metadata, e.g. flagging single-observation individuals for within models.

Fit a fixed-effects logistic model via fixest::feglm

Description

Fit a fixed-effects logistic model via fixest::feglm

Usage

fit_logistic_fixed(
  local_data,
  formula,
  effect,
  weights,
  id_col,
  time_col,
  family = "binomial"
)

Arguments

local_data

A data.frame containing all formula variables plus the panel index columns.

formula

A formula object with a binary response (no fixed-effect terms; those are added automatically from effect).

effect

Character scalar: one of "individual", "time", "two-way", "nested".

weights

Numeric vector of kernel weights.

id_col

Name of the individual ID column.

time_col

Name of the time column.

family

Character scalar reserved for future extension.

Value

A fixest object.

Fit a single local binary logistic panel model

Description

Dispatches to the correct backend (pooling, fixed, or random) based on model, wrapping execution in a tryCatch so that convergence failures or complete-separation errors are captured rather than propagated.

Usage

fit_logistic_local_model(
  local_data,
  formula,
  model,
  effect,
  weights,
  index,
  threshold = 0.5,
  family = "binomial"
)

Arguments

local_data

A data.frame with all formula variables and panel index columns.

formula

A formula object.

model

Character scalar: "pooling", "fixed", or "random".

effect

Character scalar: "individual", "time", "two-way", or "nested".

weights

Numeric vector of kernel weights aligned with nrow(local_data).

index

Character vector of length 2: c(id_col, time_col).

threshold

Numeric scalar; classification threshold (default 0.5).

family

Character scalar reserved for future extension (currently only "binomial" is implemented).

Value

A list with elements:

fit: The fitted model object, or NULL on failure.
status: "ok" or "failed".
error: NULL or character error message.
metadata: Named list with additional fitting metadata.

Fit a pooled logistic model via stats::glm

Description

Fit a pooled logistic model via stats::glm

Usage

fit_logistic_pooling(local_data, formula, weights, family = "binomial")

Arguments

local_data

A data.frame containing all formula variables.

formula

A formula object with a binary response.

weights

Numeric vector of kernel weights (same length as nrow(local_data)).

family

Character scalar reserved for future extension; currently only "binomial" is supported.

Value

A glm object.

Fit a random-effects logistic model via glmmTMB::glmmTMB

Description

Fit a random-effects logistic model via glmmTMB::glmmTMB

Usage

fit_logistic_random(
  local_data,
  formula,
  effect,
  weights,
  id_col,
  time_col,
  family = "binomial"
)

Arguments

local_data

A data.frame containing all formula variables plus the panel index columns.

formula

A formula object with a binary response (no random-effect terms; those are added automatically from effect).

effect

Character scalar: one of "individual", "time", "two-way", "nested".

weights

Numeric vector of kernel weights.

id_col

Name of the individual ID column.

time_col

Name of the time column.

family

Character scalar reserved for future extension.

Value

A glmmTMB object.

Format a human-readable memory warning message

Description

Converts a 'gwpr_memory_estimate' object (produced by estimate_memory) into a readable character string. For high-risk estimates the message also includes actionable suggestions.

Usage

format_memory_warning(memory_estimate)

Arguments

memory_estimate

A 'gwpr_memory_estimate' list, typically the return value of estimate_memory.

Value

A character string containing the warning text. The string is suitable for passing to message() or warning().

Extract distances from one focus unit to all others

Description

Extract distances from one focus unit to all others

Usage

get_local_distances(distance_context, focus_id)

Arguments

distance_context

A list as returned by 'build_distance_context()', or a plain n x n numeric distance matrix. If a plain matrix is supplied, the rows and columns must already be in the same order as the spatial units.

focus_id

Integer scalar (1-based) or character matching a row/column name of the distance matrix. The focal unit whose distances are extracted.

Value

A numeric vector of length n giving the distance from the focus unit to every unit (including itself, which is 0).
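
For the plain-matrix case, the lookup amounts to selecting one row of the distance matrix. The sketch below assumes a named 4 x 4 Euclidean distance matrix built with stats::dist; the helper local_distances is hypothetical and does not handle the list form returned by build_distance_context().

```r
# Four units on the corners of a unit square, with row names as unit IDs
coords <- matrix(c(0, 0, 1, 0, 0, 1, 1, 1), ncol = 2, byrow = TRUE,
                 dimnames = list(c("a", "b", "c", "d"), c("X", "Y")))
dmat <- as.matrix(stats::dist(coords))  # 4 x 4 Euclidean distance matrix

# Hypothetical helper: integer (1-based) and character IDs both select a row
local_distances <- function(dmat, focus_id) {
  as.numeric(dmat[focus_id, ])
}

local_distances(dmat, 1)
local_distances(dmat, "d")
```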

Fit a Geographically Weighted Panel Regression (main entry point)

Description

Orchestrates the complete GWPR pipeline: input validation, data preparation, optional memory estimation, optional bandwidth search, model fitting, and optional diagnostics.

Usage

gwpr(
  formula,
  data,
  spatial,
  id,
  time,
  family = c("gaussian", "binomial"),
  model = c("within", "pooling", "random"),
  effect = c("individual", "time", "two-way", "nested"),
  bandwidth = NULL,
  bandwidth_method = c("sgd", "grid", "random"),
  bandwidth_control = list(),
  kernel = c("bisquare", "gaussian", "exponential", "tricube", "boxcar"),
  adaptive = FALSE,
  threshold = 0.5,
  workers = 1L,
  seed = NULL,
  diagnostics = TRUE,
  ...
)

Arguments

formula

A formula object specifying the model (e.g. y ~ x1 + x2).

data

A data.frame containing the panel data. Must include the columns referenced by id and time.

spatial

An sf object with one row per spatial unit. Must include the column referenced by id.

id

Character scalar. Name of the unit (individual) ID column shared by data and spatial.

time

Character scalar. Name of the time-period column in data.

family

Character scalar. Model family: "gaussian" (default, linear GWPR) or "binomial" (binary panel logistic GWPR).

model

Character scalar. Panel model type: "within" (default), "pooling", or "random".

effect

Character scalar. Panel effect: "individual" (default), "time", "two-way", or "nested".

bandwidth

Numeric scalar or NULL (default). When NULL the bandwidth is selected automatically via select_bandwidth().

bandwidth_method

Character scalar. Method for automatic bandwidth search: "sgd" (default), "grid", or "random". Ignored when bandwidth is supplied.

bandwidth_control

Named list of control parameters passed to the bandwidth search function. For "grid": lower, upper, step. For "sgd": lower, upper, the learning rate, and related settings. For "random": lower, upper, n_samples.

kernel

Character scalar. Kernel function: "bisquare" (default), "gaussian", "exponential", "tricube", or "boxcar".

adaptive

Logical scalar. FALSE (default) uses a fixed distance bandwidth; TRUE uses an adaptive (k-nearest-neighbour) bandwidth.

threshold

Numeric scalar. Classification threshold for family = "binomial" (default 0.5).

workers

Positive integer. Number of parallel workers. 1 (default) uses serial execution; values > 1 enable explicit parallelism.

seed

Integer or NULL. Random seed for reproducibility of any stochastic steps (bandwidth search, parallel RNG).

diagnostics

Logical scalar. When TRUE (default), diagnose_gwpr() is called after fitting and its results are stored in the returned object.

...

Additional arguments passed to the bandwidth search or fitting functions.

Value

A gwpr_fit object. Key fields:

local_results: Per-unit local model results.
predictions: In-sample predicted values / probabilities.
residuals: Residuals (gaussian) or Pearson residuals (binomial).
metrics: Overall goodness-of-fit metrics.
spatial_results: Data frame of per-unit coefficients.
search: Bandwidth search result (gwpr_bandwidth), or NULL when bandwidth was supplied directly.
diagnostics: A gwpr_diagnostics object, or NULL.

Examples


# Minimal linear GWPR with a fixed bandwidth
library(sf)
pts <- sf::st_as_sf(
  data.frame(id = 1:4, X = c(0,1,0,1), Y = c(0,0,1,1)),
  coords = c("X", "Y"), crs = NA_integer_
)
dat <- data.frame(
  id   = rep(1:4, each = 5),
  time = rep(1:5, 4),
  y    = rnorm(20),
  x1   = rnorm(20)
)
fit <- gwpr(y ~ x1, data = dat, spatial = pts, id = "id", time = "time",
            bandwidth = 2, diagnostics = FALSE, workers = 1)
print(fit)

Memory Estimation Module for GWPR.light 1.0.0

Description

Functions for estimating memory usage before running GWPR models. The module provides warnings about memory risk levels to help users avoid out-of-memory errors. These functions only warn; they never stop execution.

Metrics Module

Description

Functions for computing evaluation metrics for linear and logistic panel regression models.
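
As an illustration of the metrics named elsewhere in this manual (r2, mse, rmse, mae, log_loss, accuracy, precision, recall, f1_score), the following sketch uses their textbook definitions. The function names are hypothetical, and details such as the probability clipping constant and the use of >= at the threshold are assumptions rather than the package's documented behaviour.

```r
# Hypothetical helpers using standard metric definitions
linear_metrics <- function(y, yhat) {
  res <- y - yhat
  mse <- mean(res^2)
  list(r2   = 1 - sum(res^2) / sum((y - mean(y))^2),
       mse  = mse,
       rmse = sqrt(mse),
       mae  = mean(abs(res)))
}

logistic_metrics <- function(y, p, threshold = 0.5, eps = 1e-15) {
  p_clip <- pmin(pmax(p, eps), 1 - eps)   # avoid log(0)
  pred   <- as.numeric(p >= threshold)    # assumed convention: >= threshold -> 1
  tp <- sum(pred == 1 & y == 1)
  fp <- sum(pred == 1 & y == 0)
  fn <- sum(pred == 0 & y == 1)
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  list(log_loss  = -mean(y * log(p_clip) + (1 - y) * log(1 - p_clip)),
       accuracy  = mean(pred == y),
       precision = precision,
       recall    = recall,
       f1_score  = 2 * precision * recall / (precision + recall))
}

lin <- linear_metrics(y = c(1, 2, 3, 4), yhat = c(1, 2, 3, 5))
lgt <- logistic_metrics(y = c(1, 0, 1, 0), p = c(0.9, 0.2, 0.4, 0.6))
```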

Linear GWPR Engine for GWPR.light 1.0.0

Description

Internal functions for fitting Geographically Weighted Panel Regression with a Gaussian (linear) response. Supports pooling, within, and random panel models, plus individual, time, two-way, and nested effects.

Binary Panel Logistic Engine for GWPR.light 1.0.0

Description

Internal functions for fitting Geographically Weighted Panel Regression with a binary (binomial) response. Supports pooling (stats::glm), fixed effects (fixest::feglm), and random effects (glmmTMB::glmmTMB) panel models, plus individual, time, two-way, and nested effects.

The family parameter is reserved for future multi-class extension; in version 1.0.0 only "binomial" is supported.

Construct a new gwpr_context object

Description

Creates the standardised internal context list used to pass state between GWPR modules. Any field not supplied defaults to 'NULL' (or, for 'metadata' and 'warnings', to their appropriate empty types).

Usage

new_gwpr_context(
  call = NULL,
  formula = NULL,
  family = NULL,
  model = NULL,
  effect = NULL,
  id = NULL,
  time = NULL,
  kernel = NULL,
  adaptive = NULL,
  threshold = NULL,
  workers = NULL,
  seed = NULL,
  raw_data = NULL,
  raw_spatial = NULL,
  panel_data = NULL,
  spatial_data = NULL,
  id_map = NULL,
  coords = NULL,
  model_frame = NULL,
  model_matrix = NULL,
  response = NULL,
  metadata = list(),
  warnings = character(),
  ...
)

Arguments

call

The matched call from the top-level API function.

formula

A formula object.

family

Character: '"gaussian"' or '"binomial"'.

model

Character: '"pooling"', '"within"', or '"random"'.

effect

Character: '"individual"', '"time"', '"two-way"', or '"nested"'.

id

Name of the unit ID column.

time

Name of the time column.

kernel

Kernel name.

adaptive

Logical; 'TRUE' for adaptive bandwidth.

threshold

Numeric classification threshold (binomial only).

workers

Number of parallel workers.

seed

Integer random seed or 'NULL'.

raw_data

The original user-supplied data frame.

raw_spatial

The original user-supplied sf object.

panel_data

Processed panel data frame.

spatial_data

Processed sf object.

id_map

Named integer vector mapping unit IDs to row indices.

coords

Matrix of spatial coordinates.

model_frame

Model frame derived from formula and panel_data.

model_matrix

Design matrix.

response

Numeric response vector.

metadata

Named list of supplementary information.

warnings

Character vector of accumulated warnings.

...

Additional named fields stored in the context list.

Value

A named list with class '"gwpr_context"'.

Minimal parallel_map implementation

Description

A thin wrapper that uses plain lapply when workers = 1 and switches to future.apply::future_lapply with a multisession plan when workers > 1. The global future plan is always restored to sequential after the call, which prevents side effects.

Usage

parallel_map(x, fn, workers = 1, seed = NULL, ..., packages = NULL)

Arguments

x

A list (or vector) of inputs to iterate over.

fn

A function to apply to each element of x. The function receives the element as its first argument; additional arguments are passed via ....

workers

Integer scalar. Number of parallel workers. 1 (default) uses serial lapply and never touches future.

seed

Integer scalar or NULL. Random seed for reproducibility. In serial mode the R RNG is seeded with set.seed(seed). In parallel mode the seed is forwarded to future_lapply via the future.seed argument using L'Ecuyer-CMRG streams.

...

Additional arguments forwarded to fn.

packages

Character vector of package names that workers need to load, or NULL (default). Ignored in serial mode.

Details

Worker-level errors are caught and returned as character strings (prefixed with "ERROR: ") rather than aborting the entire call.

Value

A list of the same length as x. Elements where fn threw an error are replaced with a character string "ERROR: <msg>".
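
The error-capturing behaviour can be sketched for the serial case in a few lines of base R. The helper safe_map is hypothetical and covers only the workers = 1 path; it is meant to show how a failing element becomes an "ERROR: <msg>" string while the other elements are still returned.

```r
# Hypothetical serial sketch: apply fn to each element, turning errors
# into "ERROR: <msg>" strings instead of aborting the whole call.
safe_map <- function(x, fn, ...) {
  lapply(x, function(el) {
    tryCatch(
      fn(el, ...),
      error = function(e) paste0("ERROR: ", conditionMessage(e))
    )
  })
}

# The second element is a string, so squaring it fails
res <- safe_map(list(1, "a", 3), function(v) v^2)
res
```

Because failures are returned in place, the result keeps a one-to-one alignment with the input.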

Examples

result <- parallel_map(1:3, function(x) x^2, workers = 1)
stopifnot(identical(result, list(1, 4, 9)))

Parallel Execution Module

Description

Unified parallel execution interface for bandwidth search, local model fitting, and diagnostics. Hides backend differences and ensures CRAN-friendly behaviour.

Predict response values for a local linear model

Description

Returns predicted values for the rows of 'local_data', using the fitted model object stored in 'local_result'.

Usage

predict_linear_local_model(local_result, local_data)

Arguments

local_result

A list as returned by 'fit_linear_local_model()'.

local_data

The data frame used for prediction (same structure as the training data).

Value

Numeric vector of predicted values (same length as 'nrow(local_data)'). Returns a vector of 'NA_real_' on failure.

Predict probabilities for a local logistic model

Description

Returns predicted probabilities (type = "response") for the rows of local_data, using the fitted model stored in local_result.

Usage

predict_logistic_local_model(local_result, local_data)

Arguments

local_result

A list as returned by fit_logistic_local_model().

local_data

A data.frame used for prediction.

Value

Numeric vector of probabilities in [0, 1]. Returns a vector of NA_real_ on failure.

Prepare all data structures for GWPR fitting

Description

Orchestrates the full data preparation pipeline: panel data extraction, spatial data extraction, ID mapping, model frame and model matrix construction. Updates and returns a 'gwpr_context'.

Usage

prepare_data(context)

Arguments

context

A 'gwpr_context' object with at least 'formula', 'family', 'id', 'time', 'model', 'raw_data', and 'raw_spatial' populated.

Value

An updated 'gwpr_context' with 'panel_data', 'spatial_data', 'id_map', 'coords', 'model_frame', 'model_matrix', 'response', and 'metadata' filled in.

Prepare panel data from the raw data in context

Description

Extracts the 'id', 'time', and formula variables from 'raw_data'. Adds a 'raw_row_id' column preserving the original row positions. Sorts by id then time. Records panel balance information in 'metadata'. For 'within' models, records single-observation individuals in 'metadata'.

Usage

prepare_panel_data(context)

Arguments

context

A 'gwpr_context' with 'raw_data', 'formula', 'id', 'time', and 'model' populated.

Value

Updated context with 'panel_data' and 'metadata' populated.
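
The preparation steps listed in the description can be sketched on a plain data frame. This is an illustration only: prep_panel is a hypothetical helper, it works on a data frame rather than a 'gwpr_context', and it treats a panel as balanced when every unit has the same number of rows, which is a simplifying assumption.

```r
# Hypothetical sketch of the panel preparation steps
prep_panel <- function(raw, id, time) {
  raw$raw_row_id <- seq_len(nrow(raw))          # remember original row positions
  out <- raw[order(raw[[id]], raw[[time]]), ]   # sort by id, then time
  rownames(out) <- NULL
  counts <- table(out[[id]])                    # observations per unit
  meta <- list(
    balanced   = length(unique(as.vector(counts))) == 1L,
    single_obs = names(counts)[as.vector(counts) == 1L]
  )
  list(panel_data = out, metadata = meta)
}

raw <- data.frame(id = c(2, 1, 1, 3), time = c(1, 2, 1, 1), y = c(5, 6, 7, 8))
res <- prep_panel(raw, "id", "time")
res$panel_data
res$metadata
```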

Extract and align spatial data for GWPR fitting

Description

Extracts geometry and representative coordinates from the 'raw_spatial' sf object. Aligns spatial rows to the unique individual IDs present in 'panel_data' via 'id_map'.

Usage

prepare_spatial_data(context)

Arguments

context

A 'gwpr_context' with 'raw_spatial', 'panel_data', and 'id' populated.

Value

Updated context with 'spatial_data' and 'coords' populated.

Print a gwpr_bandwidth object

Description

Displays the search method, best bandwidth, criterion score, and number of iterations explored.

Usage

## S3 method for class 'gwpr_bandwidth'
print(x, ...)

Arguments

x

A 'gwpr_bandwidth' object.

...

Currently ignored.

Value

Invisibly returns 'x'.

Examples


library(sf)
pts <- sf::st_as_sf(
  data.frame(id = 1:4, X = c(0,1,0,1), Y = c(0,0,1,1)),
  coords = c("X", "Y"), crs = NA_integer_
)
dat <- data.frame(id = rep(1:4, each = 5), time = rep(1:5, 4),
                  y = rnorm(20), x1 = rnorm(20))
bw <- select_bandwidth(y ~ x1, data = dat, spatial = pts, id = "id",
  time = "time", method = "grid",
  control = list(lower = 1, upper = 3, step = 1), workers = 1)
print(bw)

Print a gwpr_diagnostics object

Description

Displays each diagnostic test name and, where available, its statistic and p-value.

Usage

## S3 method for class 'gwpr_diagnostics'
print(x, ...)

Arguments

x

A 'gwpr_diagnostics' object.

...

Currently ignored.

Value

Invisibly returns 'x'.

Examples


library(sf)
pts <- sf::st_as_sf(
  data.frame(id = 1:4, X = c(0,1,0,1), Y = c(0,0,1,1)),
  coords = c("X", "Y"), crs = NA_integer_
)
dat <- data.frame(id = rep(1:4, each = 5), time = rep(1:5, 4),
                  y = rnorm(20), x1 = rnorm(20))
fit <- fit_gwpr(y ~ x1, data = dat, spatial = pts, id = "id",
                time = "time", bandwidth = 2, workers = 1)
diag_obj <- diagnose_gwpr(fit, diagnostics = c("f_test", "hausman"))
print(diag_obj)

Print a gwpr_fit object

Description

Displays a concise summary of a fitted GWPR model: family, panel model type, effect, bandwidth, and top-level goodness-of-fit metrics.

Usage

## S3 method for class 'gwpr_fit'
print(x, ...)

Arguments

x

A 'gwpr_fit' object.

...

Currently ignored.

Value

Invisibly returns 'x'.

Examples


library(sf)
pts <- sf::st_as_sf(
  data.frame(id = 1:4, X = c(0,1,0,1), Y = c(0,0,1,1)),
  coords = c("X", "Y"), crs = NA_integer_
)
dat <- data.frame(id = rep(1:4, each = 5), time = rep(1:5, 4),
                  y = rnorm(20), x1 = rnorm(20))
fit <- fit_gwpr(y ~ x1, data = dat, spatial = pts, id = "id",
                time = "time", bandwidth = 2, workers = 1)
print(fit)

Result Object Module for GWPR.light 1.0.0

Description

S3 classes and constructor functions for the three result objects returned by the GWPR.light public API: 'gwpr_fit', 'gwpr_bandwidth', and 'gwpr_diagnostics'. Also provides 'build_spatial_results()' for assembling a data.frame that can be aligned with an 'sf' geometry column.

Score a single bandwidth candidate

Description

Calls the 'scorer' function for a given bandwidth and records the result together with timing information, model counts, and metric values.

Usage

score_bandwidth_candidate(context, bandwidth, scorer)

Arguments

context

A 'gwpr_context' list.

bandwidth

Numeric scalar; the candidate bandwidth to evaluate.

scorer

A function with signature 'scorer(context, bandwidth)' returning a named list with at minimum:

'score': Numeric scalar. The criterion value (lower is better).
'criterion': Character scalar. Name of the criterion.
'n_local_models': Integer. Number of local models attempted.
'n_failed_local_models': Integer. Number of local models that failed.
'metrics': Named list of aggregate metrics (e.g. R2, MSE, etc.).

Value

A named list describing the candidate result:

'bandwidth': The evaluated bandwidth.
'score': Numeric criterion score, or 'NA_real_' on failure.
'criterion': Name of the scoring criterion.
'status': '"ok"' or '"failed"'.
'error_message': 'NA_character_' or error text.
'warning_message': 'NA_character_' or warning text.
'elapsed_time': Elapsed wall-clock time in seconds.
'n_local_models': Number of local models attempted.
'n_failed_local_models': Number of local models that failed.
'r2', 'mse', 'rmse', 'mae': Linear metrics, or 'NA_real_'.
'log_loss', 'accuracy', 'precision', 'recall', 'f1_score': Logistic metrics, or 'NA_real_'.

Select an optimal bandwidth for GWPR

Description

Validates inputs, prepares data, and dispatches to the appropriate bandwidth search algorithm: grid search, stochastic gradient descent (sgd), or random search, depending on the method argument.

Usage

select_bandwidth(
  formula,
  data,
  spatial,
  id,
  time,
  family = c("gaussian", "binomial"),
  model = c("within", "pooling", "random"),
  effect = c("individual", "time", "two-way", "nested"),
  method = c("sgd", "grid", "random"),
  control = list(),
  kernel = c("bisquare", "gaussian", "exponential", "tricube", "boxcar"),
  adaptive = FALSE,
  threshold = 0.5,
  workers = 1L,
  seed = NULL,
  ...
)

Arguments

formula

A formula object.

data

A data.frame with panel data.

spatial

An sf object.

id

Character scalar; unit ID column name.

time

Character scalar; time column name.

family

"gaussian" or "binomial".

model

"within", "pooling", or "random".

effect

"individual", "time", "two-way", or "nested".

method

Bandwidth search method: "sgd" (default), "grid", or "random".

control

Named list of search control parameters.

kernel

Kernel function name.

adaptive

Logical; TRUE for adaptive (k-NN) bandwidth.

threshold

Numeric; classification threshold (binomial only).

workers

Positive integer; number of parallel workers.

seed

Integer random seed, or NULL.

...

Additional arguments (currently unused).

Value

A gwpr_bandwidth object with fields:

best_bandwidth: The selected bandwidth value.
best_score: The criterion value at the best bandwidth.
method: The search method used.
history: Search history data frame.
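
To make the grid strategy and the shape of the returned fields concrete, here is a self-contained sketch with a toy scoring function. The names grid_search and toy_scorer are hypothetical, and the toy scorer replaces the full GWPR fit; only the lower, upper, and step controls and the returned field names come from this manual.

```r
# Hypothetical grid search: evaluate every candidate and keep the lowest score
grid_search <- function(scorer, lower, upper, step) {
  candidates <- seq(lower, upper, by = step)
  scores <- vapply(candidates, scorer, numeric(1))
  best <- which.min(scores)                     # lower score is better
  list(best_bandwidth = candidates[best],
       best_score     = scores[best],
       method         = "grid",
       history        = data.frame(bandwidth = candidates, score = scores))
}

# Toy criterion with its minimum at bandwidth 1.5 (stands in for MSE or log loss)
toy_scorer <- function(bw) (bw - 1.5)^2 + 0.1

bw <- grid_search(toy_scorer, lower = 0.5, upper = 2, step = 0.5)
bw$best_bandwidth
bw$history
```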

Examples


library(sf)
pts <- sf::st_as_sf(
  data.frame(id = 1:4, X = c(0,1,0,1), Y = c(0,0,1,1)),
  coords = c("X", "Y"), crs = NA_integer_
)
dat <- data.frame(
  id   = rep(1:4, each = 5),
  time = rep(1:5, 4),
  y    = rnorm(20),
  x1   = rnorm(20)
)
bw <- select_bandwidth(
  y ~ x1, data = dat, spatial = pts, id = "id", time = "time",
  method  = "grid",
  control = list(lower = 0.5, upper = 2, step = 0.5),
  workers = 1
)
bw$best_bandwidth

Spatial SF Module for GWPR.light 1.0.0

Description

Internal functions providing the 'sf'-first spatial interface. These functions abstract geometry extraction, coordinate derivation, spatial alignment to panel data, and neighbour-structure construction. They are used by the data preparation and diagnostics modules.

Standardise a binary response variable to 0/1 numeric

Description

Converts logical and two-level factor responses to numeric 0/1 values. Numeric 0/1 vectors are returned unchanged (as numeric). Other inputs raise an error.

Usage

standardize_binary_response(y)

Arguments

y

A vector that is 0/1 numeric, logical, or a two-level factor.

Details

For factor inputs the first level is mapped to 0 and the second level is mapped to 1.

Value

A numeric vector of 0s and 1s.
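
The conversion rules above can be sketched as follows. The helper to_binary is hypothetical, and this version rejects missing values, which is a simplification not stated in the documentation.

```r
# Hypothetical sketch of the documented conversion rules
to_binary <- function(y) {
  if (is.logical(y)) {
    return(as.numeric(y))                       # FALSE -> 0, TRUE -> 1
  }
  if (is.factor(y)) {
    if (nlevels(y) != 2L) stop("factor response must have exactly two levels")
    return(as.numeric(y) - 1)                   # first level -> 0, second -> 1
  }
  if (is.numeric(y) && all(y %in% c(0, 1))) {
    return(as.numeric(y))                       # already 0/1
  }
  stop("response must be 0/1 numeric, logical, or a two-level factor")
}

to_binary(c(TRUE, FALSE))
to_binary(factor(c("no", "yes", "no")))
```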

Coerce a binary response to a 0/1 integer vector

Description

Converts logical and two-level factor responses to 0/1 integer. Numeric 0/1 vectors are returned unchanged. Factors with more than two levels raise an error via validate_binary_response().

Usage

standardize_logistic_response(y)

Arguments

y

A numeric, logical, or factor vector.

Value

An integer vector of 0s and 1s.

Summarise a gwpr_bandwidth object

Description

Prints the search method, best bandwidth, and a brief history overview.

Usage

## S3 method for class 'gwpr_bandwidth'
summary(object, ...)

Arguments

object

A 'gwpr_bandwidth' object.

...

Currently ignored.

Value

Invisibly returns 'object'.

Examples


library(sf)
pts <- sf::st_as_sf(
  data.frame(id = 1:4, X = c(0,1,0,1), Y = c(0,0,1,1)),
  coords = c("X", "Y"), crs = NA_integer_
)
dat <- data.frame(id = rep(1:4, each = 5), time = rep(1:5, 4),
                  y = rnorm(20), x1 = rnorm(20))
bw <- select_bandwidth(y ~ x1, data = dat, spatial = pts, id = "id",
  time = "time", method = "grid",
  control = list(lower = 1, upper = 3, step = 1), workers = 1)
summary(bw)

Summarise a gwpr_diagnostics object

Description

Prints each diagnostic test result with statistic and p-value where available.

Usage

## S3 method for class 'gwpr_diagnostics'
summary(object, ...)

Arguments

object

A 'gwpr_diagnostics' object.

...

Currently ignored.

Value

Invisibly returns 'object'.

Examples


library(sf)
pts <- sf::st_as_sf(
  data.frame(id = 1:4, X = c(0,1,0,1), Y = c(0,0,1,1)),
  coords = c("X", "Y"), crs = NA_integer_
)
dat <- data.frame(id = rep(1:4, each = 5), time = rep(1:5, 4),
                  y = rnorm(20), x1 = rnorm(20))
fit <- fit_gwpr(y ~ x1, data = dat, spatial = pts, id = "id",
                time = "time", bandwidth = 2, workers = 1)
diag_obj <- diagnose_gwpr(fit, diagnostics = c("f_test", "hausman"))
summary(diag_obj)

Summarise a gwpr_fit object

Description

Prints the global model overview, quantile summary of local coefficients (when available), and goodness-of-fit metrics.

Usage

## S3 method for class 'gwpr_fit'
summary(object, ...)

Arguments

object

A 'gwpr_fit' object.

...

Currently ignored.

Value

Invisibly returns 'object'.

Examples


library(sf)
pts <- sf::st_as_sf(
  data.frame(id = 1:4, X = c(0,1,0,1), Y = c(0,0,1,1)),
  coords = c("X", "Y"), crs = NA_integer_
)
dat <- data.frame(id = rep(1:4, each = 5), time = rep(1:5, 4),
                  y = rnorm(20), x1 = rnorm(20))
fit <- fit_gwpr(y ~ x1, data = dat, spatial = pts, id = "id",
                time = "time", bandwidth = 2, workers = 1)
summary(fit)

Validate bandwidth parameter

Description

Checks that the supplied bandwidth is valid:

fixed (adaptive = FALSE): must be a finite positive numeric scalar.
adaptive (adaptive = TRUE): must be a finite positive integer scalar.

Usage

validate_bandwidth(bandwidth, adaptive)

Arguments

bandwidth

Numeric scalar. The bandwidth value to validate.

adaptive

Logical scalar. 'TRUE' for adaptive (k-nearest-neighbour) bandwidth, 'FALSE' for fixed distance bandwidth.

Value

Invisibly returns 'TRUE' when validation passes.
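
A minimal sketch of these two rules is shown below. The helper check_bandwidth and its error messages are hypothetical; in particular, "integer" is interpreted here as a whole-number value rather than R's integer type, which is an assumption.

```r
# Hypothetical sketch of the fixed vs adaptive bandwidth checks
check_bandwidth <- function(bandwidth, adaptive) {
  if (!is.numeric(bandwidth) || length(bandwidth) != 1L ||
      !is.finite(bandwidth) || bandwidth <= 0) {
    stop("bandwidth must be a finite positive numeric scalar")
  }
  # Adaptive bandwidths count neighbours, so they must be whole numbers
  if (isTRUE(adaptive) && bandwidth != round(bandwidth)) {
    stop("adaptive bandwidth must be a whole number of neighbours")
  }
  invisible(TRUE)
}

check_bandwidth(2.5, adaptive = FALSE)   # valid fixed distance
check_bandwidth(3, adaptive = TRUE)      # valid neighbour count
```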

Validate bandwidth search control parameters

Description

Validate bandwidth search control parameters

Usage

validate_bandwidth_control(method, control, adaptive)

Arguments

method

Character: '"grid"', '"sgd"', or '"random"'.

control

Named list of control parameters.

adaptive

Logical; if 'TRUE', bandwidth is a neighbour count.

Value

Invisibly returns 'TRUE' when valid; stops otherwise.

Validate that a response vector is suitable for binary logistic regression

Description

Stops with an informative error when y is a factor with more than two levels. Numeric 0/1 vectors and two-level factors pass validation.

Usage

validate_binary_response(y)

Arguments

y

A numeric, logical, or factor vector.

Value

Invisibly returns TRUE when validation passes.

Validate the response variable against the specified family

Description

Validate the response variable against the specified family

Usage

validate_family_response(data, formula, family)

Arguments

data

A data frame.

formula

A formula; the left-hand side is the response variable.

family

Character string: '"gaussian"' or '"binomial"'.

Value

Invisibly returns 'TRUE' when valid; stops otherwise.

Validate a model formula against a data frame

Description

Validate a model formula against a data frame

Usage

validate_formula(formula, data)

Arguments

formula

A formula object.

data

A data frame containing the model variables.

Value

Invisibly returns 'TRUE' when valid; stops with an informative message when a problem is detected.

Validate a gwpr_context object

Description

Checks that all core fields required for model fitting are non-'NULL'. Stops with an informative message listing every missing field.

Usage

validate_gwpr_context(context)

Arguments

context

A list (typically of class '"gwpr_context"') to validate.

Details

Core fields: 'formula', 'family', 'id', 'time', 'model', 'effect', 'kernel', 'adaptive', 'threshold', 'workers'.

Value

Invisibly returns 'TRUE' when all core fields are present.
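
The check can be sketched as a scan over the core field names that collects every NULL entry before stopping. The helper check_context and the wording of its error message are hypothetical; the field list is taken from the Details above.

```r
# Hypothetical sketch: report all missing core fields in one error
check_context <- function(context) {
  core <- c("formula", "family", "id", "time", "model", "effect",
            "kernel", "adaptive", "threshold", "workers")
  is_absent <- vapply(core, function(f) is.null(context[[f]]), logical(1))
  absent <- core[is_absent]
  if (length(absent) > 0L) {
    stop("missing core fields: ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

partial <- list(formula = y ~ x1, family = "gaussian")
msg <- tryCatch(check_context(partial), error = function(e) conditionMessage(e))
msg
```

Listing every missing field at once saves the caller from fixing one field per run.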

Validate all inputs to the main GWPR functions

Description

This is the top-level validation entry point. It calls the individual 'validate_*' helpers and also checks 'model', 'effect', and 'kernel'.

Usage

validate_inputs(
  formula,
  data,
  spatial,
  id,
  time,
  family = c("gaussian", "binomial"),
  model = c("within", "pooling", "random"),
  effect = c("individual", "time", "two-way", "nested"),
  kernel = c("bisquare", "gaussian", "exponential", "tricube", "boxcar"),
  adaptive = FALSE,
  workers = 1L
)

Arguments

formula

A formula object.

data

A data frame.

spatial

An 'sf' object.

id

Name of the unit ID column (character).

time

Name of the time column (character).

family

'"gaussian"' or '"binomial"'.

model

'"pooling"', '"within"', or '"random"'.

effect

'"individual"', '"time"', '"two-way"', or '"nested"'.

kernel

One of '"bisquare"', '"gaussian"', '"exponential"', '"tricube"', '"boxcar"'.

adaptive

Logical.

workers

Positive integer.

Value

Invisibly returns 'TRUE' when all checks pass; stops otherwise.

Validate panel index columns in a data frame

Description

Validate panel index columns in a data frame

Usage

validate_panel_index(data, id, time)

Arguments

data

A data frame.

id

Name of the unit (individual) index column.

time

Name of the time index column.

Value

Invisibly returns 'TRUE' when valid; stops otherwise.

Validate a spatial sf object

Description

Validate a spatial sf object

Usage

validate_spatial(spatial, id)

Arguments

spatial

An 'sf' object representing spatial units.

id

Name of the ID column that must be present in 'spatial'.

Value

Invisibly returns 'TRUE' when valid; stops otherwise.

Validate the workers argument

Description

Validate the workers argument

Usage

validate_workers(workers)

Arguments

workers

Number of parallel workers. Must be a positive integer.

Value

Invisibly returns 'TRUE' when valid; stops otherwise.

Input Validation Functions for GWPR.light 1.0.0

Description

These internal functions validate user inputs before any expensive computation, providing early and consistent error messages.

Distance and Kernel Weights Module for GWPR.light 1.0.0

Description

Internal functions for computing spatial distances and geographically weighted kernel weights. Supports fixed and adaptive bandwidths, and five kernel functions: bisquare, gaussian, exponential, tricube, and boxcar.
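
For orientation, the five kernels can be written compactly as functions of the scaled distance u = d / bandwidth. The formulas below are the conventional forms used in the geographically weighted regression literature and are an assumption here: this manual does not state the exact expressions the package uses. The helper names kernel_weights and adaptive_bandwidth are hypothetical.

```r
# Hypothetical sketch using conventional GWR kernel definitions (assumed)
kernel_weights <- function(d, bandwidth, kernel = "bisquare") {
  u <- d / bandwidth
  switch(kernel,
    bisquare    = ifelse(u < 1, (1 - u^2)^2, 0),
    tricube     = ifelse(u < 1, (1 - u^3)^3, 0),
    boxcar      = ifelse(u <= 1, 1, 0),
    gaussian    = exp(-0.5 * u^2),
    exponential = exp(-u),
    stop("unknown kernel: ", kernel)
  )
}

# Assumed adaptive rule: the bandwidth is the distance to the k-th nearest
# unit, counting the focal unit itself (distance 0) as the first.
adaptive_bandwidth <- function(d, k) sort(d)[k]

d <- c(0, 1, 2, 3)
kernel_weights(d, bandwidth = 2, kernel = "bisquare")
kernel_weights(d, bandwidth = adaptive_bandwidth(d, k = 3), kernel = "boxcar")
```

Compactly supported kernels (bisquare, tricube, boxcar) give zero weight beyond the bandwidth, while the gaussian and exponential kernels decay smoothly without a cutoff.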

Execute an expression with a reproducible seed

Description

Sets set.seed(seed) before evaluating expr, then restores the prior RNG state so the caller's random stream is unaffected.

Usage

with_reproducible_seed(seed, expr)

Arguments

seed

Integer scalar. Seed value passed to set.seed.

expr

An R expression to evaluate.

Value

The value of expr.

Examples

r1 <- with_reproducible_seed(42, runif(3))
r2 <- with_reproducible_seed(42, runif(3))
stopifnot(identical(r1, r2))
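
One way to obtain this behaviour in base R is to save .Random.seed from the global environment, seed, evaluate, and restore on exit. The sketch below is an assumption about how such a helper could be written, not the package's source; with_seed_sketch is a hypothetical name.

```r
# Hypothetical sketch: evaluate expr under a fixed seed, then restore the
# caller's RNG state so their random stream continues as if nothing happened.
with_seed_sketch <- function(seed, expr) {
  has_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  old_seed <- if (has_seed) get(".Random.seed", envir = globalenv()) else NULL
  on.exit({
    if (has_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      rm(".Random.seed", envir = globalenv())
    }
  }, add = TRUE)
  set.seed(seed)
  expr  # lazy evaluation: the promise is forced here, after set.seed()
}

# Reference stream without the helper
set.seed(1); a <- runif(1); b <- runif(1)

# Same stream, with a seeded call in between
set.seed(1); a2 <- runif(1)
inner <- with_seed_sketch(42, runif(3))
b2 <- runif(1)
```

Because R evaluates arguments lazily, expr is only computed after set.seed(seed) has run, so the seeded draw is reproducible while b2 matches the undisturbed stream.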
