% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/np_glm_b.R
\name{np_glm_b}
\alias{np_glm_b}
\title{Non-parametric linear models}
\usage{
np_glm_b(
  formula,
  data,
  family,
  loss = "selfinformation",
  loss_gradient,
  trials,
  n_draws,
  ask_before_full_sampling = TRUE,
  CI_level = 0.95,
  ROPE,
  seed = 1,
  mc_error = 0.01
)
}
\arguments{
\item{formula}{A formula specifying the model.}

\item{data}{A data frame in which the variables specified in the formula
will be found. If missing, the variables are searched for in the standard way.
However, it is strongly recommended that you use this argument so that other
generics for bayesics objects work correctly.}

\item{family}{A description of the error distribution and link function
to be used in the model. See \code{?}\link[stats]{glm} for more information.
Currently implemented families are \code{binomial()}, \code{poisson()},
\code{negbinom()}, and \code{gaussian()} (this last acts as a wrapper for}

\item{loss}{Either \code{"selfinformation"} or a function of two arguments:
the vector of outcomes \code{y} and the expected value of \code{y}, in that
order. The function must return the loss evaluated for each observation.
By default, the self-information loss (i.e., the negative log-likelihood) is used.
Note that the second argument really is the expected value of \code{y}, even
for binomial outcomes (i.e., \eqn{n p}, not \eqn{p}).
If \code{family = negbinom()}, a user-supplied loss function should instead
take three arguments: \code{y}, \code{mu}, and \code{phi}, where \code{phi}
is the dispersion parameter (i.e., \eqn{\text{Var}(y) = \mu + \mu^2/\phi}).}

\item{loss_gradient}{If \code{loss} is a user-defined function (as opposed to
\code{"selfinformation"}), supplying the gradient of the loss will
speed up the algorithm.}

\item{trials}{Integer vector giving the number of trials for each
observation if \code{family = binomial()}.}

\item{n_draws}{integer.  Number of posterior draws to obtain.  If left missing,
the large-sample approximation will be used.}

\item{ask_before_full_sampling}{logical.  If TRUE, the user will be asked
whether they wish to commit to drawing the full number of posterior
samples needed for precise credible interval bounds.  Defaults to
TRUE because the bootstrap is computationally intensive.  Parallelization
via \code{future::plan} is highly recommended when running the full sampling.}

\item{CI_level}{numeric. Credible interval level.}

\item{ROPE}{vector of positive values giving ROPE boundaries for each regression
coefficient.  The ROPE boundary for the intercept may be omitted.
If missing, defaults go to those suggested by Kruschke (2018).}

\item{seed}{integer.  Random seed, for reproducibility.  Always set your seed!}

\item{mc_error}{If the large-sample approximation is not used, the number of
posterior draws is chosen so that, with 99\% probability, the bounds of the
credible intervals will be within \eqn{\pm} \code{mc_error}.}
}
\value{
np_glm_b() returns an object of class "np_glm_b", which behaves as
a list with the following elements:
\itemize{
\item summary - a tibble giving results for regression coefficients.
}
}
\description{
np_glm_b performs general Bayesian inference using the loss-likelihood
bootstrap.  As implemented here, it is a Bayesian non-parametric inferential
engine for linear models. Applicable data types are continuous (use family =
gaussian()), count (use family = poisson() or family = negbinom()), or binomial
(use family = binomial()).
}
\details{
Consider a population parameter of interest defined as the minimizer of
a loss function \eqn{\ell} with respect to the population distribution:
\deqn{
 \theta(F_y) := \underset{\theta\in\Theta}{\text{argmin}} \int \ell(\theta,y)dF_y
}
If we use a non-parametric Dirichlet process prior on the distribution
of \eqn{y}, \eqn{F_y}, and let the concentration parameter go to zero, we
have the Bayesian bootstrap applied to a general Bayesian updating framework
dictated by the loss function.
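
As a rough sketch of what this limit amounts to (following Lyddon et al.,
2019; the weights \eqn{w^{(b)}} and the draw index \eqn{b} are notation
introduced here for illustration, not taken from the implementation), each
posterior draw re-weights the \eqn{n} observed outcomes with flat Dirichlet
weights and minimizes the weighted empirical loss:
\deqn{
 w^{(b)} \sim \text{Dirichlet}(1,\ldots,1), \qquad
 \theta^{(b)} = \underset{\theta\in\Theta}{\text{argmin}} \sum_{i=1}^{n} w_i^{(b)} \ell(\theta,y_i)
}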

By default, the loss function is the self-information loss, i.e., the negative
log-likelihood.  The result then resembles a typical \code{glm_b} fit,
but is more robust to model misspecification.
}
\examples{
\donttest{
# Generate some data
set.seed(2025)
N <- 500
test_data <-
  data.frame(x1 = rnorm(N),
             x2 = rnorm(N),
             x3 = rep(letters[1:5], length.out = N))
# Binary outcome from a logistic model; x2 has no effect by construction
test_data$outcome <-
  rbinom(N, 1, plogis(-2 + test_data$x1 + 2 * (test_data$x3 \%in\% c("d","e"))))

# Fit the GLM via the (non-parametric) loss-likelihood bootstrap.
fit1 <-
  np_glm_b(outcome ~ x1 + x2 + x3,
           data = test_data,
           family = binomial())
fit1
summary(fit1,
        CI_level = 0.99)
plot(fit1)
coef(fit1)
credint(fit1)
predict(fit1,
        newdata = fit1$data[1,])
vcov(fit1)
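
# A hedged sketch of a user-supplied loss, based only on the interface
# documented above: a function of the outcomes and the expected value of
# the outcome that returns one loss value per observation.  The names
# squared_error_loss and fit2 are hypothetical, and loss_gradient is left
# unset here, so this fit may be slower than the default.
squared_error_loss <- function(y, mu) (y - mu)^2
fit2 <-
  np_glm_b(outcome ~ x1 + x2 + x3,
           data = test_data,
           family = binomial(),
           loss = squared_error_loss)
summary(fit2)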
}


}
\references{
S P Lyddon, C C Holmes, S G Walker, General Bayesian updating and the loss-likelihood bootstrap, Biometrika, Volume 106, Issue 2, June 2019, Pages 465–478, https://doi.org/10.1093/biomet/asz006
}
