% Generated by roxygen2 (4.0.1): do not edit by hand
\name{spanr}
\alias{spanr}
\title{Carry out a search partition analysis (SPAN)}
\usage{
spanr(formula, weight = NA, data = NULL, cc = FALSE, makepos = TRUE,
  beta = NA, size = c(2, 2, 1), gamma = NA)
}
\arguments{
\item{formula}{A formula of the standard form \code{y ~ x + u + v + w ...}
giving the outcome \eqn{y} and predictor covariates \eqn{x, u, v, w, ...}. Operators other
than \code{+} should not be used. A survival object is allowed for \eqn{y}, for example
\code{Surv(time, death) ~ x + u + v + w ...}, in which case optimization is with respect to
log-rank chi-square survival differences.}

\item{data}{A data frame containing the variables in the formula.}

\item{weight}{A frequency weight attached to each row of data. The default, NA, gives unit weight to each row.}

\item{cc}{Indicates complete case analysis (default FALSE). If TRUE, a row of data is deleted if any one
attribute is missing. Otherwise a case is deleted only if an attribute is missing in a Boolean combination, as evaluated during a search.}

\item{makepos}{If TRUE, and an attribute is found to be negative, the direction of \eqn{x} is reversed.
The rule for reversal is \eqn{mean of y|x=1 < mean of y|x=0}. When \code{y} is a survival object the rule for reversal is
\eqn{rate|x=1 < rate|x=0}, where \eqn{rate = case/person-time}. Default is TRUE.}

\item{beta}{Parameter controlling the degree of complexity penalising. Zero gives no complexity penalising. NA (default)
or a negative value determines beta automatically as 0.03 times the initial gradient of the complexity hull.}

\item{size}{Defines the upper allowable size parameters of a disjunctive normal form used in the initial iteration of a search.
It is a vector of length \eqn{q} defining \eqn{p_1, p_2, ..., p_q}. Default \code{c(2,2,1)} defines
\eqn{p_1=2}, \eqn{p_2=2}, and \eqn{p_3=1}.}

\item{gamma}{Parameter controlling the balance of observations in \eqn{A} and its complement \eqn{!A}.
The default, NA, corresponds to no balancing. Balancing multiplies either the MSE reduction or the log-rank statistic by
\eqn{(P_A(1-P_A))^\gamma}, where \eqn{P_A} is the proportion of data in \eqn{A}, to make a new optimization criterion.}
}
\value{
Object \code{spanr} with attributes:

\code{A} Data frame with the same number of rows as the input data, giving a binary indicator of belonging to \eqn{A}.

\code{g} Data frame with the same number of rows as the input data, with columns indicating membership of the
subgroups of \eqn{A}.

\code{h} Data frame with the same number of rows as the input data, with columns indicating membership of the
subgroups of \eqn{!A}.
}
\description{
Carry out a search partition analysis (SPAN).
}
\details{
A function to search for an optimal Boolean combination partition. Optimization is with respect to
the reduction in mean square error (MSE) of \code{y} obtained by splitting into the partition \eqn{(A,!A)}, or if \code{y}
is a survival object, with respect to the log-rank chi-square for survival differences between \eqn{A} and \eqn{!A}.
The Boolean expression for \eqn{A} is output in disjunctive normal form \eqn{A = g_1 | g_2 | g_3 | ...} and
the Boolean expression for the complement \eqn{!A} is also output in disjunctive normal form
\eqn{!A = h_1 | h_2 | h_3 | ...}. Each element of the disjunctive forms, \eqn{g_i} of \eqn{A} or \eqn{h_i} of \eqn{!A},
represents a subgroup. Subgroups are returned as data frames.

If variables \code{x, u, v, w ...} of the formula are not coded binary, a pre-analysis is done to establish
an optimal cut of each such variable. The cut is chosen, again with respect to reduction in MSE, or log-rank for a survival formula,
over values of the variable. If the variable is numeric,
a dichotomy is made by above/below a cut, the possible cuts being the unique values of the variable if there are 20 or fewer,
otherwise at 20 equally spaced intervals. If it is a factor, the dichotomy is made according to each value of the factor.
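
As a rough illustration of the pre-analysis for a numeric predictor, the base R sketch below lists candidate cuts and picks the one giving the largest reduction in MSE. It is not the package's internal code: \code{candidate_cuts} and \code{mse_reduction} are hypothetical helper names introduced here, the placement of the 20 equally spaced cuts between the minimum and maximum is an assumption, and the MSE reduction is taken to be the total mean square error minus the mean square error within the two groups.
\preformatted{
# Hypothetical helpers for illustration only (not part of the package)
candidate_cuts <- function(v, max_cuts = 20) {
  u <- sort(unique(v[!is.na(v)]))
  # every distinct value when there are few, else an equally spaced grid (assumed)
  if (length(u) <= max_cuts) u else seq(min(u), max(u), length.out = max_cuts)
}
mse_reduction <- function(y, a) {
  # total mean square error minus mean square error within the groups of a
  mean((y - mean(y))^2) - mean((y - ave(y, a))^2)
}
set.seed(1)
v <- runif(200)
y <- ifelse(v > 0.6, 11, 10) + rnorm(200, 0, 0.5)
cuts <- candidate_cuts(v)
gain <- sapply(cuts, function(k) mse_reduction(y, as.integer(v > k)))
cuts[which.max(gain)]  # candidate cut with the largest reduction in MSE
}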
}
\examples{
## 1. Simulate Bernoulli binary predictors x1, x2, ..., x10 and outcome y.
## Where (x1 & x2 & x3) | (x1 & x4) | (x1 & x9) holds, y ~ N(11, 0.5); otherwise y ~ N(10, 0.5).
x <- matrix(data = rbinom(10000, 1, 0.5), nrow = 1000, ncol = 10)
colnames(x) <- paste0("x", 1:10)
## P is 1 for rows inside the simulated partition and 0 otherwise
P <- ifelse((x[, 1] & x[, 2] & x[, 3]) | (x[, 1] & x[, 4]) | (x[, 1] & x[, 9]), 1, 0)
y <- ifelse(P, rnorm(1000, 11, 0.5), rnorm(1000, 10, 0.5))
d <- data.frame(y, x)
sp <- spanr(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,
            data = d, size = c(1, 2, 2), beta = NA)
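## The components listed under Value can be inspected directly. This is only a
## sketch: it assumes sp$g and sp$h are data frames as documented, and that the
## first column of sp$A holds the binary indicator of belonging to A.
head(sp$g)   # subgroup indicators for A
head(sp$h)   # subgroup indicators for !A
## Cross-tabulate the simulated partition P against the partition found by the search
table(simulated = P, found = sp$A[, 1])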
## 2. Survival analysis of pbc data
library(survival)
data(pbc)
sp <- with(pbc, spanr(formula = Surv(time, status == 2) ~ trt + age + sex + ascites
               + hepato + spiders + edema + bili + chol + albumin
               + copper + ast + trig + platelet + protime + stage,
               beta = NA, cc = TRUE, gamma = 1))
test <- cbind(pbc, sp$A)
## Kaplan-Meier curves of A versus !A
x <- survfit(Surv(test$time, test$status == 2) ~ test$A)
plot(x, col = c(1, 2))
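## 3. Sketch of the balancing factor (P_A(1-P_A))^gamma described for the gamma argument.
## This uses base R only and is not the package's internal code; balance_factor is a
## hypothetical helper introduced here for illustration.
balance_factor <- function(a, gamma) {
  p <- mean(a, na.rm = TRUE)   # P_A, the proportion of rows belonging to A
  (p * (1 - p))^gamma
}
## The factor is largest for an even split, so balancing favours partitions
## with similar numbers of observations in A and !A.
balance_factor(c(1, 1, 0, 0), gamma = 1)   # even split: 0.25
balance_factor(c(1, 0, 0, 0), gamma = 1)   # 1:3 split: 0.1875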
}
\author{
Roger Marshall <rj.marshall@auckland.ac.nz>, The University of Auckland, New Zealand
}

