\name{bms}
\alias{bms}
\title{ Bayesian Model Sampling and Averaging }
\description{
  Given data and prior information, this function samples all possible model combinations via MC3 or enumeration and returns aggregate results.
}
\usage{
bms(X.data, burn = 1000, iter = NA, nmodel = 500, mcmc = "bd",
  g = "UIP", mprior = "random", mprior.size = NA, user.int = TRUE, 
    start.value = NA, g.stats = TRUE, 
    logfile = FALSE, logstep = 10000, force.full.ols = FALSE)
}
\arguments{
  \item{X.data}{ a data frame or a matrix, with the dependent variable in the first column, followed by the covariates (alternatively, \code{X.data} can also be provided as a \code{\link{formula}}).
   Note that \code{bms} automatically estimates a constant, therefore including constant terms is not necessary. }
  \item{burn}{ The (positive integer) number of burn-in draws for the MC3 sampler, defaults to 1000. (Not taken into account if mcmc="enumerate")}
  \item{iter}{ If \code{mcmc} is set to an MC3 sampler, then this is the number of iteration draws to be sampled (excluding burn-ins), default 3000 draws. \cr 
        If \code{mcmc="enumerate"}, then \code{iter} is the number of models to be sampled, starting from 0 (defaults to \eqn{2^K-1}) - cf. \code{start.value}.}
  \item{nmodel}{ the number of best models for which information is stored (default 500). Best models are used for convergence analysis between likelihoods and MCMC frequencies, as well as likelihood-based inference.\cr
                Note that a very high value for \code{nmodel} slows down the sampler significantly. Set nmodel=0 to speed up sampling (if best model information is not needed).}
  \item{mcmc}{ a character denoting the model sampler to be used.\cr
              The MC3 sampler \code{mcmc="bd"} corresponds to a birth/death MCMC algorithm. \code{mcmc="rev.jump"} enacts a reversible jump algorithm adding a "swap" step to the birth / death steps from "bd".\cr
             Alternatively, the entire model space may be fully enumerated by setting \code{mcmc="enumerate"} which will iterate all possible regressor combinations (Note: consider that this means \eqn{2^K} iterations, where K is the number of covariates.)\cr
             Default is full enumeration (\code{mcmc="enumerate"}) with fewer than 15 covariates, and the birth-death MC3 sampler (\code{mcmc="bd"}) with 15 covariates or more. Cf. section 'Details' for more options.
         }
  \item{g}{ the hyperparameter on Zellner's g-prior for the regression coefficients.\cr
             \code{g="UIP"} corresponds to \eqn{g=N}, the number of observations (default);\cr
             \code{g="BRIC"} corresponds to the benchmark prior suggested by Fernandez, Ley and Steel (2001), i.e. \eqn{g=max(N, K^2)}, where K is the total number of covariates;\cr
             \code{g="RIC"} sets \eqn{g=K^2} and conforms to the risk inflation criterion by George and Foster (1994);\cr
             \code{g="HQ"} sets \eqn{g=log(N)^3} and asymptotically mimics the Hannan-Quinn criterion with \eqn{C_{HQ}=3}  (cf. Fernandez, Ley and Steel, 2001, p.395);\cr
             \code{g="EBL"} estimates a local empirical Bayes g-parameter (as in Liang et al. (2008));\cr
             \code{g="hyper"} takes the 'hyper-g' prior distribution (as in Liang et al., 2008) with the default hyper-parameter \eqn{a} set such that the prior expected shrinkage factor conforms to 'UIP';\cr
             This hyperparameter \eqn{a} can  be adjusted (between \eqn{2<a<=4}) by setting \code{g="hyper=2.9"}, for instance.\cr
             Alternatively, \code{g="hyper=UIP"} sets the prior expected value of the shrinkage factor equal to that of UIP (default), while \code{g="hyper=BRIC"} sets it according to BRIC.\cr
             Cf. section 'Details' for more on the hyper-g prior.
        }
  \item{mprior}{ a character denoting the model prior choice, defaulting to "random":\cr 
               \code{mprior="fixed"} denotes fixed common prior inclusion probabilities for each regressor as e.g. in Sala-i-Martin, Doppelhofer, and Miller (2004) - for their fine-tuning, cf. \code{mprior.size}. Preferable to \code{mprior="random"} if strong prior information on model size exists;\cr
               \code{mprior="random"} (default) triggers the 'random theta' prior by Ley and Steel (2008), who suggest a binomial-beta hyperprior on the a priori inclusion probability;\cr
               \code{mprior="uniform"} employs the uniform model prior;\cr
               \code{mprior="customk"} allows for custom model size priors (cf. \code{mprior.size});\cr
               \code{mprior="pip"} allows for custom prior inclusion probabilities (cf. \code{mprior.size});\cr
              Note that the prior on models with more than N-3 regressors is automatically zero: these models will not be sampled. 
              }
  \item{mprior.size}{ if \code{mprior} is "fixed" or "random", \code{mprior.size} is a scalar that denotes the prior expected value of the model size prior (default K/2).\cr
             If \code{mprior="customk"} then a custom model size prior can be provided as a K+1 vector detailing the priors from model size 0 to K 
             (e.g. rep(1,K+1) for the uniform model prior);\cr
             if \code{mprior="pip"}, then custom prior inclusion probabilities can be provided as a vector of size K, with elements in the interval (0,1)
             }
  \item{user.int}{'interactive mode': prints results to the console after the routine has ended and plots a chart (default TRUE). }
  \item{start.value}{ specifies the starting model of the iteration chain. For instance, \code{start.value=numeric(K)} starts from the null model including 
             solely a constant term, while \code{start.value=c(3,6)} gives the column indices of a starting model that includes only covariates 3 and 6.\cr
             If \code{start.value} is set to an integer (e.g. \code{start.value=15}) then that number of covariates (here: 15 covariates) is randomly chosen and the starting model is identified by those regressors with an OLS t-statistic>0.2.\cr 
             The default value \code{start.value=NA} corresponds to \code{start.value=min(ncol(X.data),nrow(X.data)-3)}. Note that \code{start.value=0} or \code{start.value=NULL} starts from the null model.\cr
             If \code{mcmc="enumerate"} then \code{start.value} is the index to start the iteration (default: 0, the null model). Any number between 0 and \eqn{2^K-1} is admissible.
             }
 % \item{beta.save}{ if \code{beta.save=TRUE} (default) then the respective regression coefficients are saved along with the \code{nmodel} best models. 
 %           If \code{beta.save=FALSE}, the best models are saved without their coefficients, allowing for faster iteration, but limited functionality.
 %           (Note: if beta.save<0 then regression coefficients are saved for top models, but not the corresponding standard deviations).
 %           }
  \item{g.stats}{\code{TRUE} if statistics on the shrinkage factor g/(1+g) should be collected, defaulting to TRUE
             (Note: set \code{g.stats=FALSE} for faster iteration.) }
  \item{logfile}{ setting \code{logfile=TRUE} produces a logfile named \code{"test.log"} in your current working directory,
             in order to keep track of the sampling procedure. \code{logfile} equal to some filepath (like \code{logfile="subfolder/log.txt"}) puts the logfile 
             into that specified position. (default: \code{logfile=FALSE}). Note that \code{logfile=""} implies log printouts on the console.
             }
  \item{logstep}{ specifies at which number of posterior draws information is written to the log file; default: 10 000 iterations }
  \item{force.full.ols}{ default FALSE. If \code{force.full.ols=TRUE}, the OLS estimation part of the sampling procedure relies on slower matrix inversion, 
             instead of streamlined routines. \code{force.full.ols=TRUE} can slow down sampling but may deal better with highly collinear data}
  %\item{exact}{ deprecated }
  %\item{int}{ deprecated }
  %\item{printRes}{ deprecated }
  %\item{ask.set}{ deprecated }
  %\item{return.g.stats}{ deprecated }
  %\item{theta}{ deprecated }
  %\item{prior.msize}{ deprecated }
}
\details{
  Ad \code{mcmc}: \cr
  Interaction sampler: adding an ".int" to an MC3 sampler (e.g. \code{mcmc="bd.int"}) provides for special treatment of interaction terms.
             Interaction terms will only be sampled along with their component variables: In the column names of X.data, interaction terms need to be 
             denominated by names consisting of the base terms separated by \code{#} (e.g. an interaction term of base variables \code{"A"}, \code{"B"} and \code{"C"} needs column name \code{"A#B#C"}). Then variable \code{"A#B#C"} will only be included in a model if all of the component variables ("A", "B", and "C") are included. 
             
  The MC3 samplers "\code{bd}", "\code{rev.jump}", "\code{bd.int}" and "\code{rev.jump.int}" iterate away from a starting model by adding, dropping or swapping (only in the case of rev.jump) covariates. 
  
  In an MCMC fashion, they thus randomly draw a candidate model and then move to it in case its marginal likelihood (marg.lik.) is superior to the marg.lik. of the current model. 
  
  In case the candidate's marg.lik. is inferior, it is randomly accepted or rejected according to a probability formed by the ratio of candidate marg.lik. over current marg.lik.
  Over time, the sampler should thus converge to a sensible distribution. For aggregate results based on these MC3 frequencies, the first few iterations are typically disregarded (the 'burn-ins'). 
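  
  Written as a formula, the acceptance rule described above moves from the current model to the candidate model with the probability given below. This is only a sketch that follows the verbal description; the symbols \eqn{M_c} (candidate model), \eqn{M_i} (current model) and \eqn{p(y | M)} (marginal likelihood) are notational assumptions, and any model prior terms are left out:
  \deqn{\min\left(1, \frac{p(y | M_c)}{p(y | M_i)}\right)}{min(1, p(y | M_c) / p(y | M_i))}
  A candidate with a superior marginal likelihood is therefore always accepted, whereas an inferior one is accepted with probability equal to the likelihood ratio.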
  
  Ad \code{g} and the hyper-g prior: The hyper-g prior introduced by Liang et al. (2008) puts a prior distribution on the shrinkage factor \eqn{g/(1+g)}, namely a Beta distribution \eqn{Beta(1, a/2-1)} 
  that is governed by the parameter \eqn{a}. \eqn{a=4} means a uniform prior distribution of the shrinkage factor, while \eqn{a>2} close to 2 concentrates the prior shrinkage factor close to one. \cr
  The prior expected value is \eqn{E(g/(1+g)) = 2/a}. In this sense \code{g="hyper=UIP"} and \code{g="hyper=BRIC"} set the prior expected shrinkage such that it conforms to a fixed UIP-g (\eqn{g=N}) or BRIC-g (\eqn{g=max(K^2,N)}). 
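  
  To make this concrete, the implied hyper-parameter can be worked out from the expectation formula above (a derivation for illustration, not a statement about the internal implementation; \eqn{g_0} is a notational assumption for the fixed g to be matched). Equating the prior expected shrinkage \eqn{2/a} to the fixed-g shrinkage \eqn{g_0/(1+g_0)} and solving for \eqn{a} gives
  \deqn{a = \frac{2 (1+g_0)}{g_0} = 2 + \frac{2}{g_0}}{a = 2 (1+g_0)/g_0 = 2 + 2/g_0}
  so that matching UIP (\eqn{g_0=N}) implies \eqn{a = 2 + 2/N}, and matching BRIC (\eqn{g_0=max(K^2,N)}) implies \eqn{a = 2 + 2/max(K^2,N)}. Both values lie within the admissible range \eqn{2<a<=4} whenever \eqn{g_0>=1}.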
  
  }
\value{
  A list of class \code{bma}, that may be displayed using e.g. \code{\link{summary.bma}} or \code{\link{coef.bma}}. The list contains the following elements:
  \item{info}{a list of aggregate statistics: \code{iter} is the number of iterations, \code{burn} the number of burn-ins.\cr
  The following have to be divided by \code{cumsumweights} to get posterior expected values: \code{inccount} are the posterior inclusion probabilities, \code{b1mo} and \code{b2mo} the first and second moment of coefficients, \code{add.otherstats} other statistics of interest (typically the moments of the shrinkage factor), \code{msize} is the post. expected model size, \code{k.vec} the posterior model size distribution, \code{pos.sign} the unconditional post. probability of positive coefficients, \code{corr.pmp} is the correlation between the best models' MCMC frequencies and their marg. likelihoods.\cr
  \code{timed} is the time that was needed for MCMC sampling, \code{cons} is the posterior expected value of the constant. \code{K} and \code{N} are the maximum number of covariates and the sample size, respectively.}
  \item{arguments}{a list of the evaluated function arguments provided to \code{bms} (see above)}
  \item{topmod}{a 'topmod' object containing the best drawn models. see \code{\link{topmod}} for more details}
  \item{start.pos}{the positions of the starting model. If bmao is a 'bma' object this corresponds to covariates \code{bmao$reg.names[bmao$start.pos]}. If bmao is a chain that resulted from several starting models (cf. \code{\link{c.bma}}), then \code{start.pos} is a list detailing all of them.} 
  \item{gprior.info}{a list detailing information on the g-prior: \code{gtype} corresponds to argument \code{g} above, \code{is.constant} is FALSE if \code{gtype} is either "hyper" or "EBL", \code{return.g.stats} corresponds to argument \code{g.stats} above, \code{shrinkage.moments} contains the first and second moments of the shrinkage factor (only if \code{return.g.stats==TRUE}), \code{g} details the fixed g (if \code{is.constant==TRUE}), \code{hyper.parameter} corresponds to the hyper-g parameter \eqn{a} as in Liang et al. (2008) }
  \item{mprior.info}{a list detailing information on the model prior: \code{origargs} lists the original arguments to \code{mprior} and \code{mprior.size} above; \code{pmp(...)} is a function to calculate the prior model probability for a specific model; \code{mp.mode} corresponds to argument \code{mprior} above; \code{mp.msize} denotes the prior model size; \code{mp.Kdist} is a (K+1) vector with the prior model size distribution from 0 to K}
  \item{X.data}{data.frame or matrix: corresponds to argument \code{X.data} above, possibly cleaned for NAs}
  \item{reg.names}{character vector: the covariate names to be used for X.data}
  \item{bms.call}{the original call to the \code{bms} function}
}
\references{ 
Feldkircher, M. and S. Zeugner (2009): Benchmark Priors Revisited: On Adaptive Shrinkage and the Supermodel Effect in Bayesian Model Averaging, IMF Working Paper 09/202.

Fernandez, C., E. Ley and M. Steel (2001): Benchmark priors for Bayesian model averaging. Journal of Econometrics 100(2), 381--427 
   
Ley, E. and M. Steel (2008): On the Effect of Prior Assumptions in Bayesian Model Averaging with Applications to Growth Regressions. working paper
   
Liang, F., Paulo, R., Molina, G., Clyde, M. A., and Berger, J. O. (2008). Mixtures of g Priors for Bayesian Variable Selection. Journal of the American Statistical Association 103, 410--423.
   
Sala-i-Martin, X. and G. Doppelhofer and R.I. Miller (2004): Determinants of long-term growth: a Bayesian averaging of classical estimates (BACE) approach. American Economic Review 94(4), 813--835   
}   
\author{Martin Feldkircher and Stefan Zeugner}
\note{ There are several ways to speed up sampling: \code{nmodel=10} saves only the ten best models, at most a marginal improvement. \code{nmodel=0} does not save the best (500) models; however, posterior convergence analysis and likelihood-based inference are then not possible.
     %\code{beta.save=FALSE} saves the best models, but not their coefficents, which renders the use of \code{image.bma} and the paramer \code{exact=TRUE} in functions such as \code{coef.bma} infeasible.
     \code{g.stats=FALSE} saves some time by not retaining the shrinkage factors for the MC3 chain (and the best models). \code{force.fullobject=TRUE} in contrast, slows sampling down significantly if \code{mcmc="enumerate"}. 
     }
\section{Theoretical background}{ 
  The models analyzed are Bayesian normal-gamma conjugate models with improper constant and variance priors akin to Fernandez, Ley and Steel (2001): A model \eqn{M} can be described as follows, with \eqn{\epsilon \sim N(0,\sigma^2 I)}{epsilon ~ N(0, sigma^2 I)}:
  \deqn{y = \alpha + X \beta + \epsilon}{y = alpha + X beta + epsilon}
  \deqn{\beta | \sigma, M, g \sim N(0, g \sigma^2 (X'X)^{-1})}{beta | sigma, M, g ~ N(0, g sigma^2 (X'X)^-1)}
  
  Moreover, the (improper) prior on the constant \eqn{f(\alpha)} is put proportional to 1. Similarly, the variance prior \eqn{f(\sigma)} is proportional to \eqn{1/\sigma}.
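  
  As a standard consequence of this g-prior setup (stated here for orientation and not taken from the package source; \eqn{\hat{\beta}_{OLS}}{beta_OLS} is a notational assumption for the OLS estimator under model \eqn{M}), the posterior expected value of the coefficients for a fixed \eqn{g} is the OLS estimator shrunk towards zero by the shrinkage factor \eqn{g/(1+g)}:
  \deqn{E(\beta | y, X, g, M) = \frac{g}{1+g} \hat{\beta}_{OLS}}{E(beta | y, X, g, M) = g/(1+g) * beta_OLS}
  A large \eqn{g} thus leaves the OLS estimates nearly untouched, while a small \eqn{g} pulls them strongly towards the prior mean of zero.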
}
\seealso{ \code{\link{coef.bma}}, \code{\link{plotModelsize}} and \code{\link{density.bma}} for some operations on the resulting 'bma' object, \code{\link{c.bma}} for integrating separate MC3 chains and splitting of sampling over several runs.

Check \url{http://bms.zeugner.eu} for additional help.}
\examples{
  data(datafls)
  #estimating a standard MC3 chain with 1000 burn-ins and 2000 iterations and uniform model priors
  bma1 = bms(datafls,burn=1000, iter=2000, mprior="uniform")
  coef(bma1,exact=TRUE, std.coefs=TRUE) #standardized coefficients based on exact likelihoods of the best models (500 by default)
  
  #suppressing user-interactive output, using a customized starting value, and not saving the best models for only 20 observations (but 41 covariates)
  bma2 = bms(datafls[20:39,],burn=1000, iter=2000, nmodel=0, start.value=c(1,4,7,30),user.int=FALSE)
  coef(bma2)
  
  #MC3 chain with a hyper-g prior (custom coefficient a=2.1), saving only the 20 best models, and an alternative sampling procedure; putting a log entry to console every 1000th step
  bma3 = bms(datafls,burn=1000, iter=5000, nmodel=20, g="hyper=2.1", mcmc="rev.jump",logfile="",logstep=1000)
  image(bma3) #showing the coefficient signs of the 20 best models
  
  #enumerating with 10 covariates (= 1024 models), keeping the shrinkage factors of the best 200 models
  bma4 = bms(datafls[,1:11],mcmc="enumerate",nmodel=200,g.stats=TRUE)

  #using an interaction sampler for two interaction terms
  #the interaction columns are the products of their component variables
  dataint=cbind(datafls,datafls$LifeExp*datafls$Abslat/1000,datafls$Protestants*datafls$Brit*datafls$Muslim)
  names(dataint)[ncol(dataint)-1]="LifeExp#Abslat"
  names(dataint)[ncol(dataint)]="Protestants#Brit#Muslim"
  bma5 = bms(X.data=dataint,burn=1000,iter=9000,start.value=0,mcmc="bd.int") 
  
  density(bma5,reg="English") # plot posterior density for covariate "English"
  
  # a matrix as X.data argument
  bms(matrix(rnorm(1000),100,10))
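  
  # a sketch of custom prior inclusion probabilities (mprior="pip"), following the
  # description of 'mprior' and 'mprior.size' above: with 10 covariates, mprior.size
  # needs 10 elements in the interval (0,1). The values in pip.prior are purely
  # illustrative assumptions, and the object names pip.prior and bma6 are made up here.
  pip.prior = seq(0.1, 0.9, length.out=10)
  bma6 = bms(datafls[,1:11], mcmc="enumerate", mprior="pip", mprior.size=pip.prior, user.int=FALSE)
  coef(bma6) # compare the posterior inclusion probabilities with the prior ones in pip.prior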
  
}
% Add one or more standard keywords, see file 'KEYWORDS' in the
% R documentation directory.
\keyword{models}
%\keyword{ ~kwd2 }% __ONLY ONE__ keyword per line
