\name{model.mapmake}
\alias{model.mapmake}


\title{ Map Making }
\description{
  Applies models to either ERDAS Imagine image (.img) files or ESRI Grids of predictors to create detailed prediction surfaces.  It handles large predictor files by reading the \code{.img} files in chunks and writing the predictions for each chunk to the output \code{.txt} file before reading the next chunk of data.
}
\usage{
model.mapmake(model.obj= NULL, folder = NULL, MODELfn = NULL, rastLUTfn = NULL, na.action = "na.omit", numrows = 500, map.sd = FALSE, asciifn = NULL, asciifn.mean = NULL, asciifn.stdev = NULL, asciifn.coefv = NULL, n.trees = NULL)
}

\arguments{

  \item{model.obj}{ \code{R} model object.  The model object to use for prediction, if the model has been previously created.  The model object must be of type RF or SGB.  (Eventually planned to include "GAM".)  If \code{NULL} (the default), a model is generated of type specified by the argument \code{model.type}.  }

  \item{folder}{ String.  The folder used for all output from predictions and/or maps.  Do not add a trailing slash to the path string.  If \code{folder = NULL} (the default), a GUI prompts the user to browse to a folder.  To use the working directory, specify \code{folder = getwd()}.}

  \item{MODELfn}{ String.  The file name to use to save the generated model object.  If \code{MODELfn = NULL} (the default), a default name is generated by pasting \code{model.type_response.type_response.name}. If the other output filenames are left unspecified, \code{MODELfn} will be used as the basic name to generate other output filenames. The filename can be the full path, or it can be the simple basename, in which case the output will be to the folder specified by \code{folder}.}

  \item{rastLUTfn}{ String.  The file name (full path, or base name with the path specified by \code{folder}) of a \code{.csv} file for a \code{rastLUT}. Alternatively, a data frame containing the same information. The \code{rastLUT} must include 3 columns: (1) the full path and name of the raster file; (2) the shortname of each predictor / raster layer (band); (3) the layer (band) number.  The shortname (column 2) must match the names in \code{predList}, the predictor column names in the training/test data sets (\code{qdata.trainfn} and \code{qdata.testfn}), and the predictor names in \code{model.obj}. 

Example of comma-delimited file:

\tabular{llllll}{
	  \tab \tab \tab \code{C:/button_test/tc99_2727subset.img,} \tab \code{tc99_2727subsetb1,} \tab \code{1}\cr
	  \tab \tab \tab \code{C:/button_test/tc99_2727subset.img,} \tab \code{tc99_2727subsetb2,} \tab \code{2}\cr
	  \tab \tab \tab \code{C:/button_test/tc99_2727subset.img,} \tab \code{tc99_2727subsetb3,} \tab \code{3}}}

  \item{na.action}{ String.  Model validation.  Specifies the action to take if there are \code{NA} values in the prediction data, or if a level or class of a categorical predictor variable is present in the validation test set or the production (mapping) data set but not in the training data set.  There are 2 options: (1) \code{na.action = "na.omit"} (the default), where any data point or pixel with any new levels for any of the factored predictors is returned as \code{-9999} (the \code{NODATA} value); (2) \code{na.action = "na.roughfix"}, where a missing categorical predictor for a data point or pixel is replaced with the most common category for that predictor, and a missing continuous predictor is replaced with the median for that predictor.   }

  \item{numrows}{ Integer.  Map Production.  The number of rows to be predicted at a time.}

  \item{map.sd}{ Logical.  Map Production.  If \code{map.sd = TRUE}, maps of mean, standard deviation, and coefficient of variation of the predictions from all the trees are generated for each pixel.  If \code{map.sd = FALSE} (the default), only the predicted probability map will be built. This option is only available if \code{model.type = "RF"} and \code{response.type = "continuous"}.  Note: This option requires much more available memory. If you get the error \code{"..cannot allocate vector of size..."}, you must reduce the value of \code{numrows}.  

The names of the additional maps default to: 

\tabular{llll}{
	  \tab \tab \tab \code{folder/model.type_response.type_response.name_mean.txt}  \cr
	  \tab \tab \tab \code{folder/model.type_response.type_response.name_stdev.txt}  \cr
	  \tab \tab \tab \code{folder/model.type_response.type_response.name_coefv.txt} }
}

  \item{asciifn}{ String.  Map Production.  Filename of output file for map production.  The filename can be the full path, or it can be the simple basename, in which case the output will be to the folder specified by \code{folder}. If \code{asciifn = NULL} (the default), a name is created by pasting \code{MODELfn} and \code{"_map.txt"}. }

  \item{asciifn.mean}{ String.  Map Production.  Used if \code{map.sd = TRUE} and \code{response.type = "continuous"}.  Filename of output file for mean of trees. The filename can be the full path, or it can be the simple basename, in which case the output will be to the folder specified by \code{folder}. If \code{asciifn.mean = NULL} (the default), a name is created by pasting \code{MODELfn} and \code{"_map_mean.txt"}.}

  \item{asciifn.stdev}{ String.  Map Production.  Used if \code{map.sd = TRUE} and \code{response.type = "continuous"}.  Filename of output file for standard deviation of trees. The filename can be the full path, or it can be the simple basename, in which case the output will be to the folder specified by \code{folder}. If \code{asciifn.stdev = NULL} (the default), a name is created by pasting \code{MODELfn} and \code{"_map_stdev.txt"}.}

  \item{asciifn.coefv}{ String.  Map Production.  Used if \code{map.sd = TRUE} and \code{response.type = "continuous"}.  Filename of output file for coefficient of variation of trees. The filename can be the full path, or it can be the simple basename, in which case the output will be to the folder specified by \code{folder}. If \code{asciifn.coefv = NULL} (the default), a name is created by pasting \code{MODELfn} and \code{"_map_coefv.txt"}.}

  \item{n.trees}{ Integer.  SGB models.  The number of stochastic gradient boosting trees for an SGB model. If \code{n.trees=NULL} (the default) the model creation code will increase the number of trees 100 at a time until OOB error rate stops improving. The \code{gbm} function \code{gbm.perf()} will be used to select from the total calculated trees, the best number of trees for model predictions, with argument \code{method="OOB"}. The \code{gbm} package warns that \code{OOB generally underestimates the optimal number of iterations although predictive performance is reasonably competitive.} }

}
\details{

\code{model.mapmake()} can be run in a traditional R command mode, where all arguments are specified in the function call.  However, it can also be used in a full push-button mode, where you type the simple command \code{model.mapmake()} and GUI pop-up windows ask questions about the type of model, the file locations of the data, etc.

When running \code{model.mapmake()} on non-Windows platforms, file names and folders need to be specified in the argument list, but other pushbutton selections are handled by the \code{select.list()} function, which is platform independent. 

For map making, the package \code{rgdal} is used to read spatial rasters, including \code{.img} files, into R. The data for production mapping should be in the form of pixel-based raster layers representing the predictors in the model. If there is more than one predictor or raster layer, the layers must all have the same number of columns and rows. The layers must also have the same extent, projection, and pixel size, for effective model development and accuracy. The layers must be in either ESRI Grid or ERDAS Imagine image (single or multi-band) raster data formats, with continuous or categorical data values.

When creating maps of non-rectangular study regions, there may be large portions of the rectangle where you have no predictors and are uninterested in making predictions. The suggested value for the pixels outside the study area is \code{-9999}. These pixels will be ignored in the predictions, thus saving computing time, and will be exported as \code{-9999}. Any value other than \code{-9999} will be treated as a legal data value, and a prediction will be generated for each such pixel. Note: in Imagine image files, if the specified \code{NODATA} is set as \code{-9999}, any \code{-9999} pixels will be read into R as \code{NA}, and if \code{na.action = "na.roughfix"}, predictions will be attempted for these pixels. This will cause the computation time to increase, and these predictions will need to be masked out when the final map is imported back into a GIS system.

The function \code{model.mapmake()} outputs an ASCII grid file of map information suitable to be imported into a GIS. Small maps can also be imported back into R using the function \code{read.asciigrid()} from the \code{sp} package.
}
\value{

The function does not return a value; instead, it writes text files of map information (suitable for importing into a GIS) to the specified folder.

}
\references{ 
Breiman, L. (2001). Random Forests. Machine Learning, 45:5-32.

Friedman, J.H. (2001). Greedy function approximation: a gradient boosting machine. Ann. Stat., 29(5):1189-1232.

Friedman, J.H. (2002). Stochastic gradient boosting. Comput. Stat. Data An., 38(4):367-378.

Liaw, A. and Wiener, M. (2002). Classification and Regression by randomForest. R News, 2(3):18-22.

Ridgeway, G. (1999). The state of boosting. Comp. Sci. Stat., 31:172-181.
 }

\author{ Elizabeth Freeman and Tracey Frescino }
\seealso{ \code{\link{get.test}}, \code{\link{model.build}}, \code{\link{model.diagnostics}}}
\examples{

###########################################################################
############################# Run this set up code: #######################
###########################################################################

# set seed:
seed=38

# Define training and test files:

qdata.trainfn = system.file("external", "helpexamples","DATATRAIN.csv", package = "ModelMap")

# Define folder for all output:
folder=getwd()	


###########################################################################
############## Pick one of the following sets of definitions: #############
###########################################################################


########## Continuous Response, Continuous Predictors ############

#file name to store model:
MODELfn="RF_Bio_TC"				

#predictors:
predList=c("TCB","TCG","TCW")	

#define which predictors are categorical:
predFactor=FALSE	

# Response name and type:
response.name="BIO"
response.type="continuous"


########## Binary Response, Continuous Predictors ############

#file name to store model:
MODELfn="RF_CONIFTYP_TC"				

#predictors:
predList=c("TCB","TCG","TCW")		

#define which predictors are categorical:
predFactor=FALSE

# Response name and type:
response.name="CONIFTYP"

# This variable is 1 if a conifer or mixed conifer type is present, 
# otherwise 0.

response.type="binary"


########## Continuous Response, Categorical Predictors ############

# In this example, NLCD is a categorical predictor.
#
# You must decide what you want to happen if there are categories
# present in the data to be predicted (either the validation/test set
# or in the image file) that were not present in the original training data.
# Choices:
#       na.action = "na.omit"
#                    Any validation datapoint or image pixel with a value for any
#                    categorical predictor not found in the training data will be
#                    returned as NA.
#       na.action = "na.roughfix"
#                    Any validation datapoint or image pixel with a value for any
#                    categorical predictor not found in the training data will have
#                    the most common category for that predictor substituted,
#                    and then a prediction will be made.

# You must also let R know which of the predictors are categorical, in other
# words, which ones R needs to treat as factors.
# This vector must be a subset of the predictors given in predList
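
# --- Sketch (illustration only, not ModelMap internals) ---------------------
# The "na.roughfix" rule described above, applied by hand to a toy data frame
# using base R only. The data and the names toy.sketch and fixed.sketch are
# made up for this illustration and are not used elsewhere in the example.

toy.sketch <- data.frame(TCB  = c(10, 20, NA, 40),
                         NLCD = factor(c("41", "42", "42", NA)))
fixed.sketch <- toy.sketch

# continuous predictor: the missing value is replaced by the median
fixed.sketch$TCB[is.na(fixed.sketch$TCB)] <- median(toy.sketch$TCB, na.rm = TRUE)

# categorical predictor: the missing value is replaced by the most common level
fixed.sketch$NLCD[is.na(fixed.sketch$NLCD)] <- names(which.max(table(toy.sketch$NLCD)))

fixed.sketch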

#file name to store model:
MODELfn="RF_BIO_TCandNLCD"			

#predictors:
predList=c("TCB","TCG","TCW","NLCD")

#define which predictors are categorical:
predFactor=c("NLCD")

# Response name and type:
response.name="BIO"
response.type="continuous"



###########################################################################
########################### build model: ##################################
###########################################################################


### create model ###

model.obj = model.build( model.type="RF",
                       qdata.trainfn=qdata.trainfn,
                       folder=folder,		
                       MODELfn=MODELfn,
                       predList=predList,
                       predFactor=predFactor,
                       response.name=response.name,
                       response.type=response.type,
                       seed=seed
)



###########################################################################
############ Then Run this code to predict map pixels #####################
###########################################################################

# A single model was built from the training data, 
# but it will be applied to two sets of image data, one from 2001 and one from 2004

####################################################################################################

### Create a list of the filenames (including paths) for the rast Look up Tables ###


rastLUTfn.2001 <- paste(system.file(package="ModelMap"),"/external/helpexamples/LUT_2001.csv",sep="")
rastLUTfn.2004 <- paste(system.file(package="ModelMap"),"/external/helpexamples/LUT_2004.csv",sep="")


### Load rast LUT tables, and add path to the filenames in column 1 ###

rastLUT.2001 <- read.table(rastLUTfn.2001,header=FALSE,sep=",",stringsAsFactors=FALSE)
rastLUT.2004 <- read.table(rastLUTfn.2004,header=FALSE,sep=",",stringsAsFactors=FALSE)

rastLUT.2001[,1] <- paste(system.file(package="ModelMap"),"external/helpexamples",rastLUT.2001[,1],sep="/")
rastLUT.2004[,1] <- paste(system.file(package="ModelMap"),"external/helpexamples",rastLUT.2004[,1],sep="/")                                      
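

### Sketch: a rastLUT assembled directly as a data frame (illustration only) ###

# The rastLUTfn argument is documented as accepting a data frame in place of a
# .csv file. A minimal three-column table could look like the one below. The
# raster path and short names are the ones from the help text for rastLUTfn;
# the column names and the objects rastLUT.sketch and predList.sketch are
# arbitrary choices made for this sketch and are not used later on.

rastLUT.sketch <- data.frame(
    rastfn    = rep("C:/button_test/tc99_2727subset.img", 3),
    shortname = c("tc99_2727subsetb1", "tc99_2727subsetb2", "tc99_2727subsetb3"),
    band      = 1:3,
    stringsAsFactors = FALSE)

# Every predictor name must appear among the short names in column 2:
predList.sketch <- c("tc99_2727subsetb1", "tc99_2727subsetb2", "tc99_2727subsetb3")
all(is.element(predList.sketch, rastLUT.sketch[, 2]))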


### Define filenames for map  output ###

asciifn.2001 <- "RF_BIO_TCandNLCD_01.txt"
asciifn.2004 <- "RF_BIO_TCandNLCD_04.txt"


asciifn.2001 <- paste(folder,asciifn.2001,sep="/")
asciifn.2004 <- paste(folder,asciifn.2004,sep="/")
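

### Sketch: the documented default output name (illustration only) ###

# When asciifn is left as NULL, the help text says the name is created by
# pasting MODELfn and "_map.txt". The lines below reproduce that naming rule
# by hand. How the folder is prepended internally is an assumption of this
# sketch, and asciifn.sketch is an illustrative name that is not used later.

asciifn.sketch <- paste("RF_BIO_TCandNLCD", "_map.txt", sep = "")
asciifn.sketch <- paste(getwd(), asciifn.sketch, sep = "/")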


### Define Number of rows of raster to read in at one time ###
# if R stops with the error "cannot allocate vector of size ...", lower this number

numrows=500
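

### Sketch: how a raster can be split into row chunks (illustration only) ###

# This only illustrates the idea of predicting a fixed number of rows at a
# time. It is not the code used inside model.mapmake(). The raster height of
# 1200 rows and all names ending in .sketch are made up for this sketch.

nrow.sketch    <- 1200
numrows.sketch <- 500
start.sketch   <- seq(1, nrow.sketch, by = numrows.sketch)
end.sketch     <- pmin(start.sketch + numrows.sketch - 1, nrow.sketch)

# one row per chunk: first and last raster row read in that pass
cbind(start = start.sketch, end = end.sketch)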


### Create ascii text files of predicted map data ###

model.mapmake( model.obj=model.obj,
               folder=folder,		
               rastLUTfn=rastLUT.2001,
           # Model Validation Arguments	
               na.action="na.roughfix",
           # Mapping arguments
               numrows = numrows,						
               asciifn=asciifn.2001
               )

model.mapmake( model.obj=model.obj,
               folder=folder,		
               rastLUTfn=rastLUT.2004,
           # Model Validation Arguments	
               na.action="na.roughfix",
           # Mapping arguments
               numrows = numrows,						
               asciifn=asciifn.2004
               )
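

### Sketch: the per-pixel summaries behind map.sd = TRUE (illustration only) ###

# With map.sd = TRUE the help text describes maps of the mean, standard
# deviation and coefficient of variation of the predictions from all trees.
# The toy matrix below stands in for per-tree predictions (rows are pixels,
# columns are trees). The values, the object names, and the use of sd / mean
# as the coefficient of variation are assumptions of this sketch and are not
# taken from the ModelMap source code.

tree.preds.sketch <- matrix(c(10, 12, 14,
                               5,  5,  5), nrow = 2, byrow = TRUE)

pix.mean.sketch  <- rowMeans(tree.preds.sketch)
pix.stdev.sketch <- apply(tree.preds.sketch, 1, sd)
pix.coefv.sketch <- pix.stdev.sketch / pix.mean.sketch

cbind(mean = pix.mean.sketch, stdev = pix.stdev.sketch, coefv = pix.coefv.sketch)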

###########################################################################
######### run this code to create maps in R (For small maps only!)#########
###########################################################################

### Define Color Ramp ###

# luminance and chroma vectors (named so that base function c() is not masked)
lum <- seq(100,0,length.out=101)
chroma <- seq(0,100,length.out=101)
col.ramp <- hcl(h = 120, c = chroma, l = lum)


### read in map data ###

mapgrid.2001 <- read.asciigrid(asciifn.2001,as.image=TRUE)
mapgrid.2004 <- read.asciigrid(asciifn.2004,as.image=TRUE)
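

### Sketch: masking NODATA pixels by hand (illustration only) ###

# Pixels outside the study area are exported as -9999. Whether they arrive in
# R as NA or as the number -9999 may depend on the header of the imported
# file, so this toy example shows the manual masking step on a made-up matrix.
# z.sketch is an illustrative name and is not used in the plotting code below.

z.sketch <- matrix(c(12.5, -9999, 8.1, -9999), nrow = 2)
z.sketch[z.sketch == -9999] <- NA

# the NODATA pixels no longer distort the range used for the color scale
range(z.sketch, na.rm = TRUE)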


### create map ###

dev.new(width = 8, height = 4)
opar <- par(mfrow=c(1,2),mar=c(3,3,2,1),oma=c(0,0,3,4),xpd=NA)

zlim <- c(0,max(mapgrid.2001$z,mapgrid.2004$z,na.rm=TRUE))
legend.label<-rev(pretty(zlim,n=5))
legend.colors<-col.ramp[trunc((legend.label/max(legend.label))*100)+1]

image(mapgrid.2001, col = col.ramp,zlim=zlim,asp=1,bty="n",xaxt="n",yaxt="n")
mtext("2001 Imagery",side=3,line=1,cex=1.2)

image(mapgrid.2004, col = col.ramp,zlim=zlim,asp=1,bty="n",xaxt="n",yaxt="n")
mtext("2004 Imagery",side=3,line=1,cex=1.2)

legend(	x=max(mapgrid.2004$x),y=max(mapgrid.2004$y),
	legend=legend.label,
	fill=legend.colors,
	bty="n",
	cex=1.2
)
mtext("Predictions",side=3,line=1,cex=1.5,outer=TRUE)
par(opar)


###########################################################################
##### Run this code to map predictor data in R (For small maps only!) #####
###########################################################################

### Define Color Ramps ###

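# luminance and chroma vectors (named so that base function c() is not masked)
lum <- seq(100,0,length.out=101)
chroma <- seq(0,100,length.out=101)
col.ramp.1 <- hcl(h = 15, c = chroma, l = lum)
col.ramp.2 <- hcl(h = 70, c = chroma, l = lum)
col.ramp.3 <- hcl(h = 150, c = chroma, l = lum)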


dev.new(width = 9, height = 6)
opar <- par(mfcol=c(2,3),mar=c(3,3,2,1),oma=c(0,0,3,4),xpd=NA)

#band 1
predgrid.2001=readGDAL(rastLUT.2001[1,1],band=rastLUT.2001[1,3])
predgrid.2001=as.image.SpatialGridDataFrame(predgrid.2001)
predgrid.2004=readGDAL(rastLUT.2004[1,1],band=rastLUT.2004[1,3])
predgrid.2004=as.image.SpatialGridDataFrame(predgrid.2004)

zlim <- range(predgrid.2001$z,predgrid.2004$z,na.rm=TRUE)

image(predgrid.2001, col = col.ramp.1,zlim=zlim,asp=1,bty="n",xaxt="n",yaxt="n")
mtext(rastLUT.2001[1,2],side=3,cex=1.5)
mtext("2001 Imagery",side=2,cex=1.5,line=1)

image(predgrid.2004, col = col.ramp.1,zlim=zlim,asp=1,bty="n",xaxt="n",yaxt="n")
mtext("2004 Imagery",side=2,cex=1.5,line=1)


#band 2
predgrid.2001=readGDAL(rastLUT.2001[2,1],band=rastLUT.2001[2,3])
predgrid.2001=as.image.SpatialGridDataFrame(predgrid.2001)
predgrid.2004=readGDAL(rastLUT.2004[2,1],band=rastLUT.2004[2,3])
predgrid.2004=as.image.SpatialGridDataFrame(predgrid.2004)

zlim <- range(predgrid.2001$z,predgrid.2004$z,na.rm=TRUE)

image(predgrid.2001, col = col.ramp.2,zlim=zlim,asp=1,bty="n",xaxt="n",yaxt="n")
mtext(rastLUT.2001[2,2],side=3,cex=1.5)

image(predgrid.2004, col = col.ramp.2,zlim=zlim,asp=1,bty="n",xaxt="n",yaxt="n")


#band 3
predgrid.2001=readGDAL(rastLUT.2001[3,1],band=rastLUT.2001[3,3])
predgrid.2001=as.image.SpatialGridDataFrame(predgrid.2001)
predgrid.2004=readGDAL(rastLUT.2004[3,1],band=rastLUT.2004[3,3])
predgrid.2004=as.image.SpatialGridDataFrame(predgrid.2004)

zlim <- range(predgrid.2001$z,predgrid.2004$z,na.rm=TRUE)

image(predgrid.2001, col = col.ramp.3,zlim=zlim,asp=1,bty="n",xaxt="n",yaxt="n")
mtext(rastLUT.2001[3,2],side=3,cex=1.5)

image(predgrid.2004, col = col.ramp.3,zlim=zlim,asp=1,bty="n",xaxt="n",yaxt="n")


mtext("Predictor Imagery",side=3,line=1,cex=1.5,outer=TRUE)
par(opar)






}

\keyword{ models }

