pwbart {BART}    R Documentation
BART is a Bayesian “sum-of-trees” model.
For a numeric response y, we have
y = f(x) + e,
where e ~ N(0,sigma^2).
f is the sum of many tree models. The goal is to have very flexible inference for the unknown function f.
In the spirit of “ensemble models”, each tree is constrained by a prior to be a weak learner so that it contributes a small amount to the overall fit.
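More explicitly, following Chipman, George and McCulloch (2010), f is a sum of m regression trees
f(x) = g(x; T_1, M_1) + g(x; T_2, M_2) + ... + g(x; T_m, M_m),
where T_j denotes the structure of the j-th tree and M_j the values attached to its terminal nodes.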
pwbart( x.test, treedraws, mu=0, mc.cores=1L, transposed=FALSE,
        dodraws=TRUE,
        nice=19L ## mc.pwbart only
      )

mc.pwbart( x.test, treedraws, mu=0, mc.cores=2L, transposed=FALSE,
           dodraws=TRUE,
           nice=19L ## mc.pwbart only
         )
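A minimal sketch of the serial and parallel call patterns, assuming post is the object returned by wbart and that x.test and y are as in the Examples below:

pred  = pwbart(x.test, post$treedraws, mu=mean(y))                 #single-threaded
predm = mc.pwbart(x.test, post$treedraws, mu=mean(y), mc.cores=4)  #multi-threaded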
x.test
    Matrix of covariates to predict y for.
treedraws
    The $treedraws component of the object returned by a BART fit such as wbart (e.g., post$treedraws in the Examples below).
mu
    Mean to add on to y prediction.
mc.cores
    Number of threads to utilize.
transposed
    When running pwbart or mc.pwbart in parallel, it is more memory-efficient to transpose x.test prior to calling the internal versions of these functions; transposed indicates whether x.test has already been transposed in this way.
dodraws
    Whether to return the draws themselves (the default), or to return the mean of the draws as specified by dodraws=FALSE; see the sketch after this argument list.
nice
    Set the job niceness. The default niceness is 19: niceness goes from 0 (highest) to 19 (lowest).
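A minimal sketch of what dodraws controls, assuming post and x.test as in the Examples below and assuming dodraws=FALSE simply returns the column means of the draws:

draws = pwbart(x.test, post$treedraws, mu=mean(y), dodraws=TRUE)    #matrix: one row per posterior draw,
dim(draws)                                                          #        one column per row of x.test
fhat  = pwbart(x.test, post$treedraws, mu=mean(y), dodraws=FALSE)   #posterior mean of f(x) for each row of x.test
all.equal(fhat, apply(draws, 2, mean), check.attributes=FALSE)      #expected TRUE under the assumption above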
BART is a Bayesian MCMC method. At each MCMC iteration, we produce a draw from the joint posterior (f, sigma) | (x, y) in the numeric y case and just f in the binary y case.
Thus, unlike a lot of other modelling methods in R, we do not produce a single model object from which fits and summaries may be extracted. The output consists of values f*(x) (and sigma* in the numeric case) where * denotes a particular draw. The x is either a row from the training data (x.train) or the test data (x.test).
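The matrix of draws can be summarized pointwise; a minimal sketch, assuming pred is the matrix returned by pwbart in the Examples below:

f.mean = apply(pred, 2, mean)                              #posterior mean of f(x) at each row of x.test
f.int  = apply(pred, 2, quantile, probs=c(0.025, 0.975))   #pointwise 95% posterior intervals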
Returns a matrix of predictions corresponding to x.test.
Robert McCulloch: robert.e.mcculloch@gmail.com,
Rodney Sparapani: rsparapa@mcw.edu.
Chipman, H., George, E., and McCulloch R. (2010) Bayesian Additive Regression Trees. The Annals of Applied Statistics, 4(1), 266-298 <doi:10.1214/09-AOAS285>.
Chipman, H., George, E., and McCulloch R. (2006) Bayesian Ensemble Learning. Advances in Neural Information Processing Systems 19, Scholkopf, Platt and Hoffman, Eds., MIT Press, Cambridge, MA, 265-272.
Friedman, J.H. (1991) Multivariate adaptive regression splines. The Annals of Statistics, 19, 1–67.
##simulate data (example from Friedman MARS paper)
f = function(x) {
    10*sin(pi*x[,1]*x[,2]) + 20*(x[,3]-.5)^2 + 10*x[,4] + 5*x[,5]
}
sigma = 1.0  #y = f(x) + sigma*z, z~N(0,1)
n = 100      #number of observations
set.seed(99)
x = matrix(runif(n*10), n, 10)  #10 variables, only first 5 matter
y = f(x) + sigma*rnorm(n)

##test BART with token run to ensure installation works
set.seed(99)
post = wbart(x, y, nskip=5, ndpost=5)
x.test = matrix(runif(500*10), 500, 10)

## Not run: 
##run BART
set.seed(99)
post = wbart(x, y)
x.test = matrix(runif(500*10), 500, 10)

pred = pwbart(x.test, post$treedraws, mu=mean(y))
plot(apply(pred, 2, mean), f(x.test))

## End(Not run)
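A further sanity check (a sketch under the assumption that wbart centers y by mean(y) and stores its tree draws unchanged): predicting at the training matrix x should reproduce post$yhat.train.

## Not run: 
yhat = pwbart(x, post$treedraws, mu=mean(y))               #predictions at the training covariates
all.equal(yhat, post$yhat.train, check.attributes=FALSE)   #expected TRUE under the assumption above
## End(Not run)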