abart {BART}    R Documentation
BART is a Bayesian “sum-of-trees” model.
For a numeric response y, we have
y = f(x) + e,
where e ~ N(0,sigma^2).
f is the sum of many tree models. The goal is to have very flexible inference for the unknown function f.
In the spirit of “ensemble models”, each tree is constrained by a prior to be a weak learner so that it contributes a small amount to the overall fit.
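To make the model concrete, here is a minimal simulation sketch; the particular test function f below is hypothetical and only illustrates the kind of nonlinear signal that the sum of weak tree learners is meant to recover (the Examples section fits abart to data of this sort).

## Simulate data from y = f(x) + e with e ~ N(0, sigma^2); f is a made-up test function.
set.seed(1)
n <- 500
x <- matrix(runif(n * 2, -2, 2), n, 2)           # two covariates
f <- function(x) x[ , 1]^3 + sin(pi * x[ , 2])   # hypothetical true f
sigma <- 1
y <- f(x) + rnorm(n, 0, sigma)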
abart( x.train, times, delta, x.test=matrix(0,0,0),
       K=100, type='abart', ntype=1,
       sparse=FALSE, theta=0, omega=1,
       a=0.5, b=1, augment=FALSE, rho=NULL,
       xinfo=matrix(0,0,0), usequants=FALSE,
       rm.const=TRUE,
       sigest=NA, sigdf=3, sigquant=0.90,
       k=2, power=2, base=0.95,
       lambda=NA, tau.num=c(NA, 3, 6)[ntype],
       offset=NULL, w=rep(1, length(times)),
       ntree=c(200L, 50L, 50L)[ntype], numcut=100L,
       ndpost=1000L, nskip=100L,
       keepevery=c(1L, 10L, 10L)[ntype],
       printevery=100L, transposed=FALSE,
       mc.cores = 1L, ## mc.abart only
       nice = 19L,    ## mc.abart only
       seed = 99L     ## mc.abart only
)

mc.abart( x.train, times, delta, x.test=matrix(0,0,0),
          K=100, type='abart', ntype=1,
          sparse=FALSE, theta=0, omega=1,
          a=0.5, b=1, augment=FALSE, rho=NULL,
          xinfo=matrix(0,0,0), usequants=FALSE,
          rm.const=TRUE,
          sigest=NA, sigdf=3, sigquant=0.90,
          k=2, power=2, base=0.95,
          lambda=NA, tau.num=c(NA, 3, 6)[ntype],
          offset=NULL, w=rep(1, length(times)),
          ntree=c(200L, 50L, 50L)[ntype], numcut=100L,
          ndpost=1000L, nskip=100L,
          keepevery=c(1L, 10L, 10L)[ntype],
          printevery=100L, transposed=FALSE,
          mc.cores = 2L, nice = 19L, seed = 99L
)
x.train: Explanatory variables for training (in sample) data.
times: The time of event or right-censoring.
delta: The event indicator: 1 is an event while 0 is censored.
x.test: Explanatory variables for test (out of sample) data. Should have same structure as x.train.
K: If provided, then coarsen times per the quantiles 1/K, 2/K, ..., K/K.
type: You can use this argument to specify the type of fit.
ntype: The integer equivalent of type.
sparse: Whether to perform variable selection based on a sparse Dirichlet prior rather than simply uniform; see Linero (2018).
theta: Set the theta parameter; zero means random.
omega: Set the omega parameter; zero means random.
a: Sparse parameter for the Beta(a, b) prior: 0.5 <= a <= 1, where lower values induce more sparsity.
b: Sparse parameter for the Beta(a, b) prior; typically, b=1.
rho: Sparse parameter: typically rho=p, where p is the number of covariates under consideration.
augment: Whether data augmentation is to be performed in sparse variable selection.
xinfo: You can provide the cutpoints to BART or let BART choose them for you. To provide them, use the xinfo argument to specify a list (matrix) where the items (rows) are the covariates and the contents of the items (columns) are the cutpoints (a construction sketch follows this argument list).
usequants: If usequants=FALSE, then the cutpoints in xinfo are generated uniformly; otherwise, if TRUE, uniform quantiles are used for the cutpoints.
rm.const: Whether or not to remove constant variables.
sigest: The prior for the error variance (sigma^2) is inverted chi-squared (the standard conditionally conjugate prior). The prior is specified by choosing the degrees of freedom, a rough estimate of the corresponding standard deviation, and a quantile to put this rough estimate at. If sigest=NA, then the rough estimate will be the usual least squares estimator; otherwise, the supplied value will be used. Not used if y is binary. (A prior-calibration sketch follows this argument list.)
sigdf: Degrees of freedom for error variance prior. Not used if y is binary.
sigquant: The quantile of the prior that the rough estimate (see sigest) is placed at. The closer the quantile is to 1, the more aggressive the fitting will be. Not used if y is binary.
k: For numeric y, k is the number of prior standard deviations that E(Y|x) = f(x) is away from +/- 0.5. The bigger k is, the more conservative the fitting will be.
power: Power parameter for tree prior.
base: Base parameter for tree prior.
lambda: The scale of the prior for the variance. Not used if y is binary.
tau.num: The numerator in the tau definition, i.e., tau=tau.num/(k*sqrt(ntree)).
offset: Continuous BART operates on y-offset, where the centering defaults to offset=mean(y).
w: Vector of weights which multiply the standard deviation. Not used if y is binary.
ntree: The number of trees in the sum.
numcut: The number of possible values of c (see usequants).
ndpost: The number of posterior draws returned.
nskip: Number of MCMC iterations to be treated as burn-in.
printevery: As the MCMC runs, a message is printed every printevery draws.
keepevery: Every keepevery draw is kept to be returned to the user.
transposed: When running abart or mc.abart in parallel, it is more memory-efficient to transpose x.train and x.test, if any, prior to calling the internal BART functions.
seed: Setting the seed is required for reproducible MCMC (mc.abart only).
mc.cores: Number of cores to employ in parallel (mc.abart only).
nice: Set the job niceness (mc.abart only). The default niceness is 19: niceness goes from 0 (highest priority) to 19 (lowest).
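As noted in the xinfo entry above, you can supply your own cutpoints. A minimal sketch of constructing such a matrix follows; the equally spaced grid and the toy covariate matrix are assumptions made purely for illustration (by default BART chooses the cutpoints for you).

## Hypothetical illustration only: build a cutpoint matrix whose rows are
## covariates and whose columns are cutpoints, then pass it via xinfo.
set.seed(1)
x <- matrix(runif(100 * 3, -2, 2), 100, 3)        # toy covariate matrix
ncut <- 100
cuts <- t(apply(x, 2, function(z)
    seq(min(z), max(z), length.out = ncut)))      # 3 rows (covariates) by 100 columns (cutpoints)
## pass as, e.g., abart(x, times, delta, xinfo = cuts, numcut = ncut)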
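Similarly, the roles of sigest, sigdf, sigquant, and lambda can be made concrete. The sketch below mirrors the standard BART calibration of the inverted chi-squared prior (Chipman, George and McCulloch 2010), in which lambda is chosen so that the sigquant quantile of the prior on sigma sits at the rough estimate sigest; it is an illustration, not a call into the package's internal code.

## sigma^2 ~ sigdf*lambda/chisq(sigdf); choose lambda so that P(sigma < sigest) = sigquant.
sigest   <- 1.0     # rough estimate of sigma (e.g., from least squares)
sigdf    <- 3
sigquant <- 0.90
lambda   <- sigest^2 * qchisq(1 - sigquant, sigdf) / sigdf
## check by simulation: the prior puts sigquant of its mass below sigest
sigma.prior <- sqrt(sigdf * lambda / rchisq(1e5, sigdf))
mean(sigma.prior < sigest)   # approximately 0.90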
BART is a Bayesian MCMC method. At each MCMC iteration, we produce a draw from the joint posterior (f, sigma) | (x, y) in the numeric y case and just f in the binary y case.
Thus, unlike a lot of other modelling methods in R, we do not produce a single model object from which fits and summaries may be extracted. The output consists of values f*(x) (and sigma* in the numeric case) where * denotes a particular draw. The x is either a row from the training data, x.train, or the test data, x.test.
abart returns an object of type abart which is essentially a list. In the numeric y case, the list has components:
yhat.train: A matrix with ndpost rows and nrow(x.train) columns. Each row corresponds to a draw f* from the posterior of f, and each column corresponds to a row of x.train. The (i,j) value is f*(x) for the i-th kept draw of f and the j-th row of x.train.
yhat.test: Same as yhat.train, but now the x's are the rows of the test data.
yhat.train.mean: train data fits = mean of yhat.train columns.
yhat.test.mean: test data fits = mean of yhat.test columns.
sigma: post burn-in draws of sigma; length = ndpost.
first.sigma: burn-in draws of sigma.
varcount: A matrix with ndpost rows and ncol(x.train) columns. Each row is for a draw. For each variable (corresponding to the columns), the total count of the number of times that variable is used in a tree decision rule (over all trees) is given.
sigest: The rough error standard deviation (sigma) used in the prior.
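Given these components, typical post-processing looks like the sketch below; the object name post and the 95% level are illustrative assumptions, not part of the returned object.

## Assumes 'post' is the list returned by abart(); the name is illustrative.
## Point estimates and 95% posterior intervals for the training fits:
fit.mean <- post$yhat.train.mean                    # same as colMeans(post$yhat.train)
fit.ci   <- apply(post$yhat.train, 2, quantile, probs = c(0.025, 0.975))
## Crude MCMC check: trace of the post burn-in sigma draws
plot(post$sigma, type = "l", ylab = "sigma")
## Variable importance: average share of splitting rules using each covariate
prop <- colMeans(post$varcount / rowSums(post$varcount))
sort(prop, decreasing = TRUE)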
Robert McCulloch: robert.e.mcculloch@gmail.com,
Rodney Sparapani: rsparapa@mcw.edu.
Chipman, H., George, E., and McCulloch R. (2010) Bayesian Additive Regression Trees. The Annals of Applied Statistics, 4(1), 266-298. <doi:10.1214/09-AOAS285>.
Chipman, H., George, E., and McCulloch R. (2006) Bayesian Ensemble Learning. Advances in Neural Information Processing Systems 19, Scholkopf, Platt and Hoffman, Eds., MIT Press, Cambridge, MA, 265-272.
Friedman, J.H. (1991) Multivariate adaptive regression splines. The Annals of Statistics, 19, 1–67.
Linero, A.R. (2018) Bayesian regression trees for high dimensional prediction and variable selection. JASA, 113, 626–36.
N = 1000
P = 5       # number of covariates
M = 8
set.seed(12)
x.train = matrix(runif(N*P, -2, 2), N, P)
mu = x.train[ , 1]^3
y = rnorm(N, mu)
offset = mean(y)
T = exp(y)
C = rexp(N, 0.05)
delta = (T < C)*1
table(delta)/N
times = (T*delta + C*(1-delta))

## test BART with token run to ensure installation works
set.seed(99)
post1 = abart(x.train, times, delta, nskip=5, ndpost=10)

## Not run:
post1 = mc.abart(x.train, times, delta, mc.cores=M, seed=99)
post2 = mc.abart(x.train, times, delta, offset=offset, mc.cores=M, seed=99)

Z = 8

plot(mu, post1$yhat.train.mean, asp=1, xlim=c(-Z, Z), ylim=c(-Z, Z))
abline(a=0, b=1)

plot(mu, post2$yhat.train.mean, asp=1, xlim=c(-Z, Z), ylim=c(-Z, Z))
abline(a=0, b=1)

plot(post1$yhat.train.mean, post2$yhat.train.mean, asp=1,
     xlim=c(-Z, Z), ylim=c(-Z, Z))
abline(a=0, b=1)

## End(Not run)
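As a continuation of the example above, posterior uncertainty for the training fits can be added as follows; the 95% level and the choice of 25 training rows are illustrative assumptions, not part of the original example.

## Not run (continuation; assumes post1, mu, and N from the example above):
ci = apply(post1$yhat.train, 2, quantile, probs=c(0.025, 0.975))
idx = order(mu)[round(seq(1, N, length.out=25))]   # a spread of 25 training rows
plot(mu[idx], post1$yhat.train.mean[idx], ylim=range(ci[ , idx]),
     xlab='true mu', ylab='posterior mean and 95% interval')
segments(mu[idx], ci[1, idx], mu[idx], ci[2, idx])
abline(a=0, b=1)
## End(Not run)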