mc.surv.pwbart {BART} | R Documentation |
BART is a Bayesian “sum-of-trees” model.
For a numeric response y, we have
y = f(x) + e,
where e ~ N(0,sigma^2).
f is the sum of many tree models. The goal is to have very flexible inference for the unknown function f.
In the spirit of “ensemble models”, each tree is constrained by a prior to be a weak learner so that it contributes a small amount to the overall fit.
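The sum-of-trees idea can be illustrated with a toy sketch in base R. This is not how BART fits or stores its trees: stump() is a hypothetical helper, the split points and leaf values are made up, and no prior or MCMC is involved. It only shows f as a sum of small, weak contributions and y as f(x) plus normal noise.

```r
# Toy illustration only: stump() is a hypothetical helper, not part of BART.
# Each "tree" is a single split that returns one of two small leaf values.
stump <- function(var, cut, left, right) {
  function(x) ifelse(x[, var] < cut, left, right)
}

# Three weak learners with made-up split points and leaf values
trees <- list(
  stump(1, 0.5, -0.20, 0.20),
  stump(2, 0.3, -0.10, 0.10),
  stump(1, 0.8, -0.05, 0.15)
)

# f(x) is the sum of the individual tree outputs
f <- function(x) Reduce(`+`, lapply(trees, function(tree) tree(x)))

set.seed(1)
x <- matrix(runif(20), ncol = 2)        # 10 observations, 2 covariates
y <- f(x) + rnorm(nrow(x), sd = 0.1)    # y = f(x) + e, e ~ N(0, sigma^2)
```

Because every leaf value is small, no single tree dominates the fit, which is the role the prior plays in the actual model.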
surv.pwbart( x.test, treedraws, binaryOffset=0, mc.cores=1L, type='pbart', transposed=FALSE, nice=19L )

mc.surv.pwbart( x.test, treedraws, binaryOffset=0, mc.cores=2L, type='pbart', transposed=FALSE, nice=19L )

mc.recur.pwbart( x.test, treedraws, binaryOffset=0, mc.cores=2L, type='pbart', transposed=FALSE, nice=19L )
x.test |
Matrix of covariates to predict y for. |
binaryOffset |
Mean to add on to y prediction. |
treedraws |
|
mc.cores |
Number of threads to utilize. |
type |
Whether to employ Albert-Chib, |
transposed |
When running |
nice |
Set the job niceness. The default niceness is 19: niceness ranges from 0 (highest priority) to 19 (lowest priority). |
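To give a feel for why predicting with several threads is possible, the sketch below splits the rows of a test matrix into chunks, predicts each chunk separately, and binds the results back together column-wise. It is an assumption about the general pattern, not the package's internal code: chunked_predict() and toy_predict() are hypothetical stand-ins, and lapply() is used in place of real parallel workers so the example runs on any platform.

```r
# Hypothetical stand-ins, not functions from the BART package.
# chunked_predict() splits the rows of x.test into contiguous chunks,
# applies predict_fun to each chunk, and column-binds the results.
chunked_predict <- function(x.test, predict_fun, chunks = 2L) {
  rows <- seq_len(nrow(x.test))
  idx <- split(rows, cut(rows, chunks, labels = FALSE))
  parts <- lapply(idx, function(r) predict_fun(x.test[r, , drop = FALSE]))
  do.call(cbind, unname(parts))   # columns stay in the original row order
}

# Toy predictor: returns 2 "draws" (rows) for every row of x
toy_predict <- function(x) rbind(rowSums(x), 2 * rowSums(x))

x.small <- matrix(1:12, ncol = 3)
whole <- toy_predict(x.small)
pieces <- chunked_predict(x.small, toy_predict, chunks = 2L)
all.equal(whole, pieces)
```

Since each test row is predicted independently of the others, splitting by rows changes memory use and run time but not the result.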
BART is a Bayesian MCMC method. At each MCMC iteration, we produce a draw from the joint posterior (f,sigma) | (x,y) in the numeric y case and just f in the binary y case.
Thus, unlike a lot of other modelling methods in R, we do not produce a single model object from which fits and summaries may be extracted. The output consists of values f*(x) (and sigma* in the numeric case) where * denotes a particular draw. The x is either a row from the training data (x.train) or the test data (x.test).
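Because the output is a collection of draws rather than a single fit, summaries are computed across draws. The sketch below uses a simulated matrix as a stand-in for a matrix of draws f*(x), with one row per draw and one column per covariate row; the numbers are made up and do not come from BART.

```r
set.seed(42)
ndpost <- 1000   # number of kept draws (assumed for illustration)
np <- 3          # number of covariate rows being predicted

# Simulated stand-in for a draws matrix: column j holds draws of f*(x_j)
draws <- matrix(rnorm(ndpost * np, mean = rep(c(-1, 0, 1), each = ndpost)),
                nrow = ndpost, ncol = np)

# Posterior mean of f(x_j): average over draws, one value per column
post.mean <- colMeans(draws)

# 95% posterior interval for each f(x_j): quantiles over draws
post.ci <- apply(draws, 2, quantile, probs = c(0.025, 0.975))
```

The same column-wise pattern applies to any function of the draws: transform each draw first, then summarize.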
Returns an object of type survbart
which is essentially a list with components:
yhat.test |
A matrix with ndpost rows and nrow(x.test) columns.
Each row corresponds to a draw f* from the posterior of f
and each column corresponds to a row of x.test.
The (i,j) value is f*(x) for the i-th kept draw of f
and the j-th row of x.test. |
surv.test |
test data fits for survival probability: not
available for |
surv.test.mean |
mean of |
haz.test |
test data fits for hazard: available for
|
haz.test.mean |
mean of |
cum.test |
test data fits for cumulative hazard: available for
|
cum.test.mean |
mean of |
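The relationship between the probit-scale draws and the survival fits can be sketched in base R, assuming the discrete-time construction described in Sparapani et al. (2016): each column gives the probability of an event at one time point, and survival is the running product of one minus those probabilities. The values of K, H and ndpost, the simulated draws, and the column layout (scenario by scenario, K time points each) are assumptions made for illustration.

```r
K <- 4        # distinct time points per scenario (assumed)
H <- 2        # covariate scenarios (assumed)
ndpost <- 3   # posterior draws (assumed)

set.seed(7)
# Simulated stand-in for probit-scale draws; NOT output from BART.
# Columns 1..K belong to scenario 1, columns K+1..2K to scenario 2.
yhat <- matrix(rnorm(ndpost * H * K, mean = -1), nrow = ndpost)

p <- pnorm(yhat)   # event probability at each time point (type='pbart')
surv <- 1 - p
for (h in seq_len(H)) {
  cols <- (h - 1) * K + seq_len(K)
  # Within a scenario, S(t_j) = prod_{l <= j} (1 - p_l), draw by draw
  surv[, cols] <- t(apply(1 - p[, cols, drop = FALSE], 1, cumprod))
}

surv.mean <- colMeans(surv)   # posterior mean survival, one value per column
```

Each block of K entries in surv.mean is a non-increasing curve, since every draw within the block is itself a running product of values between 0 and 1.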
Robert McCulloch: robert.e.mcculloch@gmail.com,
Rodney Sparapani: rsparapa@mcw.edu.
Sparapani, R., Logan, B., McCulloch, R., and Laud, P. (2016) Nonparametric survival analysis using Bayesian Additive Regression Trees (BART). Statistics in Medicine, 35(16):2741-2753 <doi:10.1002/sim.6893>.
## load the advanced lung cancer example
data(lung)

group <- -which(is.na(lung[ , 7])) ## remove missing row for ph.karno
times <- lung[group, 2]   ##lung$time
delta <- lung[group, 3]-1 ##lung$status: 1=censored, 2=dead
                          ##delta: 0=censored, 1=dead

## this study reports time in days rather than months like other studies
## coarsening from days to months will reduce the computational burden
times <- ceiling(times/30)

summary(times)
table(delta)

x.train <- as.matrix(lung[group, c(4, 5, 7)]) ## matrix of observed covariates

## lung$age:        Age in years
## lung$sex:        Male=1 Female=2
## lung$ph.karno:   Karnofsky performance score (dead=0:normal=100:by=10)
##                  rated by physician

dimnames(x.train)[[2]] <- c('age(yr)', 'M(1):F(2)', 'ph.karno(0:100:10)')

summary(x.train[ , 1])
table(x.train[ , 2])
table(x.train[ , 3])

x.test <- matrix(nrow=84, ncol=3) ## matrix of covariate scenarios

dimnames(x.test)[[2]] <- dimnames(x.train)[[2]]

i <- 1

for(age in 5*(9:15)) for(sex in 1:2) for(ph.karno in 10*(5:10)) {
    x.test[i, ] <- c(age, sex, ph.karno)
    i <- i+1
}

## this x.test is relatively small, but often you will want to
## predict for a large x.test matrix which may cause problems
## due to consumption of RAM so we can predict separately

## mcparallel/mccollect do not exist on windows
if(.Platform$OS.type=='unix') {
    ##test BART with token run to ensure installation works
    set.seed(99)
    post <- surv.bart(x.train=x.train, times=times, delta=delta,
                      nskip=5, ndpost=5, keepevery=1)

    pre <- surv.pre.bart(x.train=x.train, times=times, delta=delta,
                         x.test=x.test)

    pred <- mc.surv.pwbart(pre$tx.test, post$treedraws, post$binaryOffset)
}

## Not run: 
## run one long MCMC chain in one process
set.seed(99)
post <- surv.bart(x.train=x.train, times=times, delta=delta)

## run "mc.cores" number of shorter MCMC chains in parallel processes
## post <- mc.surv.bart(x.train=x.train, times=times, delta=delta,
##                      mc.cores=8, seed=99)

pre <- surv.pre.bart(x.train=x.train, times=times, delta=delta,
                     x.test=x.test)

pred <- surv.pwbart(pre$tx.test, post$treedraws, post$binaryOffset)

## let's look at some survival curves
## first, a younger group with a healthier KPS
## age 50 with KPS=90: males and females
## males: row 17, females: row 23
x.test[c(17, 23), ]

low.risk.males <- 16*post$K+1:post$K ## K=unique times including censoring
low.risk.females <- 22*post$K+1:post$K

plot(post$times, pred$surv.test.mean[low.risk.males], type='s', col='blue',
     main='Age 50 with KPS=90', xlab='t', ylab='S(t)', ylim=c(0, 1))
points(post$times, pred$surv.test.mean[low.risk.females], type='s', col='red')

## End(Not run)
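The hard-coded row numbers and offsets in the example (rows 17 and 23, offsets 16*K and 22*K) follow from the order in which the scenario grid is filled. The sketch below rebuilds the same grid without BART and looks the rows up by value. scenario_index() is a hypothetical helper, and K is set to an arbitrary value here because the real post$K depends on the fitted data.

```r
# Rebuild the 84-row scenario grid in the same order as the nested loops:
# age varies slowest, ph.karno varies fastest.
grid <- expand.grid(ph.karno = 10 * (5:10), sex = 1:2, age = 5 * (9:15))
scenarios <- as.matrix(grid[, c("age", "sex", "ph.karno")])

# Look up the scenario rows by value instead of hard-coding 17 and 23
h.male <- unname(which(scenarios[, "age"] == 50 & scenarios[, "sex"] == 1 &
                       scenarios[, "ph.karno"] == 90))
h.female <- unname(which(scenarios[, "age"] == 50 & scenarios[, "sex"] == 2 &
                         scenarios[, "ph.karno"] == 90))

# scenario_index() is a hypothetical helper, not part of BART: it returns
# the positions of scenario h when each scenario occupies K consecutive slots.
scenario_index <- function(h, K) (h - 1) * K + seq_len(K)

K <- 5   # arbitrary stand-in for post$K
low.risk.males <- scenario_index(h.male, K)
low.risk.females <- scenario_index(h.female, K)
```

With this helper, scenario_index(17, K) matches 16*K+1:K from the example, so changing the scenario of interest only requires changing the lookup condition.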