bkpc {BKPC}                                                R Documentation
Description:

The function bkpc is used to train a Bayesian kernel projection classifier (BKPC). This is a nonlinear multicategory classifier that performs classification of the projections of the data onto the principal axes of the feature space. A Gibbs sampler is implemented to find the posterior distributions of the parameters, so probability distributions of the predictions can be obtained for new observations.
Usage:

## Default S3 method:
bkpc(x, y, theta = NULL, n.kpc = NULL, thin = 100, n.iter = 1e+05,
     std = 10, g1 = 0.001, g2 = 0.001, g3 = 1, g4 = 1,
     initSigmasq = NULL, initBeta = NULL, initTau = NULL,
     intercept = TRUE, rotate = TRUE, ...)

## S3 method for class 'kern':
bkpc(x, y, n.kpc = NULL, thin = 100, n.iter = 1e+05,
     std = 10, g1 = 0.001, g2 = 0.001, g3 = 1, g4 = 1,
     initSigmasq = NULL, initBeta = NULL, initTau = NULL,
     intercept = TRUE, rotate = TRUE, ...)

## S3 method for class 'kernelMatrix':
bkpc(x, y, n.kpc = NULL, thin = 100, n.iter = 1e+05,
     std = 10, g1 = 0.001, g2 = 0.001, g3 = 1, g4 = 1,
     initSigmasq = NULL, initBeta = NULL, initTau = NULL,
     intercept = TRUE, rotate = TRUE, ...)
Arguments:

x: either a data matrix, or a kernel matrix of class "kern" or "kernelMatrix".

y: a response vector with one label for each row of x.

theta: the inverse kernel bandwidth parameter.

n.kpc: number of kernel principal components to use.

n.iter: number of iterations for the MCMC algorithm.

thin: thinning interval.

std: standard deviation parameter for the random walk proposal.

g1: γ_1 hyperparameter of the prior inverse gamma distribution for the σ^2 parameter in the BKPC model.

g2: γ_2 hyperparameter of the prior inverse gamma distribution for the σ^2 parameter in the BKPC model.

g3: γ_3 hyperparameter of the prior gamma distribution for the τ parameters in the BKPC model.

g4: γ_4 hyperparameter of the prior gamma distribution for the τ parameters in the BKPC model.

initSigmasq: optional initial value for the σ^2 parameter in the BKPC model.

initBeta: optional initial values for the β parameters in the BKPC model.

initTau: optional initial values for the τ parameters in the BKPC model.

intercept: if TRUE, an intercept term is included in the model.

rotate: if TRUE, the sparse BKPC model is fitted; otherwise the BKMC model is fitted.

...: currently not used.
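As a rough illustration of the defaults, g1 = g2 = 0.001 gives a vague inverse gamma prior on σ^2 and g3 = g4 = 1 a Gamma(1, 1) prior on each τ. The sketch below visualizes these priors; the shape/rate parameterization is an assumption made for exposition, not taken from the package source:

# Hypothetical sketch of the default priors (parameterization assumed):
# sigma^2 ~ Inverse-Gamma(g1 = 0.001, g2 = 0.001)
# tau     ~ Gamma(g3 = 1, g4 = 1)
sigmasq.draws <- 1 / rgamma(1000, shape = 0.001, rate = 0.001)
curve(dgamma(x, shape = 1, rate = 1), from = 0, to = 5,
      ylab = "density", main = "Prior on tau")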
Details:

Initial values for the BKPC model parameters can be supplied; otherwise they are generated with the runif function.

The data can be passed to the bkpc function as a matrix, in which case the Gaussian kernel computed by the gaussKern function is used for training and prediction. The bandwidth parameter theta can be supplied to the gaussKern function; otherwise a default value is used.
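For example, the two calling conventions might look as follows. This is a minimal sketch: x is assumed to be a numeric data matrix and y a factor, the value of theta and the MCMC settings are purely illustrative, and it is assumed gaussKern accepts theta for a single data matrix as it does in the two-matrix call shown in the Examples:

# train from a data matrix, letting bkpc build the Gaussian kernel:
fit1 <- bkpc(x, y = y, theta = 0.01, n.kpc = 2, n.iter = 1000, thin = 10)

# or precompute the kernel with gaussKern and pass it directly:
K <- gaussKern(x, theta = 0.01)$K
fit2 <- bkpc(K, y = y, n.kpc = 2, n.iter = 1000, thin = 10)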
In addition, bkpc also supports input in the form of a kernel matrix of class "kern" or "kernelMatrix". The latter allows for a range of kernel functions as well as user-specified ones.
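For instance, a kernel from the kernlab package can be supplied like this (a sketch reusing the hypothetical x and y from above; the Laplacian kernel and its sigma value are illustrative):

library(kernlab)
kfunc <- laplacedot(sigma = 1)
K <- kernelMatrix(kfunc, x)    # object of class "kernelMatrix"
fit <- bkpc(K, y = y, n.kpc = 3, n.iter = 1000, thin = 10)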
If rotate = TRUE (the default), the BKPC model is trained. This algorithm performs classification of the projections of the data onto the principal axes of the feature space. Otherwise, the Bayesian kernel multicategory classifier (BKMC) is trained, where the data are mapped to the feature space via the kernel matrix but not projected (rotated) onto the principal axes. The hierarchical prior structure is the same for the two models, but the BKMC model is not sparse.
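Side by side, the two models are requested as follows (a sketch reusing the kernel matrix K from above; settings are illustrative):

# sparse BKPC: classify projections onto the kernel principal axes
fitBKPC <- bkpc(K, y = y, n.kpc = 3, rotate = TRUE, n.iter = 1000, thin = 10)

# BKMC: stay in the unrotated feature space (no n.kpc needed; not sparse)
fitBKMC <- bkpc(K, y = y, rotate = FALSE, n.iter = 1000, thin = 10)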
Value:

An object of class "bkpc" including:
beta: realizations of the β parameters from the joint posterior distribution in the BKPC model.

tau: realizations of the τ parameters from the joint posterior distribution in the BKPC model.

z: realizations of the latent variables z from the joint posterior distribution in the BKPC model.

sigmasq: realizations of the σ^2 parameter from the joint posterior distribution in the BKPC model.

n.class: number of independent classes of the response variable, i.e. the number of classes - 1.

n.kpc: number of kernel principal components used.

n.iter: number of iterations of the MCMC algorithm.

thin: thinning interval.

intercept: if true, an intercept was included in the model.

rotate: if true, the sparse BKPC model was fitted; else the BKMC model.

kPCA: if rotate = TRUE, the kernel principal component analysis object used to project the data; else NULL.

x: the supplied data matrix or kernel matrix.

theta: if data was supplied, as opposed to a kernel matrix, this is the inverse kernel bandwidth parameter used in obtaining the Gaussian kernel; else NULL.
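The stored draws can be summarized directly. A sketch, assuming result is a fitted "bkpc" object, the first 10 draws are treated as burn-in, sigmasq is stored as a vector, and the beta draws are stored one iteration per row (the storage layout is an assumption):

burnin <- 1:10
mean(result$sigmasq[-burnin])     # posterior mean of sigma^2
colMeans(result$beta[-burnin, ])  # posterior means of the beta parameters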
If supplied, the data are not scaled internally. If rotate = TRUE, the mapping is centered internally by the kPCA function.
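If scaling is desired it must therefore be done beforehand, applying the training-set centering and scaling to the test set as well (a sketch using base R's scale):

xtr <- scale(train)
xte <- scale(test, center = attr(xtr, "scaled:center"),
             scale = attr(xtr, "scaled:scale"))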
Author(s):

K. Domijan
References:

Domijan, K. and Wilson, S. P. (2011). Bayesian kernel projections for classification of high dimensional data. Statistics and Computing, 21(2), 203-216.
See Also:

kPCA, gaussKern, predict.bkpc, plot.bkpc, summary.bkpc, kernelMatrix (in package kernlab)
Examples:

set.seed(-88106935)
data(microarray)

# consider only four tumour classes (NOTE: "NORM" is not a class of tumour)
y <- microarray[, 2309]
train <- as.matrix(microarray[y != "NORM", -2309])
wtr <- factor(microarray[y != "NORM", 2309],
              levels = c("BL", "EWS", "NB", "RMS"))

n.kpc <- 6
n.class <- length(levels(wtr)) - 1

K <- gaussKern(train)$K

# supply starting values for the parameters
# use Gaussian kernel as input
result <- bkpc(K, y = wtr, n.iter = 1000, thin = 10, n.kpc = n.kpc,
               initSigmasq = 0.001,
               initBeta = matrix(10, n.kpc * n.class, 1),
               initTau = matrix(10, n.kpc * n.class, 1),
               intercept = FALSE, rotate = TRUE)

# predict
out <- predict(result, n.burnin = 10)
table(out$class, as.numeric(wtr))

# plot the data projection on the kernel principal components
pairs(result$kPCA$KPCs[, 1:n.kpc], col = as.numeric(wtr),
      main = paste("symbol = predicted class", "\n", "color = true class"),
      pch = out$class, upper.panel = NULL)
par(xpd = TRUE)
legend('topright', levels(wtr), pch = unique(out$class),
       text.col = as.numeric(unique(wtr)), bty = "n")

# Another example: iris data
data(iris)
testset <- sample(1:150, 50)
train <- as.matrix(iris[-testset, -5])
test <- as.matrix(iris[testset, -5])
wtr <- iris[-testset, 5]
wte <- iris[testset, 5]

# use default starting values for the parameters in the model
result <- bkpc(train, y = wtr, n.iter = 1000, thin = 10, n.kpc = 2,
               intercept = FALSE, rotate = TRUE)

# predict
out <- predict(result, test, n.burnin = 10)

# classification rate
sum(out$class == as.numeric(wte)) / dim(test)[1]
table(out$class, as.numeric(wte))

## Not run:
# Another example: synthetic data from the MASS library
library(MASS)
train <- as.matrix(synth.tr[, -3])
test <- as.matrix(synth.te[, -3])
wtr <- as.factor(synth.tr[, 3])
wte <- as.factor(synth.te[, 3])

# make training set kernel using kernelMatrix from the kernlab library
library(kernlab)
kfunc <- laplacedot(sigma = 1)
Ktrain <- kernelMatrix(kfunc, train)

# make testing set kernel using kernelMatrix {kernlab}
Ktest <- kernelMatrix(kfunc, test, train)

result <- bkpc(Ktrain, y = wtr, n.iter = 1000, thin = 10, n.kpc = 3,
               intercept = FALSE, rotate = TRUE)

# predict
out <- predict(result, Ktest, n.burnin = 10)

# classification rate
sum(out$class == as.numeric(wte)) / dim(test)[1]
table(out$class, as.numeric(wte))

# embed data from the testing set in the new space:
KPCtest <- predict(result$kPCA, Ktest)

# new data is linearly separable in the new feature space
# where classification takes place
library(rgl)
plot3d(KPCtest[, 1:3], col = as.numeric(wte))

# another model: do not project the data to the principal axes
# of the feature space. NOTE: slow
# use Gaussian kernel with the default bandwidth parameter
Ktrain <- gaussKern(train)$K
Ktest <- gaussKern(train, test, theta = gaussKern(train)$theta)$K
resultBKMC <- bkpc(Ktrain, y = wtr, n.iter = 1000, thin = 10,
                   intercept = FALSE, rotate = FALSE)

# predict
outBKMC <- predict(resultBKMC, Ktest, n.burnin = 10)

# to compare with the previous model
table(outBKMC$class, as.numeric(wte))

# another example: wine data from the gclus library
library(gclus)
data(wine)
testset <- sample(1:178, 90)
train <- as.matrix(wine[-testset, -1])
test <- as.matrix(wine[testset, -1])
wtr <- as.factor(wine[-testset, 1])
wte <- as.factor(wine[testset, 1])

# make training set kernel using kernelMatrix from the kernlab library
kfunc <- anovadot(sigma = 1, degree = 1)
Ktrain <- kernelMatrix(kfunc, train)

# make testing set kernel using kernelMatrix {kernlab}
Ktest <- kernelMatrix(kfunc, test, train)

result <- bkpc(Ktrain, y = wtr, n.iter = 1000, thin = 10, n.kpc = 3,
               intercept = FALSE, rotate = TRUE)

out <- predict(result, Ktest, n.burnin = 10)

# classification rate in the test set
sum(out$class == as.numeric(wte)) / dim(test)[1]

# embed data from the testing set in the new space:
KPCtest <- predict(result$kPCA, Ktest)

# new data is linearly separable in the new feature space
# where classification takes place
pairs(KPCtest[, 1:3], col = as.numeric(wte),
      main = paste("symbol = predicted class", "\n", "color = true class"),
      pch = out$class, upper.panel = NULL)
par(xpd = TRUE)
legend('topright', levels(wte), pch = unique(out$class),
       text.col = as.numeric(unique(wte)), bty = "n")

## End(Not run)