flamCV {flam}		R Documentation
Description

Fit an additive model in which each component is estimated to be piecewise constant with a small number of adaptively-chosen knots. Tuning parameter selection is done using K-fold cross-validation. In particular, this function implements the "fused lasso additive model" proposed in Petersen, A., Witten, D., and Simon, N. (2014). Fused Lasso Additive Model. arXiv preprint arXiv:1409.5391.
Usage

flamCV(x, y, lambda.min.ratio = 0.01, n.lambda = 50, lambda.seq = NULL,
       alpha = 1, family = "gaussian", method = "BCD", fold = NULL,
       n.fold = NULL, seed = NULL, within1SE = T, tolerance = 10e-6)
Arguments

x
n x p covariate matrix. May have p > n.
y
n-vector containing the outcomes for the n observations in x.
lambda.min.ratio
smallest value for lambda, as a fraction of the maximum lambda value (the data-derived smallest value for which all estimated functions are zero). The default is 0.01.
n.lambda
the number of lambda values to consider; the default is 50.
lambda.seq
a user-supplied sequence of positive lambda values to consider. The typical usage is to calculate the sequence automatically from n.lambda and lambda.min.ratio; supplying lambda.seq overrides this.
alpha
the value of the tuning parameter alpha to consider; the default is 1. The value must be in [0,1], with values near 0 prioritizing sparsity of functions and values near 1 prioritizing limiting the number of knots. Empirical evidence suggests using alpha of 1 when p < n and alpha of 0.75 when p > n.
family
specifies the loss function to use. Currently supports squared error loss (the default; family = "gaussian") and logistic loss (family = "binomial").
method
specifies the optimization algorithm to use. The default is block coordinate descent (method = "BCD").
fold
user-supplied fold numbers for cross-validation. If supplied, fold should be an n-vector assigning each observation to one of the K folds, in which case n.fold is ignored; see the sketch following this argument list.
n.fold
the number of folds, K, to use for the K-fold cross-validation selection of tuning parameters. The default is 10; specification of fold overrides n.fold.
seed
an optional number used with set.seed() so that the random assignment of observations to folds is reproducible. This is useful when the same folds should be used across multiple calls to flamCV (e.g., when comparing several values of alpha; see Details).
within1SE
logical (TRUE or FALSE) indicating how lambda should be chosen from the cross-validation results: if TRUE (the default), the most regularized model whose cross-validation error is within one standard error of the minimum is chosen; if FALSE, the model with the minimum cross-validation error is chosen.
tolerance
specifies the convergence criterion for the objective (default is 10e-6).
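As a minimal sketch of how these arguments fit together, the call below supplies an explicit fold assignment instead of n.fold and seed. It assumes (this is not stated in the usage above) that fold is an n-vector of labels in 1, ..., K, and it reuses sim.data from the Examples section to generate toy data; the object names my.fold and flamCV.fold.out are arbitrary.

#toy data as in the Examples section below
set.seed(1)
data <- sim.data(n = 50, scenario = 1, zerof = 10, noise = 1)

#assumed format: an n-vector assigning each observation to one of K = 5 folds
my.fold <- sample(rep(1:5, length.out = 50))

#pass the fold assignment directly; 'n.fold' and 'seed' are then unnecessary
flamCV.fold.out <- flamCV(x = data$x, y = data$y, alpha = 0.75, fold = my.fold)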
Details

Note that flamCV does not cross-validate over alpha; only a single value should be provided. However, if the user would like to cross-validate over alpha, then flamCV should be called multiple times for different values of alpha with the same seed. This ensures that the cross-validation folds (fold) remain the same for the different values of alpha. See the example below for details.
Value

An object with S3 class "flamCV".
mean.cv.error
m-vector containing the cross-validation error for each tuning parameter value, where m is the length of the lambda sequence considered (lambda.seq).
se.cv.error
m-vector containing the cross-validation standard error for each tuning parameter value, where m is the length of the lambda sequence considered (lambda.seq).
lambda.cv
optimal lambda value chosen by cross-validation.
alpha
as specified by user (or default).
index.cv
index of the model corresponding to 'lambda.cv'.
flam.out
object of class 'flam' returned by flam.
fold
as specified by user (or default).
n.folds
as specified by user (or default).
within1SE
as specified by user (or default).
tolerance
as specified by user (or default).
call
matched call.
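The returned components can be inspected directly. The short sketch below is not part of the package's documented examples; it assumes flamCV.out is the object fit in the Examples section and uses only base R graphics to plot the cross-validation curve with one-standard-error bars. The package's own plot.flamCV method (see See Also) is the more direct way to visualize this.

#extract the chosen tuning parameter and its cross-validation error
flamCV.out$lambda.cv
flamCV.out$mean.cv.error[flamCV.out$index.cv]

#plot the CV error curve with +/- one standard error, marking the chosen index
m <- length(flamCV.out$mean.cv.error)
plot(1:m, flamCV.out$mean.cv.error, type = "b",
    xlab = "lambda index", ylab = "mean CV error")
segments(1:m, flamCV.out$mean.cv.error - flamCV.out$se.cv.error,
    1:m, flamCV.out$mean.cv.error + flamCV.out$se.cv.error)
abline(v = flamCV.out$index.cv, lty = 2)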
Author(s)

Ashley Petersen
References

Petersen, A., Witten, D., and Simon, N. (2014). Fused Lasso Additive Model. arXiv preprint arXiv:1409.5391.
See Also

flam, plot.flamCV, summary.flamCV
Examples

#See ?'flam-package' for a full example of how to use this package

#generate data
set.seed(1)
data <- sim.data(n = 50, scenario = 1, zerof = 10, noise = 1)

#fit model for a range of lambda chosen by default
#pick lambda using 2-fold cross-validation
#note: use larger 'n.fold' (e.g., 10) in practice
flamCV.out <- flamCV(x = data$x, y = data$y, alpha = 0.75, n.fold = 2)

## Not run:
#note that cross-validation is only done to choose lambda for specified alpha
#to cross-validate over alpha also, call 'flamCV' for several alpha and set seed
#note: use larger 'n.fold' (e.g., 10) in practice
flamCV.out1 <- flamCV(x = data$x, y = data$y, alpha = 0.65, seed = 100,
    within1SE = FALSE, n.fold = 2)
flamCV.out2 <- flamCV(x = data$x, y = data$y, alpha = 0.75, seed = 100,
    within1SE = FALSE, n.fold = 2)
flamCV.out3 <- flamCV(x = data$x, y = data$y, alpha = 0.85, seed = 100,
    within1SE = FALSE, n.fold = 2)

#this ensures that the folds used are the same
flamCV.out1$fold; flamCV.out2$fold; flamCV.out3$fold

#compare the CV error for the optimum lambda of each alpha to choose alpha
CVerrors <- c(flamCV.out1$mean.cv.error[flamCV.out1$index.cv],
    flamCV.out2$mean.cv.error[flamCV.out2$index.cv],
    flamCV.out3$mean.cv.error[flamCV.out3$index.cv])
best.alpha <- c(flamCV.out1$alpha, flamCV.out2$alpha,
    flamCV.out3$alpha)[which(CVerrors == min(CVerrors))]

#also can generate data for logistic FLAM model
data2 <- sim.data(n = 50, scenario = 1, zerof = 10, family = "binomial")

#fit the FLAM model with cross-validation using logistic loss
#note: use larger 'n.fold' (e.g., 10) in practice
flamCV.logistic.out <- flamCV(x = data2$x, y = data2$y, family = "binomial",
    n.fold = 2)

## End(Not run)

#'flamCV' returns an object of the class 'flamCV' that includes an object
#of class 'flam' (flam.out); see ?'flam-package' for an example using S3
#methods for the classes of 'flam' and 'flamCV'