pensem {pense}                                          R Documentation
Description:

    Compute the PENSEM estimate, an efficient and robust elastic net
    estimator for linear regression.
Usage:

    pensem(x, ...)

    ## Default S3 method:
    pensem(x, y, alpha = 0.5, nlambda = 50, lambda, lambda_s,
           lambda_min_ratio, standardize = TRUE,
           initial = c("warm", "cold"), warm_reset = 10, cv_k = 5,
           cv_objective, ncores = getOption("mc.cores", 1L), cl = NULL,
           s_options = pense_options(), mm_options = mstep_options(),
           init_options = initest_options(),
           en_options = en_options_aug_lars(), ...)

    ## S3 method for class 'pense':
    pensem(x, alpha, scale, nlambda = 50, lambda, lambda_min_ratio,
           standardize, cv_k = 5, cv_objective,
           ncores = getOption("mc.cores", 1L), cl = NULL,
           mm_options = mstep_options(), en_options,
           x_train, y_train, ...)
Arguments:

    x: either a numeric data matrix or a fitted PENSE estimate
        obtained from 'pense'.

    ...: currently ignored.

    y: numeric response vector.

    alpha: elastic net mixing parameter with 0 ≤ alpha ≤ 1.

    nlambda: if 'lambda' is missing, the number of lambda values in
        the automatically generated grid.

    lambda: a single value or a grid of values for the regularization
        parameter of the M-step. Assumed to be on the same scale as
        the data. If missing, a grid of lambda values is generated
        automatically (see 'nlambda' and 'lambda_min_ratio').

    lambda_s: regularization parameter for the S-estimator. If
        missing, a grid of lambda values is chosen automatically.

    lambda_min_ratio: if the grid is chosen automatically, the ratio
        of the smallest lambda to the (computed) largest lambda.

    standardize: should the data be standardized robustly? Estimates
        are returned on the original scale. Defaults to 'TRUE'.

    initial: how to initialize the estimator at a new lambda value in
        the grid. The default, '"warm"', starts from the solution at
        the previous lambda value; '"cold"' computes a fresh initial
        estimate at every lambda value.

    warm_reset: if 'initial = "warm"', the number of lambda values at
        which cold initial estimates are computed.

    cv_k: perform k-fold cross-validation to choose the optimal
        lambda for prediction.

    cv_objective: a function (or the name of a function) to compute
        the CV prediction performance. By default, the robust
        tau-scale of the prediction errors is used.

    ncores, cl: use multiple cores or the supplied cluster for the
        cross-validation.

    s_options: additional options for the PENSE algorithm. See
        'pense_options'.

    mm_options: additional options for the M-step. See
        'mstep_options'.

    init_options: additional options for computing the cold initial
        estimates. See 'initest_options'.

    en_options: additional options for the EN algorithm. See
        'en_options_aug_lars'.

    scale: initial scale estimate for the M-step. By default, the
        S-scale from the initial estimator ('pense') is used.

    x_train, y_train: override the data arguments provided to the
        original call to 'pense'.
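The exact interface expected by 'cv_objective' is not documented on this page; assuming it receives the numeric vector of cross-validation prediction errors and returns a single performance value (an assumption, as is the function name below), a simple robust alternative to the default tau-scale could be sketched as:

```r
# Hypothetical CV objective: root mean square of the prediction errors
# after discarding the largest 10% in absolute value. The
# single-argument signature is an assumption, not taken from this page.
trimmed_rmspe <- function(prediction_errors) {
    cutoff <- quantile(abs(prediction_errors), 0.9)
    kept <- prediction_errors[abs(prediction_errors) <= cutoff]
    sqrt(mean(kept^2))
}

# It would then be passed via the cv_objective argument, e.g.:
# est <- pensem(x, y, cv_k = 5, cv_objective = trimmed_rmspe)
```

Trimming caps the influence of grossly mispredicted observations on the CV curve, which is the same motivation behind the default robust tau-scale.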
Details:

    Performs an M-step, using as initial estimate the S-estimator at
    the optimal penalty parameter as returned by 'pense'. For "fat"
    datasets (many predictors relative to the number of observations),
    the initial scale returned by the S-estimator is adjusted
    according to Maronna & Yohai (2010).
Value:

    An object of class '"pensem"'. It contains all elements of an
    object of class 'pense' as well as the following:

    init_scale: the initial scale estimate used in the M-step.

    sest: the PENSE estimate used to initialize the M-step.

    bdp: breakdown point of the MM-estimator.
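To illustrate, the extra components can be inspected directly on a fitted object. A minimal sketch, assuming 'x' and 'y' are generated as in the Examples section below:

```r
# Fit the MM-estimator; cv_k = 3 keeps the sketch fast.
fit <- pensem(x, y, cv_k = 3)

fit$init_scale         # scale estimate fed into the M-step
fit$bdp                # breakdown point of the resulting MM-estimator
fit$sest$coefficients  # coefficients of the initializing PENSE estimate
```

Because the returned object also carries all elements of a 'pense' object, the usual 'coef', 'predict', and 'plot' calls shown in the examples apply to it as well.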
References:

    Maronna, R. and Yohai, V. (2010). Correcting MM estimates for
    "fat" data sets. Computational Statistics & Data Analysis,
    54:3168-3173.
See Also:

    'pense' to compute only the S-estimator.
Examples:

    ##
    ## A very simple example on artificial data
    ##

    # Generate some dummy data
    set.seed(12345)
    n <- 30
    p <- 15
    x <- 1 + matrix(rnorm(n * p), ncol = p)
    y <- x %*% c(2:5, numeric(p - 4)) + rnorm(n)

    x_test <- 1 + matrix(rnorm(10 * n * p), ncol = p)
    y_test <- x_test %*% c(2:5, numeric(p - 4)) + rnorm(10 * n)

    # Compute the MM-estimator with an EN penalty for 20 lambda values
    # (Note: In real applications, warm_reset should be at least 5)
    set.seed(1234)
    est_mm <- pensem(
        x, y,
        alpha = 0.7,
        nlambda = 20,
        warm_reset = 1L,
        cv_k = 3
    )

    # We can plot the CV prediction error curve
    plot(est_mm)

    # What is the RMSPE on the test data?
    (rmspe <- sqrt(mean((y_test - predict(est_mm, newdata = x_test))^2)))

    ##
    ## This is the same as first computing the S-estimator and adding
    ## the M-step afterwards
    ##
    set.seed(1234)
    est_s <- pense(
        x, y,
        alpha = 0.7,
        nlambda = 20,
        warm_reset = 1L,
        cv_k = 3
    )

    est_mm_2 <- pensem(
        est_s,
        nlambda = 20,
        cv_k = 3
    )

    ## The same initial S-estimate is used in both `pensem` calls
    ## because the seed governing the CV to select the optimal lambda
    ## was the same
    sum(abs(est_s$coefficients - est_mm$sest$coefficients))

    ## Therefore, the MM-estimate at each lambda is also the same
    sum(abs(est_mm_2$coefficients - est_mm$coefficients))