pense {pense}                                                R Documentation
Computes the highly robust Penalized Elastic Net S-estimators (PENSE) for linear regression models.
Usage

pense(x, y, alpha = 0.5, nlambda = 50, lambda, lambda_min_ratio,
      standardize = TRUE, initial = c("warm", "cold"), warm_reset = 10,
      cv_k = 5, cv_objective, ncores = getOption("mc.cores", 1L),
      cl = NULL, options = pense_options(),
      init_options = initest_options(),
      en_options = en_options_aug_lars())
Arguments

x: design matrix with predictors.

y: response vector.

alpha: elastic net mixing parameter with 0 ≤ α ≤ 1.

nlambda: if lambda is missing, the number of lambda values in the
automatically generated grid.

lambda: a single value or a grid of values for the regularization
parameter lambda. Assumed to be on the same scale as the data and
adjusted for S-estimation. If missing, a grid of lambda values is
generated automatically (see parameters nlambda and lambda_min_ratio).

lambda_min_ratio: if the grid should be chosen automatically, the ratio
of the smallest lambda to the (computed) largest lambda.

standardize: should the data be standardized robustly? Estimates are
returned on the original scale. Defaults to TRUE.

initial: how to initialize the estimator at a new lambda value in the
grid. The default, "warm", reuses the estimate from a neighboring
lambda value (see Details).

warm_reset: if initial = "warm", the number of lambda values in the
grid at which a full (cold) initial estimate is computed (see Details).

cv_k: number of cross-validation segments used to choose the optimal
lambda from the grid. If only a single value of lambda is given,
cross-validation can still be done to estimate the prediction
performance at this particular lambda.

cv_objective: a function (or function name) used to compute the CV
performance. By default, the robust tau-scale is used.

ncores, cl: the number of processor cores, or an actual parallel
cluster, to use for estimating the optimal value of lambda.

options: additional options for the PENSE algorithm. See
pense_options.

init_options: additional options for computing the cold initial
estimates. See initest_options.

en_options: additional options for the EN algorithm. See
en_options_aug_lars.
The PENSE estimate minimizes the robust M-scale of the residuals,
penalized by the L1 and L2 norms of the regression coefficients (the
elastic net penalty). The level of penalization is chosen to minimize
the cv_k-fold cross-validated prediction error (using a robust
measure).
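As a rough illustration of the penalty term, the following sketch uses one common elastic net parameterization in which alpha mixes the L1 and L2 norms. This is shown only to clarify the roles of alpha and lambda; it is not taken from pense's source code, and the exact scaling pense uses internally may differ.

```r
# Illustrative elastic net penalty (common parameterization, NOT
# pense's internal code): alpha mixes the L1 and L2 norms, lambda
# scales the overall penalty.
en_penalty <- function(beta, lambda, alpha) {
  lambda * (alpha * sum(abs(beta)) + (1 - alpha) / 2 * sum(beta^2))
}

en_penalty(c(1, -2, 0), lambda = 0.5, alpha = 0.5)
# -> 1.375
```

With alpha = 1 the penalty reduces to the lasso (pure L1), and with alpha = 0 to ridge (pure L2).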
Value

An object of class "pense" with elements:

lambda: grid of regularization parameter values for which an estimate
is available.

lambda_opt: the optimal value of the regularization parameter according
to CV.

coefficients: a sparse matrix of coefficients for each lambda in the
grid.

residuals: a matrix of residuals for each lambda in the grid.

cv_lambda_grid: a data frame with CV prediction errors and several
statistics of the solutions.

scale: the estimated scale at each lambda in the grid.

objective: value of the objective function at each lambda in the grid.

adjusted: the information necessary to compute the corrected EN
estimates.

call: the call that produced this object.

...: values of the given arguments.
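As a sketch of how the returned elements relate to one another, the snippet below uses a mocked list in place of a real fitted object (with a real fit, these fields come from pense itself and coefficients is a sparse matrix):

```r
# Mocked "pense"-like object; the field names follow the element list
# documented above, but the values here are made up for illustration.
fit <- list(
  lambda       = c(1, 0.5, 0.25),
  lambda_opt   = 0.5,
  coefficients = matrix(c(0, 0, 1.2, 0, 1.5, 0.3), nrow = 2)
)

# Locate the CV-optimal lambda in the grid and extract its coefficients.
idx      <- which.min(abs(fit$lambda - fit$lambda_opt))
beta_opt <- fit$coefficients[, idx]
```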
By default (initial == "warm"), the method does not compute a full
initial estimate at each lambda value in the grid, but only at
warm_reset of the lambda values. At the remaining lambda values, the
estimate at the previous lambda value is used to initialize the
estimator (the lambda grid is first traversed in descending and then in
ascending direction). If warm_reset is 1, only the 0-vector is used to
initialize PENSE at the largest penalty value, and no further initial
estimates are computed.

If initial == "cold", a full initial estimate is computed at each
lambda value. This is equivalent to setting warm_reset to
length(lambda).

To improve the S-estimate with an M-step, see pensem.
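The warm-start schedule described above can be mocked in a few lines. Here fit_one is a hypothetical placeholder standing in for a single PENSE fit; this is not the package's implementation, only an illustration of the traversal order and initialization.

```r
# Mock of the warm-start traversal: the lambda grid is visited in
# descending order first, then ascending, and each fit is initialized
# with the previous estimate.
lambda_grid <- sort(c(0.1, 0.5, 1, 2), decreasing = TRUE)

# Hypothetical stand-in for a single PENSE fit at one lambda value.
fit_one <- function(lambda, beta_start) beta_start

beta <- numeric(3)  # the 0-vector initializes PENSE at the largest lambda
for (l in c(lambda_grid, rev(lambda_grid))) {
  beta <- fit_one(l, beta)  # previous estimate warm-starts the next lambda
}
```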
Examples

##
## A very simple example on artificial data
##

# Generate some dummy data
set.seed(12345)
n <- 30
p <- 15
x <- 1 + matrix(rnorm(n * p), ncol = p)
y <- x %*% c(2:5, numeric(p - 4)) + rnorm(n)

x_test <- matrix(rnorm(10 * n * p), ncol = p)
y_test <- x_test %*% c(2:5, numeric(p - 4)) + rnorm(10 * n)

# Compute the S-estimator with an EN penalty for 20 lambda values
# (Note: In real applications, warm_reset should be at least 5)
set.seed(1234)
est <- pense(
    x, y,
    alpha = 0.6,
    nlambda = 20,
    warm_reset = 1L,
    cv_k = 3
)

# We can plot the CV prediction error curve
plot(est)

# What is the RMSPE on test data?
(rmspe <- sqrt(mean((y_test - predict(est, newdata = x_test))^2)))

##
## What happens if we replace 3 observations in the dummy data
## with outliers?
##
y_out <- y
y_out[1:3] <- rnorm(3, -500)

# Compute the S-estimator again
# (Note: In real applications, warm_reset should be at least 5)
set.seed(12345)
est_out <- pense(
    x, y_out,
    alpha = 0.6,
    nlambda = 20,
    warm_reset = 1L,
    cv_k = 3
)

# How does the RMSPE compare?
rmspe_out <- sqrt(mean((y_test - predict(est_out, newdata = x_test))^2))

c(rmspe = rmspe, rmspe_out = rmspe_out)