elnet {pense}	R Documentation
Description

Estimate the elastic net regression coefficients.
Usage

elnet(x, y, alpha, nlambda = 100, lambda, weights, intercept = TRUE,
      options = en_options_aug_lars(), lambda_min_ratio, xtest,
      correction = TRUE)
Arguments

x: data matrix with predictors.

y: response vector.

alpha: controls the balance between the L1 and the L2 penalty. alpha = 1 gives the pure L1 (LASSO) penalty, alpha = 0 the pure L2 (ridge) penalty.

nlambda: size of the lambda grid if lambda is not specified.

lambda: a grid of decreasing lambda values.

weights: an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector.

intercept: should an intercept be estimated?

options: additional options for the EN algorithm. See en_options_aug_lars and en_options_dal for details.

lambda_min_ratio: if the lambda grid should be automatically defined, the ratio of the smallest to the largest lambda value in the grid.

xtest: data matrix with predictors used for prediction. This is useful for testing the prediction performance on an independent test set.

correction: should the "corrected" EN estimator be returned? If TRUE (the default), the estimate is rescaled to undo the extra shrinkage introduced by the L2 penalty, as proposed by Zou and Hastie (2005).
Details

This function solves the minimization problem

    (1/2N) RSS + λ * ( (1 - α) / 2 * L2(β)^2 + α * L1(β) )

If weights are supplied, the residual sum of squares is replaced by its weighted counterpart and the minimization problem becomes

    (1/2N) ∑ w_i * (y_i - ŷ_i)^2 + λ * ( (1 - α) / 2 * L2(β)^2 + α * L1(β) )

where ŷ_i denotes the fitted value for the i-th observation.
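The role of alpha is easiest to see at the extremes, where the penalty reduces to the two familiar special cases (same notation as above):

    α = 1:  (1/2N) RSS + λ * L1(β)            (LASSO)
    α = 0:  (1/2N) RSS + (λ / 2) * L2(β)^2    (ridge)

Intermediate values of alpha blend the two penalties.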
Value

lambda: vector of lambda values.

status: integer specifying the exit status of the EN algorithm.

message: explanation of the exit status.

coefficients: matrix of regression coefficients. Each column corresponds to the estimate for the lambda value at the same index.

residuals: matrix of residuals. Each column corresponds to the residuals for the lambda value at the same index.

predictions: if xtest was supplied, a matrix of predicted values. Each column corresponds to the predictions for the lambda value at the same index.
Algorithms

Currently this function can compute the elastic net estimator using either
augmented LARS or the Dual Augmented Lagrangian (DAL) algorithm
(Tomioka, Suzuki and Sugiyama 2011).

Augmented LARS performs the LASSO via the LARS algorithm (or OLS if
alpha = 0) on the data matrix augmented with the L2 penalty term.
The time complexity of this algorithm grows quickly with the number of
predictors, and the algorithm currently cannot leverage a previous or
an approximate solution to speed up computations. However, it is always
guaranteed to find the solution.
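The augmentation mentioned above is the standard construction of Zou and Hastie (2005); in the notation of the objective in 'Details', it can be sketched as

    X_aug = rbind(X, sqrt(N * λ * (1 - α)) * I_p)
    y_aug = c(y, rep(0, p))

so that

    (1/2N) * ||y_aug - X_aug β||^2 = (1/2N) RSS + λ * (1 - α) / 2 * L2(β)^2

and running the LASSO with penalty λ * α on the augmented data reproduces the full elastic net objective.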
DAL is an iterative algorithm directly minimizing the Elastic Net objective. The algorithm can take an approximate solution to the problem to speed up convergence. In the case of very small lambda values and a bad starting point, DAL may not converge to the solution and hence give wrong results. This would be indicated in the returned status code. Time complexity of this algorithm is dominated by the number of observations.
DAL is much faster for a small number of observations (< 200) and a large number of predictors, especially if an approximate solution is available.
References

Tomioka, R., Suzuki, T. and Sugiyama, M. (2011). Super-Linear Convergence of Dual Augmented Lagrangian Algorithm for Sparse Learning. Journal of Machine Learning Research 12(May):1537-1586.

Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 67(2):301-320.
See Also

elnet_cv for automatic selection of the penalty parameter based on the cross-validated prediction error.
Examples

# Generate some dummy data
set.seed(12345)
n <- 30
p <- 15
x <- 1 + matrix(rnorm(n * p), ncol = p)
y <- x %*% c(2:5, numeric(p - 4)) + rnorm(n)
x_test <- matrix(rnorm(10 * n * p), ncol = p)
y_test <- drop(x_test %*% c(2:5, numeric(p - 4)) + rnorm(10 * n))

# Compute the classical EN with predictions for x_test
set.seed(1234)
est <- elnet(
  x, y,
  alpha = 0.6,
  nlambda = 100,
  xtest = x_test
)

# Plot the RMSPE computed from the given test set
rmspe_test <- sqrt(colMeans((y_test - est$predictions)^2))
plot(est$lambda, rmspe_test, log = "x")

##
## For large data sets, the DAL algorithm is much faster
##
set.seed(12345)
n <- 100
p <- 1500
x <- 1 + matrix(rnorm(n * p), ncol = p)
y <- x %*% c(2:5, numeric(p - 4)) + rnorm(n)
x_test <- matrix(rnorm(10 * n * p), ncol = p)
y_test <- drop(x_test %*% c(2:5, numeric(p - 4)) + rnorm(10 * n))

# The DAL algorithm takes ~1.5 seconds to compute the solution path
set.seed(1234)
system.time(
  est_dal <- elnet(
    x, y,
    alpha = 0.6,
    nlambda = 100,
    options = en_options_dal(),
    xtest = x_test
  )
)

# In comparison, the augmented LARS algorithm can take several minutes
set.seed(1234)
system.time(
  est_auglars <- elnet(
    x, y,
    alpha = 0.6,
    nlambda = 100,
    options = en_options_aug_lars(),
    xtest = x_test
  )
)