enpy {pense}                                                R Documentation

PY (Pena-Yohai) Initial Estimates for the EN S-Estimator

Description:

Computes the PY (Pena-Yohai) initial estimates for the EN S-estimator,
with different strategies for computing the principal sensitivity
components (PSCs).
Usage:

    enpy(x, y, alpha, lambda, delta, cc, options = initest_options(),
         en_options = en_options_aug_lars())
Arguments:

x              data matrix with predictors.

y              response vector.

alpha, lambda  EN penalty parameters (NOT adjusted for the number of
               observations in x).

delta          desired breakdown point of the resulting estimator.

cc             tuning constant for the S-estimator. By default it is
               chosen based on the breakdown point delta (a numeric
               illustration follows this list).

options        additional options for the initial estimator. See
               initest_options for details.

en_options     additional options for the EN algorithm. See
               en_options_aug_lars and en_options_dal for details.
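The relation between delta and cc is determined by the rho function of
the M-scale. As a rough numeric illustration, assuming Tukey's bisquare
rho function (an assumption for this sketch; rho_bisquare and
consistent_cc below are made-up helpers, not part of pense's API), the
constant can be found by solving E[rho(U/cc)] = delta for a standard
normal U:

    # Normalized Tukey bisquare rho: 0 at u = 0, 1 for |u| >= cc.
    rho_bisquare <- function (u, cc) {
      pmin(1, 1 - (1 - (u / cc)^2)^3)
    }
    # Find cc such that the expected rho under N(0, 1) equals delta.
    consistent_cc <- function (delta) {
      uniroot(function (cc) {
        integrate(function (u) rho_bisquare(u, cc) * dnorm(u),
                  lower = -Inf, upper = Inf)$value - delta
      }, interval = c(0.5, 20))$root
    }
    consistent_cc(0.25)  # approx. 2.94
    consistent_cc(0.5)   # approx. 1.55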
Details:

Two different methods to calculate the principal sensitivity
components (PSCs) are implemented (both strategies are illustrated in
the sketch following this list):

"rr"     Approximate the PSCs using the residuals from the elastic net
         fit and the hat matrix from the ridge regression. This method
         only works if alpha < 1 or ncol(x) < nrow(x).

"exact"  Calculate the PSCs from the difference between the residuals
         and the leave-one-out residuals of the elastic net fit.
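To make the two strategies concrete, the following is a minimal sketch
for the special case of a plain ridge fit (alpha = 0, no intercept,
fixed penalty lam). All names here are made up for illustration; this
is not pense's implementation, which works with the elastic net
residuals and additional scaling:

    # Ridge coefficients for a fixed penalty lam.
    ridge_coef <- function (x, y, lam) {
      solve(crossprod(x) + lam * diag(ncol(x)), crossprod(x, y))
    }
    # "exact"-style sensitivity matrix: column i holds the change in
    # all fitted values when observation i is left out of the fit.
    sens_exact <- function (x, y, lam) {
      yhat <- drop(x %*% ridge_coef(x, y, lam))
      sapply(seq_along(y), function (i) {
        yhat - drop(x %*% ridge_coef(x[-i, , drop = FALSE], y[-i], lam))
      })
    }
    # "rr"-style sensitivity matrix: the same quantity from a single
    # fit, using the ridge hat matrix H and the closed-form
    # leave-one-out update.
    sens_rr <- function (x, y, lam) {
      H <- x %*% solve(crossprod(x) + lam * diag(ncol(x)), t(x))
      res <- drop(y - H %*% y)
      sweep(H, 2, res / (1 - diag(H)), `*`)
    }
    # The PSCs are the principal directions of the sensitivity matrix.
    psc <- function (sens) eigen(tcrossprod(sens), symmetric = TRUE)$vectors

For a pure ridge fit the leave-one-out update is exact, so the two
matrices agree up to rounding; with an elastic net fit the hat-matrix
version is only an approximation, which is why "rr" is cheaper while
"exact" can yield slightly better initial candidates.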
Value:

coeff    A numeric matrix with one initial coefficient vector per
         column.

objF     A numeric vector with the value of the objective function
         for the respective column of coeff.
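For example (a sketch assuming init holds the list returned by a call
to enpy, such as in the Examples below), the candidate with the lowest
objective value is extracted by:

    best_coef <- init$coeff[, which.min(init$objF)]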
References:

Pena, D., and Yohai, V. J. (1999). A Fast Procedure for Outlier
Diagnostics in Large Regression Problems. Journal of the American
Statistical Association, 94(446), 434-445.
https://doi.org/10.2307/2670164
Examples:

    set.seed(12345)

    n <- 50
    p <- 15
    beta <- c(2:5, numeric(p - 4))
    x <- 1 + matrix(rnorm(n * p), ncol = p)
    y <- x %*% beta + rnorm(n)

    # Add outliers to y
    y[1:5] <- rnorm(5, -100, 0.5)

    x_test <- matrix(rnorm(10 * n * p), ncol = p)
    y_test <- drop(x_test %*% beta + rnorm(10 * n))

    # Compute the PY estimates using the "exact" method. Here, the LOO
    # residuals are computed using the EN estimator directly. For this
    # method, the DAL algorithm is usually faster since it naturally
    # supports "warm" starts from a nearby solution vector.
    init_exact <- enpy(
        x, y, alpha = 0.8, lambda = 0.1, delta = 0.25,
        options = initest_options(psc_method = "exact"),
        en_options = en_options_dal()
    )

    # Compute the PY estimates using the approximate ("rr") method.
    # Here, the LOO residuals are approximated via ridge regression,
    # where all LOO residuals can be computed at once using linear
    # algebra.
    init_approx <- enpy(
        x, y, alpha = 0.8, lambda = 0.1, delta = 0.25,
        options = initest_options(psc_method = "rr"),
        en_options = en_options_aug_lars()
    )

    # How do the objective functions of the initial estimates compare?
    min(init_exact$objF)
    min(init_approx$objF)
    # The "exact" method finds estimates with a slightly lower
    # objective value.

    # How do the initial estimates perform on an external test set?
    rmspe_exact <- sqrt(colMeans(
        (y_test - cbind(1, x_test) %*% init_exact$coeff)^2))
    rmspe_approx <- sqrt(colMeans(
        (y_test - cbind(1, x_test) %*% init_approx$coeff)^2))
    min(rmspe_exact)
    min(rmspe_approx)
    # The "exact" method also has better predictive performance.