coxphMIC {coxphMIC} | R Documentation |
Sparse Estimation for a Cox PH model via Approximated Information Criterion
coxphMIC(formula = Surv(time, status) ~ ., data,
         method.beta0 = "MPLE", beta0 = NULL, theta0 = 1,
         method = "BIC", lambda0 = 2, a0 = NULL, scale.x = TRUE,
         maxit.global = 300, maxit.local = 100, rounding.digits = 4,
         zero = sqrt(.Machine$double.eps),
         compute.se.gamma = TRUE, compute.se.beta = TRUE,
         CI.gamma = TRUE, conf.level = 0.95, details = FALSE)
formula |
A formula object, with the response on the left of a |
data |
A data.frame in which to interpret the variables named in the |
method.beta0 |
A method to supply the starting point for beta with choices: |
beta0 |
User-supplied beta0 value, the starting point for optimization. |
theta0 |
Specifies the penalty parameter for the ridge estimator when |
method |
Specifies the model selection criterion used. If |
lambda0 |
User-supplied penalty parameter for model complexity. If |
a0 |
The scale (or sharpness) parameter used in the hyperbolic tangent penalty. By default, |
scale.x |
Logical value: should the predictors X be normalized? Defaults to |
maxit.global |
Maximum number of iterations allowed for the global optimization algorithm – |
maxit.local |
Maximum number of iterations allowed for the local optimization algorithm – |
rounding.digits |
Number of digits after the decimal point used when rounding estimates. Default value is 4. |
zero |
Tolerance level for convergence. Default is |
compute.se.gamma |
Logical value indicating whether to compute the standard errors for gamma in the
reparameterization. Default is |
compute.se.beta |
Logical value indicating whether to compute the standard errors for nonzero beta estimates.
Default is |
CI.gamma |
Logical indicator of whether the confidence interval for |
conf.level |
Specifies the confidence level for |
details |
Logical value: if |
The main idea of MIC involves approximation of the l0 norm with a continuous or smooth unit dent function. This method bridges the best subset selection and regularization by borrowing strength from both. It mimics the best subset selection using a penalized likelihood approach yet with no need of a tuning parameter.
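As a rough illustration of how a smooth unit dent function can stand in for the l0 norm, the sketch below uses the hyperbolic tangent form tanh(a * beta^2), with a as the sharpness parameter. The exact functional form and the helper names unit_dent and approx_l0 are assumptions made for this example; they are not taken from the package source.

```r
# Unit dent function: a smooth surrogate for the indicator I(beta != 0).
# The form tanh(a * beta^2) is an assumption made for illustration.
unit_dent <- function(beta, a) tanh(a * beta^2)

beta <- c(-1, -0.1, 0, 0.1, 1)

# A larger sharpness value `a` brings the surrogate closer to the 0/1 indicator.
for (a in c(1, 10, 100)) {
  cat(sprintf("a = %5g: %s\n", a,
              paste(format(unit_dent(beta, a), digits = 3), collapse = "  ")))
}

# Approximate l0 "norm": a smooth count of the nonzero coefficients.
approx_l0 <- function(beta, a) sum(unit_dent(beta, a))
approx_l0(c(0, 0, 1.2, -0.8), a = 100)  # close to 2, the number of nonzero entries
```

Because the surrogate is differentiable everywhere, a penalty of the form lambda0 * approx_l0(beta, a) can be added to a negative log partial likelihood and minimized with standard smooth optimizers.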
The problem is further reformulated with a reparameterization step by relating beta to gamma. There are two benefits of doing so: first, it reduces the optimization to one unconstrained nonconvex yet smooth programming problem, which can be solved efficiently as in computing the maximum partial likelihood estimator (MPLE); furthermore, the reparameterization tactic yields an additional advantage in terms of circumventing post-selection inference. Significance testing on beta can be done through gamma.
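The reparameterization can be pictured with a small sketch. It assumes the mapping beta_j = gamma_j * tanh(a * gamma_j^2); both this form and the helper name gamma_to_beta are illustrative assumptions rather than the package's internal code. The point is that gamma is unconstrained, while small gamma values are mapped to beta values that are effectively zero.

```r
# Hypothetical helper: map the unconstrained gamma to beta.
# The form gamma * tanh(a * gamma^2) is assumed for illustration.
gamma_to_beta <- function(gamma, a) gamma * tanh(a * gamma^2)

gamma <- c(0.9, -0.5, 0.005, -0.002)
a <- 100

beta <- gamma_to_beta(gamma, a)

# Large |gamma| passes through almost unchanged; small |gamma| is shrunk
# hard toward zero and disappears after rounding to 4 decimal places.
round(beta, 4)
```

Rounding to four digits here mirrors the role of the rounding.digits argument, under the assumption that rounding is what turns near-zero estimates into exact zeros.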
To solve the smooth yet nonconvex optimization problem, a simulated annealing global optimization algorithm (method="SANN" in optim) is applied first. The resulting estimator is then used as the starting point for a local optimization algorithm, the quasi-Newton BFGS method (method="BFGS" in optim).
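The two-stage strategy can be tried on a toy problem with base R's optim. The objective below is an arbitrary smooth, nonconvex function chosen for illustration, not the MIC objective. Passing 300 and 100 as maxit values only echoes the defaults of maxit.global and maxit.local; how the package forwards them to optim is an assumption.

```r
# Toy smooth, nonconvex objective with several local minima (illustrative only).
objective <- function(x) sum(x^2) + 2 * sum(1 - cos(3 * x))

set.seed(1)            # SANN is stochastic; fix the seed for reproducibility
start <- c(2.5, -2.5)

# Stage 1: simulated annealing for a coarse global search.
fit_global <- optim(par = start, fn = objective, method = "SANN",
                    control = list(maxit = 300))

# Stage 2: quasi-Newton BFGS refinement, started from the annealing result.
fit_local <- optim(par = fit_global$par, fn = objective, method = "BFGS",
                   control = list(maxit = 100))

fit_global$value   # objective value after the global stage
fit_local$value    # objective value after local refinement (no larger)
fit_local$par
```

The annealing stage reduces the risk of the local stage settling in a poor basin, while BFGS supplies the fast, precise convergence that annealing lacks.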
In its current version, some data preparation may be needed. For example, nominal variables (especially character-valued ones) need to be coded with dummy variables; missing values cause errors too and hence need to be handled beforehand.
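One possible way to carry out this preparation in base R is sketched below. The data frame and its column names are made up for the example. Listwise deletion and model.matrix dummy coding are just one reasonable choice among several.

```r
# Toy data frame with a character-valued nominal variable and a missing value.
dat <- data.frame(
  time   = c(5, 8, 12, 3, 9, 7),
  status = c(1, 0, 1, 1, 0, 1),
  age    = c(50, 61, NA, 47, 58, 66),
  stage  = c("I", "II", "III", "II", "I", "III"),
  stringsAsFactors = FALSE
)

# Listwise deletion of rows with missing values.
dat <- stats::na.omit(dat)

# Dummy-code the nominal variable; drop the intercept column so that
# the first level ("I") serves as the reference category.
dummies <- stats::model.matrix(~ factor(stage), data = dat)[, -1, drop = FALSE]
colnames(dummies) <- c("stage_II", "stage_III")

# Final all-numeric data frame, ready to be passed as `data`.
dat_ready <- cbind(dat[, c("time", "status", "age")], dummies)
str(dat_ready)
```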
A list containing the following components is returned.
Results from the preliminary run of a global optimization procedure (SANN as default).
Results from the second run of a local optimization procedure (BFGS as default).
Value of the minimized objective function.
Estimated gamma.
Estimated beta.
The estimated variance-covariance matrix for the gamma estimate.
Standard errors for the gamma estimate.
Standard errors for the beta estimate (post-selection).
The BIC value for the selected model.
A summary table of the fitting results.
The matched call.
Abdolyousefi, R. N. and Su, X. (2016). coxphMIC: An R package for sparse estimation of Cox PH Models via approximated information criterion. Tentatively accepted, The R Journal.
Su, X. (2015). Variable selection via subtle uprooting. Journal of Computational and Graphical Statistics, 24(4): 1092–1113. URL http://www.tandfonline.com/doi/pdf/10.1080/10618600.2014.955176
Su, X., Wijayasinghe, C. S., Fan, J., and Zhang, Y. (2015). Sparse estimation of Cox proportional hazards models via approximated information criteria. Biometrics, 72(3): 751–759. URL http://onlinelibrary.wiley.com/doi/10.1111/biom.12484/epdf
# PREPARE THE PBC DATA
library(survival); library(coxphMIC)
data(pbc)
dat <- pbc; dim(dat)
dat$status <- ifelse(pbc$status == 2, 1, 0)
# HANDLE CATEGORICAL VARIABLES
dat$sex <- ifelse(pbc$sex == "f", 1, 0)
# LISTWISE DELETION USED TO HANDLE MISSING VALUES
dat <- stats::na.omit(dat)
dim(dat); utils::head(dat)

fit.mic <- coxphMIC(formula = Surv(time, status) ~ . - id, data = dat,
                    method = "BIC", scale.x = TRUE)
names(fit.mic)
print(fit.mic)
plot(fit.mic)