EBglmnet {EBglmnet}    R Documentation
EBglmnet is the main function for fitting a generalized linear model with empirical Bayesian methods under lasso and elastic net hierarchical priors.
It handles the p >> n case (more variables than observations), produces a sparse estimate of the
regression coefficients, and performs significance tests for the nonzero effects
in both linear and logistic regression models.
EBglmnet(x, y, family = c("gaussian", "binomial"), prior = c("lassoNEG", "lasso", "elastic net"), hyperparameters, Epis = FALSE, group = FALSE, verbose = 0)
x
input matrix of dimension n × p (n observations, p variables).
y
response variable. Continuous for family = "gaussian" and binary for family = "binomial".
family
model type: "gaussian" (default) or "binomial".
prior
prior distribution to use: "lassoNEG" (default), "lasso", or "elastic net". All priors produce a sparse estimate of the regression coefficients; see Details for choosing a prior.
hyperparameters
the optimal hyperparameters of the prior distribution. Like λ in the lasso,
the hyperparameters control the number of nonzero regression coefficients. They
are most often determined by cross-validation (CV); see cv.EBglmnet.
Epis
Boolean parameter for including two-way interactions. By default, Epis = FALSE and no interaction terms are included.
group
Boolean parameter for …
verbose
parameter that controls the level of message output from EBglmnet. It takes values from 0 to 5; larger values display more messages, and small values are recommended to avoid excessive output. The default value for verbose is 0.
EBglmnet implements three sets of hierarchical prior distributions for the regression parameters β:
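lasso prior:
\beta_j \sim N(0, \sigma_j^2), \quad \sigma_j^2 \sim \mathrm{Exp}(\lambda), \quad j = 1, \dots, p.
lasso-NEG prior:
\beta_j \sim N(0, \sigma_j^2), \quad \sigma_j^2 \sim \mathrm{Exp}(\lambda), \quad \lambda \sim \mathrm{Gamma}(a, b), \quad j = 1, \dots, p.
elastic net prior:
\beta_j \sim N\left[0, (\lambda_1 + \tilde{\sigma}_j^{-2})^{-2}\right], \quad \tilde{\sigma}_j^{2} \sim \text{generalized-gamma}(\lambda_1, \lambda_2), \quad j = 1, \dots, p.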
All three priors are peaked at zero with flat, heavy tails: they assign high prior
probability mass to zero while still placing substantial probability in the two tails. This reflects the prior
belief that a sparse solution exists: most variables have no effect on the response,
and only a few variables have nonzero effects on the outcome y.
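To make "peaked at zero with heavy tails" concrete, the base-R sketch below draws from the hierarchical lasso prior and compares it with a normal distribution of equal variance. It assumes that Exp(λ) denotes an exponential distribution with rate λ, so that E[σ_j^2] = 1/λ; under that assumption the marginal prior on β_j is a Laplace (double-exponential) distribution with scale 1/sqrt(2λ), whose kurtosis is 6 versus 3 for the normal. The value λ = 0.5 is arbitrary, and nothing here calls EBglmnet itself.

```r
# Sketch: sample the hierarchical lasso prior
#   sigma_j^2 ~ Exp(lambda)  (rate parameterization assumed)
#   beta_j | sigma_j^2 ~ N(0, sigma_j^2)
set.seed(1)
lambda <- 0.5
n <- 1e5

sigma2     <- rexp(n, rate = lambda)                  # mixing variances
beta_draws <- rnorm(n, mean = 0, sd = sqrt(sigma2))   # rnorm vectorizes over sd

# Normal draws with the same marginal variance, E[sigma^2] = 1 / lambda
gauss <- rnorm(n, mean = 0, sd = sqrt(1 / lambda))

kurtosis <- function(z) mean((z - mean(z))^4) / mean((z - mean(z))^2)^2

# Peak at zero: more prior mass near 0 than the matching normal
near_zero_prior <- mean(abs(beta_draws) < 0.1)
near_zero_gauss <- mean(abs(gauss) < 0.1)

# Heavy tails: kurtosis well above the normal value of 3
k_prior <- kurtosis(beta_draws)
k_gauss <- kurtosis(gauss)

cat(sprintf("P(|beta| < 0.1): prior %.3f vs normal %.3f\n", near_zero_prior, near_zero_gauss))
cat(sprintf("kurtosis:        prior %.2f vs normal %.2f\n", k_prior, k_gauss))
```

Both summaries point the same way: the prior puts more mass near zero (favoring sparsity) and more mass far out in the tails (allowing a few large effects) than a normal prior with the same variance.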
Each of the three priors contains hyperparameters that control how heavy the tails are,
and different values yield different numbers of nonzero effects retained in the model.
Appropriate values must be selected to obtain optimal results,
and CV is the most commonly used method. See cv.EBglmnet
for details on determining the
optimal hyperparameters of each prior under different GLM families.
lassoNEG prior
The "lassoNEG" prior has two hyperparameters (a, b), with a ≥ -1 and b > 0.
Values of a in (-1.5, -1) are also accepted, but they are not recommended unless the signal-to-noise
ratio in the explanatory variables is very small.
lasso prior
The "lasso" prior has one hyperparameter λ, with λ ≥ 0. λ plays a role similar to
the shrinkage parameter in the lasso,
except that λ is allowed to be zero even when p >> n;
EBlasso can still provide a sparse solution thanks to the implicit constraint σ_j^2 ≥ 0.
elastic net prior
Similar to the elastic net in package glmnet, EBglmnet reparameterizes the two hyperparameters λ_1
and λ_2 of the "elastic net" prior in terms of two other parameters, α (0 ≤ α ≤ 1)
and λ (λ > 0). Users therefore specify hyperparameters = c(α, λ).
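For orientation, glmnet documents its elastic net penalty as λ[(1 − α)/2 ||β||_2^2 + α||β||_1], so α mixes the ridge (α = 0) and lasso (α = 1) terms while λ sets the overall strength. The base-R sketch below evaluates that glmnet penalty only to illustrate the roles of α and λ. It is an assumption that EBglmnet's mapping from (α, λ) to (λ_1, λ_2) has the same form, since the exact transformation is not given here, and enet_penalty is a hypothetical helper, not part of either package.

```r
# Hypothetical helper: glmnet-style elastic net penalty
#   P(beta) = lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
# Illustrates alpha (mixing) and lambda (strength); NOT EBglmnet's internal mapping.
enet_penalty <- function(coefs, alpha, lambda) {
  stopifnot(alpha >= 0, alpha <= 1, lambda > 0)
  lambda * ((1 - alpha) / 2 * sum(coefs^2) + alpha * sum(abs(coefs)))
}

coefs <- c(2, -1, 0, 0.5)   # illustrative coefficient vector

p_lasso <- enet_penalty(coefs, alpha = 1,   lambda = 0.5)  # pure L1 term: 0.5 * 3.5
p_ridge <- enet_penalty(coefs, alpha = 0,   lambda = 0.5)  # pure L2 term: 0.5 * 5.25 / 2
p_mix   <- enet_penalty(coefs, alpha = 0.5, lambda = 0.5)  # equal mix of both

print(c(lasso = p_lasso, ridge = p_ridge, mix = p_mix))
```

Moving α from 1 toward 0 trades the sparsity-inducing L1 term for the L2 term that tolerates groups of correlated variables, which is the motivation for offering the elastic net prior alongside the lasso priors.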
The returned object contains the following components:
fit
the model fit using the hyperparameters provided. EBglmnet selects the variables with nonzero regression
coefficients and estimates their posterior distributions. With the posterior mean and variance, a …
WaldScore
the Wald score for the posterior distribution, computed as β^TΣ^{-1}β. See (Huang A, 2014b) for using the Wald score to identify the set of significant effects.
Intercept
the intercept of the linear regression model. This parameter is not shrunk.
residual variance
the residual variance, if the Gaussian family is assumed in the GLM
logLikelihood
the log-likelihood, if the binomial family is assumed in the GLM
hyperparameters
the hyperparameters used to fit the model
family
the GLM family specified in this function call
prior
the prior used in this function call
call
the call that produced this object
nobs
number of observations
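The WaldScore component is defined as β^TΣ^{-1}β. The base-R sketch below computes that quadratic form, assuming β is the vector of posterior means of the selected effects and Σ their posterior covariance matrix. The numbers are hypothetical and do not come from an EBglmnet fit, and wald_score is an illustrative helper, not a package function.

```r
# Hypothetical posterior means and covariance for three selected effects
# (illustrative values only, not output from EBglmnet)
beta_hat <- c(1.2, -0.8, 0.5)
Sigma <- matrix(c(0.10, 0.02, 0.00,
                  0.02, 0.08, 0.01,
                  0.00, 0.01, 0.05), nrow = 3, byrow = TRUE)

# Illustrative helper: Wald score beta^T Sigma^{-1} beta.
# solve(Sigma, beta) returns Sigma^{-1} beta without forming the inverse explicitly.
wald_score <- function(beta, Sigma) {
  drop(crossprod(beta, solve(Sigma, beta)))
}

w <- wald_score(beta_hat, Sigma)

# With a diagonal Sigma the score reduces to sum(beta_j^2 / var_j)
w_diag <- wald_score(beta_hat, diag(diag(Sigma)))
print(c(full = w, diagonal = w_diag))
```

Using solve(Sigma, beta) rather than solve(Sigma) %*% beta avoids computing an explicit matrix inverse, which is cheaper and numerically more stable when many effects are selected.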
Anhui Huang and Dianting Liu
Dept of Electrical and Computer Engineering, Univ of Miami, Coral Gables, FL
Cai, X., Huang, A., and Xu, S. (2011). Fast empirical Bayesian LASSO for multiple quantitative trait locus mapping. BMC Bioinformatics 12, 211.
Huang, A., Xu, S., and Cai, X. (2013). Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping. BMC Genetics 14(1), 5.
Huang, A., Xu, S., and Cai, X. (2014a). Empirical Bayesian elastic net for multiple quantitative trait locus mapping. Heredity. doi:10.1038/hdy.2014.79.
rm(list = ls())
library(EBglmnet)
# Use R built-in data set state.x77
y = state.x77[, "Life Exp"]
xNames = c("Population", "Income", "Illiteracy", "Murder", "HS Grad", "Frost", "Area")
x = state.x77[, xNames]
#
# Gaussian model
# lassoNEG prior as default
out = EBglmnet(x, y, hyperparameters = c(0.5, 0.5))
out$fit
# lasso prior
out = EBglmnet(x, y, prior = "lasso", hyperparameters = 0.5)
out$fit
# elastic net prior
out = EBglmnet(x, y, prior = "elastic net", hyperparameters = c(0.5, 0.5))
out$fit
# residual variance
out$res
# intercept
out$Intercept
#
# Binomial model
# create a binary response variable
yy = y > mean(y)
out = EBglmnet(x, yy, family = "binomial", hyperparameters = c(0.5, 0.5))
out$fit
# with epistatic effects
out = EBglmnet(x, yy, family = "binomial", hyperparameters = c(0.5, 0.5), Epis = TRUE)
out$fit