pliable {pliable} | R Documentation
Fit a pliable lasso model over a path of regularization values
Usage

pliable(x, z, y, lambda = NULL, nlambda = 50, alpha = 0.5,
  family = c("gaussian", "binomial"), lambda.min.ratio = NULL, w = NULL,
  thr = 1e-05, tt = NULL, penalty.factor = rep(1, ncol(x)), maxit = 10000,
  mxthit = 100, mxkbt = 100, mxth = 1000, kpmax = 1e+05, kthmax = 1e+05,
  verbose = FALSE, maxinter = 500, zlinear = TRUE, screen = NULL)
Arguments

x: n by p matrix of predictors.

z: n by nz matrix of modifying variables. The elements of z may represent quantitative or categorical variables, or a mixture of the two. Categorical variables should be coded by 0-1 dummy variables: for a k-level variable, one can use either k or k-1 dummy variables.

y: n-vector of responses. The x and z variables are centered in the function. We recommend that x and z also be standardized before the call via x <- scale(x, TRUE, TRUE)/sqrt(nrow(x) - 1); z <- scale(z, TRUE, TRUE)/sqrt(nrow(z) - 1).

lambda: decreasing sequence of lambda values desired. Default NULL, in which case the sequence is computed inside the function.

nlambda: number of lambda values desired (default 50).

alpha: mixing parameter (default 0.5).

family: response type, either "gaussian" or "binomial". In the binomial case, y should consist of 0s and 1s. Family "cox" (proportional hazards model for right-censored data) will be implemented if there is popular demand.

lambda.min.ratio: the smallest value of lambda, as a fraction of lambda.max, the (data-derived) entry value (i.e. the smallest value for which all coefficients are zero). Default is 0.001 if n > p, and 0.01 if n < p. Along with screen (see below), this is a main parameter controlling the runtime of the algorithm. With a smaller lambda.min.ratio, more interactions will typically be entered. If the algorithm stalls or stops near the end of the path, try increasing lambda.min.ratio. On the other hand, if a more complex fit is desired, or the cross-validation curve from cv.pliable is still going down at the end of the path, consider decreasing lambda.min.ratio.

w: n-vector of observation weights (default: all ones). Not currently used with the Cox family.

thr: convergence threshold for estimation of beta and joint estimation of (beta, theta). Default 1e-5.

tt: initial step-size factor for the generalized gradient algorithm used for joint estimation of main-effect and interaction parameters. Default 0.1/mean(x^2).

penalty.factor: vector of penalty factors, one for each predictor in x. Default: a vector of ones.

maxit: maximum number of iterations in the loop for one lambda value. Default 10000.

mxthit: maximum number of theta solution iterations. Default 100.

mxkbt: maximum number of backtracking iterations. Default 100.

mxth: maximum internal theta storage. Default 1000.

kpmax: maximum dimension of main-effect storage. Default 100000.

kthmax: maximum dimension of interaction storage. Default 100000.

verbose: should information be printed along the way? Default FALSE.

maxinter: maximum number of interactions allowed in the model. Default 500.

zlinear: should an (unregularized) linear term for z be included in the model? Default TRUE.

screen: screening quantile for features: for example, screen = 0.7 means that we keep only those features with univariate t-statistics above the 0.7 quantile of all features. Default NULL, meaning that all features are kept.
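The standardization recommended for x (and analogously z) above can be sketched in base R; this is a package-independent illustration with a made-up matrix, not code from pliable itself:

```r
# Sketch of the recommended pre-call standardization:
# after scale(..., TRUE, TRUE)/sqrt(n - 1), every column of x has
# mean 0 and unit sum of squares.
set.seed(1)
n <- 10; p <- 4
x <- matrix(rnorm(n * p), n, p)
x <- scale(x, center = TRUE, scale = TRUE) / sqrt(nrow(x) - 1)
round(colMeans(x), 10)   # all ~ 0
round(colSums(x^2), 10)  # all ~ 1
```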
Details

This function fits a pliable lasso model over a sequence of lambda (regularization) values. The inputs are a feature matrix x, a response vector y, and a matrix of modifying variables z. The pliable lasso is a generalization of the lasso in which the weights multiplying the features x are allowed to vary as a function of a set of modifying variables z. The model has the form:
yhat = beta0 + z %*% betaz + sum_j x[, j] * (beta[j] + z %*% theta[j, ])

The parameters are beta0 (scalar), betaz (nz-vector), beta (p-vector) and theta (p by nz matrix). The (convex) optimization problem is

minimize over (beta0, betaz, beta, theta):
  (1/(2n)) * sum_i (y_i - yhat_i)^2
    + (1 - alpha) * lambda * sum_j ( ||(beta_j, theta_j)||_2 + ||theta_j||_2 )
    + alpha * lambda * sum_{j,k} |theta_{jk}|
A blockwise coordinate descent procedure is employed for the optimization.
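The model form above can be illustrated directly in base R; this is a sketch with made-up dimensions and parameter values (not code from the package), showing that the per-feature loop and a vectorized computation agree:

```r
# Sketch: computing
#   yhat = beta0 + z %*% betaz + sum_j x[,j] * (beta[j] + z %*% theta[j,])
set.seed(1)
n <- 6; p <- 2; nz <- 3
x <- matrix(rnorm(n * p), n, p)
z <- matrix(rnorm(n * nz), n, nz)
beta0 <- 0.5
betaz <- rnorm(nz)             # main effects for z
beta  <- c(1, 0)               # p-vector of main effects for x
theta <- matrix(0, p, nz)      # p x nz interaction matrix
theta[1, 2] <- 2               # x1's effect is modified by z2

# Loop form, following the display above
yhat <- rep(beta0, n) + drop(z %*% betaz)
for (j in 1:p) yhat <- yhat + x[, j] * (beta[j] + drop(z %*% theta[j, ]))

# Equivalent vectorized form
yhat2 <- beta0 + drop(z %*% betaz) + drop(x %*% beta) +
  rowSums(x * (z %*% t(theta)))
all.equal(yhat, yhat2)   # TRUE
```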
Value

A trained pliable object with the following components:
param: values of lambda used
beta: matrix of estimated beta coefficients, ncol(x) by length(lambda).
theta: array of estimated theta coefficients, ncol(x) by ncol(z) by length(lambda).
betaz: estimated (main effect) coefficients for z, ncol(z) by length(lambda).
xbar: rowMeans of x
zbar: rowMeans of z
w: observation weights used
args: list of arguments in the original call
nulldev: null deviance for model
dev.ratio: deviance/null.dev for each model in path
nbeta: number of nonzero betas for each model in path
nbeta.with.int: number of betas with some nonzero theta values, for each model in path
ntheta: number of nonzero theta for each model in path
df: rough degrees of freedom for each model in path (=nbeta)
istor: for internal use
fstor: for internal use
thstor: for internal use
jerr: for internal use
call: the call made to pliable
See Also

cv.pliable, predict.pliable, plot.pliable, plot.cv.pliable
Examples

# Train a pliable lasso model - Gaussian case
# Generate some data
set.seed(9334)
n = 20; p = 3; nz = 3
x = matrix(rnorm(n * p), n, p)
mx = colMeans(x)
sx = sqrt(apply(x, 2, var))
x = scale(x, mx, sx)
z = matrix(rnorm(n * nz), n, nz)
mz = colMeans(z)
sz = sqrt(apply(z, 2, var))
z = scale(z, mz, sz)
y = 4 * x[, 1] + 5 * x[, 1] * z[, 3] + 3 * rnorm(n)

# Fit model
fit = pliable(x, z, y)
fit        # examine results
plot(fit)  # plot path

# Estimate the main tuning parameter lambda by cross-validation
# NOT RUN
# cvfit = cv.pliable(fit, x, z, y, nfolds = 5)
#   # returns lambda.min and lambda.1se, the minimizing and 1se rules
# plot(cvfit)

# Predict using the fitted model
# ntest = 500
# xtest = matrix(rnorm(ntest * p), ntest, p)
# xtest = scale(xtest, mx, sx)
# ztest = matrix(rnorm(ntest * nz), ntest, nz)
# ztest = scale(ztest, mz, sz)
# ytest = 4 * xtest[, 1] + 5 * xtest[, 1] * ztest[, 3] + 3 * rnorm(ntest)
# pred = predict(fit, xtest, ztest, lambda = cvfit$lambda.min)
# plot(ytest, pred)

# Binary outcome example
n = 20; p = 3; nz = 3
x = matrix(rnorm(n * p), n, p)
mx = colMeans(x)
sx = sqrt(apply(x, 2, var))
x = scale(x, mx, sx)
z = matrix(rnorm(n * nz), n, nz)
mz = colMeans(z)
sz = sqrt(apply(z, 2, var))
z = scale(z, mz, sz)
y = 4 * x[, 1] + 5 * x[, 1] * z[, 3] + 3 * rnorm(n)
y = 1 * (y > 0)
fit = pliable(x, z, y, family = "binomial")

# Example where z is not observed in the test set, but predicted
# from a supervised learning algorithm
n = 20; p = 3; nz = 3
x = matrix(rnorm(n * p), n, p)
mx = colMeans(x)
sx = sqrt(apply(x, 2, var))
x = scale(x, mx, sx)
z = matrix(rnorm(n * nz), n, nz)
mz = colMeans(z)
sz = sqrt(apply(z, 2, var))
z = scale(z, mz, sz)
y = 4 * x[, 1] + 5 * x[, 1] * z[, 3] + 3 * rnorm(n)
fit = pliable(x, z, y)

# Predict z from x; here we use glmnet (requires the glmnet package),
# but any other supervised method can be used
zfit = cv.glmnet(x, z, family = "mgaussian")

# Predict using the fitted model
# ntest = 100
# xtest = matrix(rnorm(ntest * p), ntest, p)
# xtest = scale(xtest, mx, sx)
# ztest = predict(zfit, xtest, s = zfit$lambda.min)[, , 1]
# ytest = 4 * xtest[, 1] + 5 * xtest[, 1] * ztest[, 3] + 3 * rnorm(ntest)
# pred = predict(fit, xtest, ztest)
# plot(ytest, pred[, 27])  # just uses the 27th path member, for illustration;
#                          # see cv.pliable for details of how to cross-validate
#                          # in this setting