vgam {VGAM} | R Documentation |
Fit a vector generalized additive model (VGAM). This is a large class of models that includes generalized additive models (GAMs) and vector generalized linear models (VGLMs) as special cases.
vgam(formula, family, data = list(), weights = NULL, subset = NULL, na.action = na.fail, etastart = NULL, mustart = NULL, coefstart = NULL, control = vgam.control(...), offset = NULL, method = "vgam.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE, contrasts = NULL, constraints = NULL, extra = list(), form2 = NULL, qr.arg = FALSE, smart = TRUE, ...)
formula |
a symbolic description of the model to be fit.
The RHS of the formula is applied to each linear/additive predictor,
and usually includes at least one |
family |
Same as for |
data |
an optional data frame containing the variables in the model.
By default the variables are taken from
|
weights, subset, na.action |
Same as for |
etastart, mustart, coefstart |
Same as for |
control |
a list of parameters for controlling the fitting process.
See |
method |
the method to be used in fitting the model.
The default (and presently only) method |
constraints, model, offset |
Same as for |
x.arg, y.arg |
logical values indicating whether the model matrix and response
vector/matrix used in the fitting process should be assigned in the
|
contrasts, extra, form2, qr.arg, smart |
Same as for |
... |
further arguments passed into |
A vector generalized additive model (VGAM) is loosely defined as a statistical model that is a function of M additive predictors. The central formula is given by
eta_j = sum_{k=1}^p f_{(j)k}(x_k)
where x_k is the kth explanatory variable
(almost always x_1=1 for the intercept term),
and
f_{(j)k} are smooth functions of x_k that are estimated
by smoothers. The first term in the summation is just the intercept.
Currently only one type of smoother is
implemented and this is called a vector (cubic smoothing spline)
smoother.
Here, j=1,…,M where M is finite.
If all the functions are constrained to be linear then
the resulting model is a vector generalized linear model
(VGLM). VGLMs are best fitted with vglm
.
Vector (cubic smoothing spline) smoothers are represented
by s()
(see s
). Local
regression via lo()
is not supported. The
results of vgam
will differ from the gam()
(in the gam) because vgam()
uses a different
knot selection algorithm. In general, fewer knots are
chosen because the computation becomes expensive when
the number of additive predictors M is large.
The underlying algorithm of VGAMs is iteratively
reweighted least squares (IRLS) and modified vector backfitting
using vector splines. B-splines are used as the basis functions
for the vector (smoothing) splines.
vgam.fit()
is the function that actually does the
work. The smoothing code is based on F. O'Sullivan's
BART code.
A closely related methodology based on VGAMs called
constrained additive ordination (CAO) first forms
a linear combination of the explanatory variables (called
latent variables) and then fits a GAM to these.
This is implemented in the function cao
for a very limited choice of family functions.
An object of class "vgam"
(see vgam-class
for further information).
Currently vgam
can only handle constraint matrices cmat
,
say, such that crossprod(cmat)
is diagonal.
This is a bug that I will try to fix up soon;
see is.buggy
.
See warnings in vglm.control
.
This function can fit a wide variety of statistical models. Some of
these are harder to fit than others because of inherent numerical
difficulties associated with some of them. Successful model fitting
benefits from cumulative experience. Varying the values of arguments
in the VGAM family function itself is a good first step if
difficulties arise, especially if initial values can be inputted.
A second, more general step, is to vary the values of arguments in
vgam.control
.
A third step is to make use of arguments such as etastart
,
coefstart
and mustart
.
Some VGAM family functions end in "ff"
to avoid interference with other functions, e.g.,
binomialff
, poissonff
,
gaussianff
, gammaff
. This is
because VGAM family functions are incompatible with
glm
(and also gam
in the gam library and gam
in the mgcv library).
The smart prediction (smartpred
) library
is packed with the VGAM library.
The theory behind the scaling parameter is currently being made more rigorous, but it it should give the same value as the scale parameter for GLMs.
Thomas W. Yee
Yee, T. W. and Wild, C. J. (1996) Vector generalized additive models. Journal of the Royal Statistical Society, Series B, Methodological, 58, 481–493.
Yee, T. W. (2008)
The VGAM
Package.
R News, 8, 28–39.
is.buggy
,
vgam.control
,
vgam-class
,
vglmff-class
,
plotvgam
,
vglm
,
s
,
vsmooth.spline
,
cao
.
# Nonparametric proportional odds model pneumo <- transform(pneumo, let = log(exposure.time)) vgam(cbind(normal, mild, severe) ~ s(let), cumulative(parallel = TRUE), data = pneumo, trace = TRUE) # Nonparametric logistic regression fit <- vgam(agaaus ~ s(altitude, df = 2), binomialff, data = hunua) ## Not run: plot(fit, se = TRUE) pfit <- predict(fit, type = "terms", raw = TRUE, se = TRUE) names(pfit) head(pfit$fitted) head(pfit$se.fit) pfit$df pfit$sigma # Fit two species simultaneously fit2 <- vgam(cbind(agaaus, kniexc) ~ s(altitude, df = c(2, 3)), binomialff(multiple.responses = TRUE), data = hunua) coef(fit2, matrix = TRUE) # Not really interpretable ## Not run: plot(fit2, se = TRUE, overlay = TRUE, lcol = 3:4, scol = 3:4) ooo <- with(hunua, order(altitude)) with(hunua, matplot(altitude[ooo], fitted(fit2)[ooo,], ylim = c(0, 0.8), xlab = "Altitude (m)", ylab = "Probability of presence", las = 1, main = "Two plant species' response curves", type = "l", lwd = 2)) with(hunua, rug(altitude)) ## End(Not run) # The subset argument does not work here. Use subset() instead. set.seed(1) zdata <- data.frame(x2 = runif(nn <- 100)) zdata <- transform(zdata, y = rbinom(nn, 1, 0.5)) zdata <- transform(zdata, subS = runif(nn) < 0.7) sub.zdata <- subset(zdata, subS) # Use this instead if (FALSE) fit4a <- vgam(cbind(y, y) ~ s(x2, df = 2), binomialff(multiple.responses = TRUE), data = zdata, subset = subS) # This fails!!! fit4b <- vgam(cbind(y, y) ~ s(x2, df = 2), binomialff(multiple.responses = TRUE), data = sub.zdata) # This succeeds!!!