MaximallySelectedStatisticsTests {coin} | R Documentation |
Testing the independence of two sets of variables measured on arbitrary scales against cutpoint alternatives.
## S3 method for class 'formula'
maxstat_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'table'
maxstat_test(object, ...)
## S3 method for class 'IndependenceProblem'
maxstat_test(object, teststat = c("maximum", "quadratic"),
             distribution = c("asymptotic", "approximate", "none"),
             minprob = 0.1, maxprob = 1 - minprob, ...)
formula |
a formula of the form |
data |
an optional data frame containing the variables in the model formula. |
subset |
an optional vector specifying a subset of observations to be used. Defaults
to NULL. |
weights |
an optional formula of the form |
object |
an object inheriting from classes |
teststat |
a character, the type of test statistic to be applied: either a maximum
statistic ( |
distribution |
a character, the conditional null distribution of the test statistic can be
approximated by its asymptotic distribution ( |
minprob |
a numeric, a fraction between 0 and 0.5 specifying that cutpoints only
greater than the |
maxprob |
a numeric, a fraction between 0.5 and 1 specifying that cutpoints only
smaller than the |
... |
further arguments to be passed to |
maxstat_test provides generalized maximally selected statistics. The
family of maximally selected statistics encompasses a large collection of
procedures used for the estimation of simple cutpoint models including, but
not limited to, maximally selected chi^2 statistics, maximally
selected Cochran-Armitage statistics, maximally selected rank statistics and
maximally selected statistics for multiple covariates. A general description
of these methods is given by Hothorn and Zeileis (2008).
The null hypothesis of independence, or conditional independence given
block, between y1, ..., yq and x1, ..., xp is tested against cutpoint
alternatives. All possible partitions into two groups are evaluated for each
unordered covariate x1, ..., xp, whereas only order-preserving binary
partitions are evaluated for ordered or numeric covariates. The cutpoint is
then a set of levels defining one of the two groups.
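The difference between the two kinds of partitions can be made concrete with a small base R sketch. The helper binary_partitions below is hypothetical and not part of coin; it only enumerates the candidate groupings described above and assumes each partition is represented by the levels that fall into one of the two groups.

```r
## Hypothetical helper (not part of coin): list the binary partitions of a
## set of levels, each represented by the levels forming one of the groups.
binary_partitions <- function(lev, ordered = FALSE) {
    k <- length(lev)
    if (ordered) {
        ## order-preserving splits: {l1}, {l1, l2}, ..., {l1, ..., l(k-1)}
        return(lapply(seq_len(k - 1), function(i) lev[seq_len(i)]))
    }
    ## unordered: keep the first level in the group and let every other
    ## level be in or out; the case "all levels in" is not a partition
    rest <- lev[-1]
    n_masks <- 2^(k - 1) - 1
    lapply(seq_len(n_masks) - 1, function(mask) {
        ## read the binary digits of 'mask' as membership indicators
        take <- (mask %/% 2^(seq_along(rest) - 1)) %% 2 == 1
        c(lev[1], rest[take])
    })
}

lev <- c("a", "b", "c", "d")
unordered <- binary_partitions(lev)                  # 2^(4 - 1) - 1 = 7 partitions
ordered   <- binary_partitions(lev, ordered = TRUE)  # 4 - 1 = 3 partitions
```

The number of candidate partitions grows exponentially with the number of levels for unordered covariates, but only linearly for ordered ones.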
If both response and covariate are univariable, say y1 and x1, this procedure
is known as maximally selected chi^2 statistics (Miller and Siegmund, 1982)
when y1 is a binary factor and x1 is a numeric variable, and as maximally
selected rank statistics when y1 is a rank transformed numeric variable and
x1 is a numeric variable (Lausen and Schumacher, 1992). Lausen et al. (2004)
introduced maximally selected statistics for a univariable numeric response
and multiple numeric covariates x1, ..., xp.
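For the univariable case, the idea of a maximally selected statistic can be sketched in base R. The helper max_selected below is hypothetical and is not how coin computes its results: it standardizes the sum of the responses left of each candidate cutpoint by its permutation mean and variance and takes the largest absolute value. It also assumes that minprob and maxprob refer to empirical quantiles of the covariate; the exact boundary handling in coin may differ.

```r
## Hypothetical helper (not part of coin): maximally selected statistic for
## one numeric covariate x and one numeric (or 0/1 coded) response y.
max_selected <- function(y, x, minprob = 0.1, maxprob = 1 - minprob) {
    n <- length(y)
    ## candidate cutpoints: observed x values inside the quantile range
    ## (assumption: minprob/maxprob are empirical quantiles of x)
    lo <- quantile(x, minprob, type = 1)
    hi <- quantile(x, maxprob, type = 1)
    cuts <- sort(unique(x[x >= lo & x < hi]))
    stopifnot(length(cuts) > 0)
    ss <- sum((y - mean(y))^2)
    stat <- vapply(cuts, function(cp) {
        left <- x <= cp
        n1 <- sum(left)
        ## mean and variance of sum(y[left]) when y is permuted
        expect <- n1 * mean(y)
        variance <- n1 * (n - n1) / (n * (n - 1)) * ss
        abs(sum(y[left]) - expect) / sqrt(variance)
    }, numeric(1))
    list(statistic = max(stat), cutpoint = cuts[which.max(stat)])
}

## Rank transformed response: a maximally selected rank statistic
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2.1, 1.8, 2.5, 2.0, 2.2, 5.9, 6.3, 6.1, 5.7, 6.4)
res <- max_selected(rank(y), x)
res$cutpoint   # 5, the split x <= 5 versus x > 5
```

Because the maximum is taken over many correlated statistics, the result cannot be compared with the reference distribution of a single standardized statistic; this is why a dedicated null distribution is required.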
If, say, y1 and/or x1 are ordered factors, the default scores,
1:nlevels(y1) and 1:nlevels(x1) respectively, can be altered using the
scores argument (see independence_test); this argument can also be used to
coerce nominal factors to class "ordered".
If both, say, y1 and x1 are ordered factors, a linear-by-linear association
test is computed and the direction of the alternative hypothesis can be
specified using the alternative argument. The particular extension to the
case of a univariable binary factor response and a univariable ordered
covariate was given by Betensky and Rabinowitz (1999) and is known as
maximally selected Cochran-Armitage statistics.
The conditional null distribution of the test statistic is used to obtain
p-values and an asymptotic approximation of the exact distribution is used by
default (distribution = "asymptotic"). Alternatively, the distribution can be
approximated via Monte Carlo resampling by setting distribution to
"approximate". See asymptotic and approximate for details.
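The Monte Carlo approach can be illustrated with a short base R sketch. The helper mc_pvalue and its arguments cuts and nresample are hypothetical names chosen for this illustration; the code does not reproduce coin's resampling scheme or its p-value convention. It only shows the principle: the response is permuted repeatedly, the maximum statistic is recomputed each time, and the p-value is the fraction of resampled maxima at least as large as the observed one.

```r
## Hypothetical helper (not part of coin): Monte Carlo p-value for the
## maximum of absolute standardized two-group statistics over 'cuts'.
mc_pvalue <- function(y, x, cuts, nresample = 1000) {
    n <- length(y)
    max_stat <- function(yy) {
        max(vapply(cuts, function(cp) {
            left <- x <= cp
            n1 <- sum(left)
            v <- n1 * (n - n1) / (n * (n - 1)) * sum((yy - mean(yy))^2)
            abs(sum(yy[left]) - n1 * mean(yy)) / sqrt(v)
        }, numeric(1)))
    }
    observed <- max_stat(y)
    ## permuting y removes any association with x but keeps both margins
    resampled <- replicate(nresample, max_stat(sample(y)))
    mean(resampled >= observed)
}

set.seed(1)
x <- 1:10
y <- c(2.1, 1.8, 2.5, 2.0, 2.2, 5.9, 6.3, 6.1, 5.7, 6.4)
p <- mc_pvalue(y, x, cuts = 2:8)
```

The whole maximum is recomputed for every permutation, so the multiplicity caused by searching over cutpoints is reflected in the resulting p-value.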
An object inheriting from class "IndependenceTest".
Starting with coin version 1.1-0, maximum statistics and quadratic forms
can no longer be specified using teststat = "maxtype" and
teststat = "quadtype" respectively (as was used in versions prior to 0.4-5).
Betensky, R. A. and Rabinowitz, D. (1999). Maximally selected chi^2 statistics for k x 2 tables. Biometrics 55(1), 317–320.
Hothorn, T. and Lausen, B. (2003). On the exact distribution of maximally selected rank statistics. Computational Statistics & Data Analysis 43(2), 121–137.
Hothorn, T. and Zeileis, A. (2008). Generalized maximally selected statistics. Biometrics 64(4), 1263–1269.
Lausen, B., Hothorn, T., Bretz, F. and Schumacher, M. (2004). Optimally selected prognostic factors. Biometrical Journal 46(3), 364–374.
Lausen, B. and Schumacher, M. (1992). Maximally selected rank statistics. Biometrics 48(1), 73–85.
Miller, R. and Siegmund, D. (1982). Maximally selected chi square statistics. Biometrics 38(4), 1011–1016.
Müller, J. and Hothorn, T. (2004). Maximally selected two-sample statistics as a new tool for the identification and assessment of habitat factors with an application to breeding bird communities in oak forests. European Journal of Forest Research 123(3), 219–228.
## Load the package that provides maxstat_test and the example data
library("coin")

## Tree pipit data (Mueller and Hothorn, 2004)
## Asymptotic maximally selected statistics
maxstat_test(counts ~ coverstorey, data = treepipit)

## Asymptotic maximally selected statistics
## Note: all covariates simultaneously
mt <- maxstat_test(counts ~ ., data = treepipit)
mt@estimates$estimate

## Malignant arrhythmias data (Hothorn and Lausen, 2003, Sec. 7.2)
## Asymptotic maximally selected statistics
maxstat_test(Surv(time, event) ~ EF, data = hohnloser,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))

## Breast cancer data (Hothorn and Lausen, 2003, Sec. 7.3)
## Asymptotic maximally selected statistics
data("sphase", package = "TH.data")
maxstat_test(Surv(RFS, event) ~ SPF, data = sphase,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))

## Job satisfaction data (Agresti, 2002, p. 288, Tab. 7.8)
## Asymptotic maximally selected statistics
maxstat_test(jobsatisfaction)

## Asymptotic maximally selected statistics
## Note: 'Job.Satisfaction' and 'Income' as ordinal
maxstat_test(jobsatisfaction,
             scores = list("Job.Satisfaction" = 1:4,
                           "Income" = 1:4))