dts_stat {twosamples} | R Documentation |
Two Sample Test
dts_stat(a, b, power = 1)
dts_test(a, b, nboots = 2000, p = default.p)
two_sample(a, b, nboots = 2000, p = default.p)
a |
a vector of numbers |
b |
a vector of numbers |
power |
power to raise the test statistic to (used by dts_stat) |
nboots |
Number of bootstrap iterations |
p |
power to raise the test statistic to (used by dts_test and two_sample) |
The DTS test compares two ECDFs using a reweighted Wasserstein distance between them. If wass_test extends cvm_test to interval data, then DTS extends ad_test to interval data.
Formally, if E is the ECDF of sample 1, F is the ECDF of sample 2, and G is the ECDF of the combined sample, then DTS = Integral |E(x)-F(x)|/(G(x)(1-G(x))) across all x. The test p-value is calculated by randomly resampling two samples of the same sizes as the original samples from the combined sample.
Intuitively, the DTS test improves on AD by allowing more extreme observations to carry more weight. At a higher level, CVM/AD/KS/etc. only require ordinal data. DTS gains its power because it takes advantage of the properties of interval data, i.e. the distances between observations have some meaning. This is the same argument as for Wasserstein vs. AD/CVM/KS. However, DTS, like Anderson-Darling (AD), also downweights noisier observations relative to WASS, thus (hopefully) giving it extra power.
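The formula and resampling scheme described above can be sketched in base R. This is only an illustration of the integral as written, with the power fixed at 1; it is not the package's implementation and may not match dts_stat numerically. The names dts_sketch and perm_pvalue_sketch are hypothetical, and resampling by shuffling the combined sample without replacement is an assumption.

```r
# Illustrative sketch only: hypothetical helpers, not part of twosamples.
dts_sketch <- function(a, b) {
  pooled <- c(a, b)
  x <- sort(unique(pooled))        # breakpoints of the three step functions
  m <- length(x)
  if (m < 2) return(0)             # all values identical: ECDFs coincide
  ecdf_a <- ecdf(a)(x)             # E(x)
  ecdf_b <- ecdf(b)(x)             # F(x)
  ecdf_pool <- ecdf(pooled)(x)     # G(x)
  idx <- seq_len(m - 1)            # G < 1 on every interval before the last point
  widths <- diff(x)                # interval lengths between breakpoints
  sum(abs(ecdf_a[idx] - ecdf_b[idx]) /
        (ecdf_pool[idx] * (1 - ecdf_pool[idx])) * widths)
}

# Assumption: resample by shuffling the combined sample and splitting it
# into two groups with the original sample sizes.
perm_pvalue_sketch <- function(a, b, nboots = 2000) {
  observed <- dts_sketch(a, b)
  pooled <- c(a, b)
  first <- seq_len(length(a))
  resampled <- replicate(nboots, {
    shuffled <- sample(pooled)
    dts_sketch(shuffled[first], shuffled[-first])
  })
  c(stat = observed, p = mean(resampled >= observed))
}
```

Because the ECDFs are step functions, the integral reduces to a finite sum over the intervals between consecutive distinct pooled values. The final breakpoint is skipped because E and F both equal 1 there, so it contributes nothing.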
Output is a length-2 vector containing the test statistic and p-value, in that order. The vector has 3 attributes: the sizes of the two samples and the number of bootstraps performed for the p-value.
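A short sketch of reading the returned vector, assuming the twosamples package is installed. The attribute names are not stated here, so the example lists whatever attributes are attached instead of naming them.

```r
# Runs only if the twosamples package is available.
if (requireNamespace("twosamples", quietly = TRUE)) {
  set.seed(42)
  out <- twosamples::dts_test(rnorm(20), rnorm(20, 4))
  print(out[1])           # first element: test statistic
  print(out[2])           # second element: p-value
  print(attributes(out))  # sample sizes and number of bootstraps
}
```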
dts_stat: Test statistic based on a weighted area between ECDFs
dts_test: Permutation-based two-sample test
two_sample: Recommended two-sample test
# Two samples of 20 normal draws whose means differ by 4
vec1 <- rnorm(20)
vec2 <- rnorm(20, 4)
dts_stat(vec1, vec2)
dts_test(vec1, vec2)
two_sample(vec1, vec2)