drmdel {drmdel} | R Documentation |
Fit a semiparametric density ratio model (DRM) to m+1 (m>=1) samples using maximum dual empirical likelihood method.
Denote the population cumulative distribution functions of the m+1 samples as F_k(x)'s, k = 0, 1, ..., m. We pick F_0(x) as a baseline distribution. The DRM assumes that the ratio of the density of each non-baseline distribution to the density of the baseline distribution satisfies
dF_k(x)/dF_0(x) = exp(alpha + beta^T*q(x)), k = 1, ..., m
where q(x) is a pre-specified d-dimensional basis function of data, and alpha, beta are model parameters. No parametric form for baseline distribution F_0 is assumed.
drmdel(x, n_samples, basis_func, g_null=NULL, g_null_jac=NULL, par_dim_null=NULL, ...)
x |
a vector formed by concatenating multiple samples, x_0, x_1, ..., x_m, in the order of baseline sample (x_0), non-baseline sample 1 (x_1), ..., non-baseline sample m (x_m). |
n_samples |
a vector of length m+1 specifying the sizes of the multiple samples, in the order of 0, 1, ..., m. |
basis_func |
basis function q(x) of the DRM; must either be an integer between 1 and 11 or a function of the data, x. The integers represents built-in basis-functions: 1: q(x) = x. 2: q(x) = log(|x|). 3: q(x) = sqrt(|x|). 4: q(x) = x^2. 5: q(x) = (x, x^2); Normal model. 6: q(x) = (x, log(|x|)); Gamma model. 7: q(x) = (log(|x|), sqrt(|x|), x). 8: q(x) = (log(|x|), sqrt(|x|), x^2). 9: q(x) = (log(|x|), x, x^2). 10: q(x) = (sqrt(|x|), x, x^2). 11: q(x) = (log(|x|), sqrt(|x|), x, x^2). If the basis function one wants to use is in the above list, one should use the built-in function to maximize the speed of model fitting. |
g_null |
the function specifying the null hypothesis about DRM parameter beta if there is one; The default is NULL. |
g_null_jac |
a funciton specifying the Jacobian matrix of g_null, which must return a matrix of dimension m*d by dim(par_null), if available. The default is NULL. |
par_dim_null |
dimension of the parameter vector in null hypothesis if there is one. The default is NULL. If one is carrying out a hypothesis testing problem with a fully specified null hypothesis, one should specify g_null_jac=NULL and par_dim_null=0. |
... |
further arguments to be passed to the R
function |
drm_info |
a list of basic information about the DRM: m: number of samples - 1. d: dimension of the basis function. n_smaples: the input vector of length m+1 specifying the size of each sample. n_total: total sample size. basis_func: the input basis function of the DRM. rho: sample proportion: n_samples/n_total. |
mele |
maximum empirical likelihood estimator (MELE) of the model parameters. The output is a vector organized in the following form: (alpha[1], beta[1,1], beta[1,2], ..., beta[1,d], alpha[2], beta[2,1], beta[2,2], ..., beta[2,d], ..., alpha[m], beta[m,1], beta[m,2], ..., beta[m,d]). |
info_mat |
estimated information matrix. |
negldl |
negative log dual empirical likelihood evaluated at mele. |
mele_null |
mele of the parameters under the null hypothesis, if available. |
negldl_null |
negative log dual empirical likelihood evaluated at mele under the null hypothesis, if available. |
delr |
the value of the dual empirical likelihood ratio statistic. If no null hypotheis (g_null) is given, this is simply -2*negldl. |
df |
degrees of freedom of the chi-square limiting distribution for DELR statistic under the null. |
p_val |
p-value of the DELR test. |
p_est |
estimated dF_k(x)'s at the observed data points, under the DRM. This is a data frame with the following three columns: k: label for the populations, k = 0, 1, ..., m. x: data points; at which 'x' value dF_k(x) is estimated. p_est: estimated dF_k(x). NOTE: To estimate the density of F_k(x),
it is recommended to use |
cdf_est |
estimated CDFs, F_k(x)'s, at the observed data points, under the DRM. This is a data frame with the following three columns: k: label for the populations, k = 0, 1, ..., m. x: data points; at which 'x' value F_k(x) is estimated. cdf_est: estimated F_k(x). NOTE: To estimate CDF dF_k(x), it is
recommended to use |
S. Cai, J. Chen and J. V. Zidek (2014), Hypothesis testing in the presence of multiple samples under density ratio models. Eprint, arXiv:1309.4740
A. Keziou and S. Leoni-Aubin (2008), On empirical likelihood for semiparametric two-sample density ratio models. Journal of Statistical Planning and Inference, 138:915-928.
# Data generation set.seed(25) n_samples <- c(100, 200, 180, 150, 175) # sample sizes x0 <- rgamma(n_samples[1], shape=5, rate=1.8) x1 <- rgamma(n_samples[2], shape=12, rate=1.2) x2 <- rgamma(n_samples[3], shape=12, rate=1.2) x3 <- rgamma(n_samples[4], shape=18, rate=5) x4 <- rgamma(n_samples[5], shape=25, rate=2.6) x <- c(x0, x1, x2, x3, x4) # Fit a DRM with the basis function q(x) = (x, log(abs(x))), # which is the basis function for gamma family. # There are 11 built-in basis function in drmdel(). And q(x) # = (x, log(abs(x))) is the 6th basis function, so we can # fit the model by specifying basis_func=6 in drmdel() as # follows: drmfit <- drmdel(x=x, n_samples=n_samples, basis_func=6) names(drmfit) # A brief summary of the DRM fit summaryDRM(drmfit) # Another way of specifying basis function for drmdel() is # to pass a user-specified R function to the basis_func # argument of the drmdel() function. # NOTE: If the basis function one wants to use is included # in the built-in function list, one should use the built-in # functions by passing an integer between 1 to 11 to the # drmdel() function, because the computation will be faster # with a built-in function than with a user-specified # function. basis_gamma <- function(x) return(c(x, log(abs(x)))) drmfit1 <- drmdel(x=x, n_samples=n_samples, basis_func=basis_gamma) # One can see the summary of this DRM fit is exactly the # same as that of the previous fit with basis_func=6 summaryDRM(drmfit1)