simulate_tables {FunChisq}    R Documentation

Simulate Noisy Contingency Tables to Represent Diverse Discrete Patterns

Description

Generate random contingency tables representing various functional, non-functional, dependent, or independent patterns, without specifying a parametric model for the patterns.

Usage

simulate_tables(
  n=100, nrow=3, ncol=3,
  type = c("functional", "many.to.one",
           "discontinuous", "independent",
           "dependent.non.functional"),
  noise.model = c("house", "candle"), noise=0.0,
  n.tables=1,
  row.marginal=rep(1/nrow, nrow),
  col.marginal=rep(1/ncol, ncol)
)

Arguments

n

a positive integer specifying the sample size to be distributed in each table. For "functional", "many.to.one", and "discontinuous" tables, n must be no less than nrow. For "dependent.non.functional" tables, n must be no less than nrow*ncol. For "independent" tables, any positive integer is allowed.

nrow

a positive integer specifying the number of rows in each table. The value must be no less than 2. For "many.to.one" tables, nrow must be no less than 3.

ncol

a positive integer specifying the number of columns in each table. ncol must be no less than 2.
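
The lower bounds on n, nrow, and ncol stated above can be collected into one validity check. This is a minimal sketch: check_sim_args is a hypothetical helper, not part of FunChisq, and simulate_tables may perform its own argument checks differently.

```r
# Hypothetical helper (not part of FunChisq): encodes the documented
# lower bounds on n, nrow and ncol for each table type.
check_sim_args <- function(n, nrow, ncol, type) {
  if (n < 1 || n != round(n)) return(FALSE)   # n must be a positive integer
  if (nrow < 2 || ncol < 2) return(FALSE)     # at least a 2 x 2 table
  switch(type,
    functional               = n >= nrow,
    discontinuous            = n >= nrow,
    many.to.one              = nrow >= 3 && n >= nrow,
    dependent.non.functional = n >= nrow * ncol,
    independent              = TRUE,
    stop("unknown type: ", type))
}

check_sim_args(100, 4, 5, "dependent.non.functional")  # TRUE, since 100 >= 4 * 5
check_sim_args(10, 4, 5, "dependent.non.functional")   # FALSE, since 10 < 4 * 5
```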

type

a character string to specify the type of pattern underlying the table. The options are "functional" (default), "many.to.one", "discontinuous", "independent", and "dependent.non.functional". See Details.

noise.model

a character string specifying the noise model: either "house" for ordinal variables (Zhang et al., 2015) or "candle" for categorical variables. See add.noise for details.

noise

a numeric value between 0 and 1 specifying the noise level to be added to the table using the function add.noise. The noise is applied along the rows of the table, except for "independent" tables, where noise is applied along both rows and columns. See add.noise for details.

n.tables

a positive integer value specifying the number of tables to be generated.

row.marginal

a non-negative numeric vector of length nrow specifying row marginal probabilities. The vector is linearly scaled so that the sum is 1. The default is a uniform distribution. For "many.to.one" tables, the length of the row.marginal vector must be no less than 3.
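
Linear scaling of row.marginal amounts to dividing each entry by the total. The multinomial draw that follows only illustrates how n samples could be spread over the rows with these probabilities; it is an assumption for illustration, not necessarily how simulate_tables allocates samples internally.

```r
# Hypothetical unnormalized row weights.
row.marginal <- c(3, 2, 3, 2)
# Linear rescaling so that the probabilities sum to 1.
p <- row.marginal / sum(row.marginal)
print(p)   # 0.3 0.2 0.3 0.2

# Illustrative only: spread n = 100 samples over the rows with these
# probabilities (simulate_tables may allocate samples differently).
set.seed(42)
row.counts <- as.vector(rmultinom(1, size = 100, prob = p))
```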

col.marginal

a non-negative numeric vector of length ncol specifying column marginal probabilities. The vector is linearly scaled so that the sum is 1. The vector is used only in generating "independent" tables. The default is a uniform distribution.

Details

This function generates five types of tables representing different interaction patterns between the row and column discrete random variables X and Y. Three of the five types are non-constant functional patterns (Y is a non-constant function of X):

type="functional": Y is a function of X but X may or may not be a function of Y. The samples are distributed using the given row marginal probabilities.

type="many.to.one": Y is a many-to-one function of X; hence X is not a function of Y. The samples are distributed using the given row marginal probabilities.

type="discontinuous": Y is a function of X whose values at neighboring values of X must differ. X may or may not be a function of Y. The samples are distributed using the given row marginal probabilities. Discontinuous functions thus contrast with functions that are close to constant.
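
The three functional types can be told apart on a binary pattern table (as in pattern.list under Value). The sketch below uses hypothetical helpers (col_of_row, is_functional, is_many_to_one, is_discontinuous) that are not FunChisq functions. It assumes every row of the pattern has exactly one non-zero cell and interprets "discontinuous" as adjacent rows never mapping to the same column.

```r
# Hypothetical helpers, not part of FunChisq. They assume a binary pattern
# table in which every row has exactly one non-zero cell.

# Column index of the non-zero cell in each row, i.e. the value f(x).
col_of_row <- function(tbl) apply(tbl != 0, 1, which)

is_functional <- function(tbl) {
  # Y = f(X) with exactly one non-zero cell per row, and f non-constant.
  all(rowSums(tbl != 0) == 1) && length(unique(col_of_row(tbl))) >= 2
}

is_many_to_one <- function(tbl) {
  # Additionally, some column is used by two or more rows,
  # so X cannot be recovered from Y.
  is_functional(tbl) && any(colSums(tbl != 0) >= 2)
}

is_discontinuous <- function(tbl) {
  # Additionally, adjacent rows never map to the same column.
  is_functional(tbl) && all(diff(col_of_row(tbl)) != 0)
}

f <- matrix(c(1, 0, 0,
              0, 1, 0,
              1, 0, 0,
              0, 0, 1), nrow = 4, byrow = TRUE)
is_functional(f)     # TRUE
is_many_to_one(f)    # TRUE: rows 1 and 3 both map to column 1
is_discontinuous(f)  # TRUE: f(x) = 1, 2, 1, 3 changes at every step
```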

The fourth type, "dependent.non.functional", represents non-functional patterns where X and Y are statistically dependent but neither is a function of the other.
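
Both conditions for a "dependent.non.functional" pattern can be checked on a count table: neither variable is a function of the other, yet Pearson's chi-square test rejects independence. The table below is made up for illustration and is not output of simulate_tables.

```r
# A made-up 3 x 3 count table (not simulate_tables output).
tbl <- matrix(c(30, 10,  0,
                 0, 25, 15,
                10,  0, 30), nrow = 3, byrow = TRUE)

# Y = f(X) requires at most one non-zero cell in every row.
y_is_f_of_x <- all(rowSums(tbl != 0) <= 1)
# X = g(Y) requires at most one non-zero cell in every column.
x_is_g_of_y <- all(colSums(tbl != 0) <= 1)
c(y_is_f_of_x, x_is_g_of_y)   # FALSE FALSE

# Statistical dependence: Pearson's chi-square test of independence.
p.value <- chisq.test(tbl)$p.value
p.value < 0.05                # TRUE
```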

The fifth type, "independent", represents patterns where X and Y are statistically independent, that is, their joint probability mass function is the product of their marginal probability mass functions.
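
Under independence the joint probability mass function is the outer product of the two marginals. The sketch builds that product from the marginals used in Example 6 and draws n = 100 counts from it with rmultinom; the sampling step is an illustrative assumption, not necessarily the procedure described by Sharma et al. (2017).

```r
# Marginals taken from Example 6.
p <- c(0.4, 0.3, 0.1, 0.2)        # P(X = i), one entry per row
q <- c(0.1, 0.2, 0.4, 0.2, 0.1)   # P(Y = j), one entry per column

# Independence: P(X = i, Y = j) = P(X = i) * P(Y = j).
joint <- outer(p, q)

# Illustrative only: draw n = 100 samples over the 4 x 5 cells.
# as.vector() and matrix() both use column-major order, so cells line up.
set.seed(1)
counts <- matrix(rmultinom(1, size = 100, prob = as.vector(joint)),
                 nrow = length(p))
```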

Random noise can be optionally applied to the tables using either the house or the candle noise model. See add.noise for details.

The paper by Sharma et al. (2017) provides full mathematical and statistical details of the simulation strategies for the above table types except the "discontinuous" type.

Value

A list containing the following components:

pattern.list

a list of tables containing binary patterns in 0's and 1's. Each table is created by setting all non-zero entries in the corresponding sampled contingency table from sample.list to 1. Each table strictly satisfies the mathematical relationship required by the requested pattern type, but it does not meet the statistical requirements. Because each table represents the true mathematical relationship between the row and column variables, these tables can be used as the ground truth, or gold standard, for benchmarking.
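
Turning a sampled count table into its binary pattern is a one-line operation in base R. The count table below is hypothetical.

```r
# A made-up sampled 4 x 5 count table (not simulate_tables output).
sample_tbl <- matrix(c( 0, 27, 0,  0,  0,
                        0,  0, 0, 19,  0,
                       31,  0, 0,  0,  0,
                        0,  0, 0,  0, 23), nrow = 4, byrow = TRUE)

# Binary pattern: every non-zero entry becomes 1, zeros stay 0.
pattern <- (sample_tbl != 0) * 1
pattern
```

With real output, the same expression could be applied to every table at once, e.g. lapply(tbls$sample.list, function(t) (t != 0) * 1), where tbls is the value returned by simulate_tables; by the description above this should match pattern.list.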

sample.list

a list of tables satisfying both the mathematical and statistical requirements. These tables are noise free.

noise.list

a list of tables obtained by applying noise to the corresponding tables in sample.list. Because of the added noise, each table may no longer strictly satisfy the required mathematical or statistical relationships. These tables are the main output for evaluating a discrete pattern discovery algorithm.

pvalue.list

a list of p-values reporting the statistical significance of the generated tables for the requested type. When the pattern type specifies a functional relationship, the p-values are computed by the functional chi-square test (Zhang and Song, 2013); otherwise, Pearson's chi-square test of independence is used.
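
For the non-functional types, the p-value comes from Pearson's chi-square test of independence. The sketch spells out that test on a made-up 2 x 3 table, with expected counts E_ij = n_i. * n_.j / n and (r - 1)(c - 1) degrees of freedom, and checks it against base R's chisq.test. It does not cover the functional chi-square test used for the functional types.

```r
# A made-up 2 x 3 count table.
obs <- matrix(c(20,  5, 15,
                 5, 25, 10), nrow = 2, byrow = TRUE)
n <- sum(obs)

# Expected counts under independence: row total * column total / n.
expected <- outer(rowSums(obs), colSums(obs)) / n
stat <- sum((obs - expected)^2 / expected)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)
p.value <- pchisq(stat, df, lower.tail = FALSE)

# chisq.test applies Yates' correction only to 2 x 2 tables,
# so its p-value should agree with the hand computation here.
all.equal(p.value, chisq.test(obs)$p.value)   # TRUE
```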

Author(s)

Ruby Sharma, Sajal Kumar, Hua Zhong and Joe Song

References

Sharma, R., Kumar, S., Zhong, H. and Song, M. (2017) Simulating noisy, nonparametric, and multivariate discrete patterns. The R Journal 9(2), 366–377. Retrieved from https://journal.r-project.org/archive/2017/RJ-2017-053/index.html

Zhang, Y., Liu, Z. L. and Song, M. (2015) ChiNet uncovers rewired transcription subnetworks in tolerant yeast for advanced biofuels conversion. Nucleic Acids Research 43(9), 4393–4407. Retrieved from https://nar.oxfordjournals.org/content/43/9/4393.long

Zhang, Y. and Song, M. (2013) Deciphering interactions in causal networks without parametric assumptions. arXiv Molecular Networks, arXiv:1311.2707. Retrieved from https://arxiv.org/abs/1311.2707

See Also

add.noise for details of the noise model.

Examples

## Not run: 
# In all examples, x is the row variable and y is the column
#    variable of a table.

# Example 1. Simulating a noisy function where y=f(x),
#            x may or may not be g(y)

tbls <- simulate_tables(n=100, nrow=4, ncol=5, type="functional",
                noise=0.2, n.tables = 1,
                row.marginal = c(0.3,0.2,0.3,0.2))

par(mfrow=c(2,2))
plot_table(tbls$pattern.list[[1]], main="Ex 1. Functional pattern")
plot_table(tbls$sample.list[[1]], main="Ex 1. Sampled pattern (noise free)")
plot_table(tbls$noise.list[[1]], main="Ex 1. Sampled pattern with 0.2 noise")
plot.new()

# Example 2. Simulating a noisy functional pattern where
#            y=f(x), x may or may not be g(y)

tbls <- simulate_tables(n=100, nrow=4, ncol=5, type="functional",
                noise=0.5, n.tables = 1,
                row.marginal = c(0.3,0.2,0.3,0.2))

par(mfrow=c(2,2))
plot_table(tbls$pattern.list[[1]], main="Ex 2. Functional pattern", col="seagreen2")
plot_table(tbls$sample.list[[1]], main="Ex 2. Sampled pattern (noise free)", col="seagreen2")
plot_table(tbls$noise.list[[1]], main="Ex 2. Sampled pattern with 0.5 noise", col="seagreen2")
plot.new()


# Example 3. Simulating a noisy many.to.one function where
#            y=f(x), x!=g(y).

tbls <- simulate_tables(n=100, nrow=4, ncol=5, type="many.to.one",
                noise=0.2, n.tables = 1,
                row.marginal = c(0.4,0.3,0.1,0.2))
par(mfrow=c(2,2))
plot_table(tbls$pattern.list[[1]], main="Ex 3. Many-to-one pattern", col="limegreen")
plot_table(tbls$sample.list[[1]], main="Ex 3. Sampled pattern (noise free)", col="limegreen")
plot_table(tbls$noise.list[[1]], main="Ex 3. Sampled pattern with 0.2 noise", col="limegreen")
plot.new()

# Example 4. Simulating a noisy discontinuous
#   pattern where y=f(x), x may or may not be g(y)

tbls <- simulate_tables(n=100, nrow=4, ncol=5,
                type="discontinuous", noise=0.2,
                n.tables = 1, row.marginal = c(0.2,0.4,0.2,0.2))

par(mfrow=c(2,2))
plot_table(tbls$pattern.list[[1]], main="Ex 4. Discontinuous pattern", col="springgreen3")
plot_table(tbls$sample.list[[1]], main="Ex 4. Sampled pattern (noise free)", col="springgreen3")
plot_table(tbls$noise.list[[1]], main="Ex 4. Sampled pattern with 0.2 noise", col="springgreen3")
plot.new()


# Example 5. Simulating a noisy dependent.non.functional
#            pattern where y!=f(x), x!=g(y), and x and y are
#            statistically dependent.

tbls <- simulate_tables(n=100, nrow=4, ncol=5,
                type="dependent.non.functional", noise=0.3,
                n.tables = 1, row.marginal = c(0.2,0.4,0.2,0.2))

par(mfrow=c(2,2))
plot_table(tbls$pattern.list[[1]], main="Ex 5. Dependent.non.functional pattern",
           col="sienna2", highlight="none")
plot_table(tbls$sample.list[[1]], main="Ex 5. Sampled pattern (noise free)",
           col="sienna2", highlight="none")
plot_table(tbls$noise.list[[1]], main="Ex 5. Sampled pattern with 0.3 noise",
           col="sienna2", highlight="none")
plot.new()

# Example 6. Simulating a pattern where x and y are
#            statistically independent.

tbls <- simulate_tables(n=100, nrow=4, ncol=5, type="independent",
                noise=0.3, n.tables = 1,
                row.marginal = c(0.4,0.3,0.1,0.2),
                col.marginal = c(0.1,0.2,0.4,0.2,0.1))

par(mfrow=c(2,2))
plot_table(tbls$pattern.list[[1]], main="Ex 6. Independent pattern",
           col="cornflowerblue", highlight="none")
plot_table(tbls$sample.list[[1]], main="Ex 6. Sampled pattern (noise free)",
           col="cornflowerblue", highlight="none")
plot_table(tbls$noise.list[[1]], main="Ex 6. Sampled pattern with 0.3 noise",
           col="cornflowerblue", highlight="none")
plot.new()


## End(Not run)


[Package FunChisq version 2.4.8-1 Index]