sim_unif {clusteval}    R Documentation
We generate n_m observations from each of five four-dimensional uniform distributions (populations) such that the Euclidean distance between the center of each population and the origin is a fixed constant, delta ≥ 0.
sim_unif(n = rep(25, 5), delta = 0, seed = NULL)
n: a vector (of length M = 5) of the sample sizes for each population
delta: the fixed distance between the center of each population and the origin
seed: seed for random number generation (if NULL, the seed is not set)
To define the populations, let x = (X_1, …, X_p)' be a multivariate uniformly distributed random vector whose components X_j ~ U(a_j, b_j), with a_j < b_j, are independent uniform random variables for j = 1, …, p; here p = 4. Let Π_m denote the m-th population (m = 1, …, 5). Then the five populations are:
Π_1 = U(-1/2, 1/2) × U(Δ - 1/2, Δ + 1/2) × U(-1/2, 1/2) × U(-1/2, 1/2),
Π_2 = U(Δ - 1/2, Δ + 1/2) × U(-1/2, 1/2) × U(-1/2, 1/2) × U(-1/2, 1/2),
Π_3 = U(-1/2, 1/2) × U(-Δ - 1/2, -Δ + 1/2) × U(-1/2, 1/2) × U(-1/2, 1/2),
Π_4 = U(-1/2, 1/2) × U(-1/2, 1/2) × U(-Δ - 1/2, -Δ + 1/2) × U(-1/2, 1/2),
Π_5 = U(-1/2, 1/2) × U(-1/2, 1/2) × U(-1/2, 1/2) × U(Δ - 1/2, Δ + 1/2).
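To make the distance statement concrete, the following R sketch writes down the center (midpoint) of each population's support from the definitions above and computes distances. The helper `population_centers` is hypothetical and not part of clusteval. It suggests that every center lies at distance Δ from the origin, while the pairwise distances between centers are either √2·Δ or 2Δ rather than a single constant.

```r
# Hypothetical helper (not part of clusteval): center of each population's
# support, one row per population, read off from the definitions of Pi_1..Pi_5.
population_centers <- function(delta) {
  rbind(
    c(0, delta, 0, 0),   # Pi_1
    c(delta, 0, 0, 0),   # Pi_2
    c(0, -delta, 0, 0),  # Pi_3
    c(0, 0, -delta, 0),  # Pi_4
    c(0, 0, 0, delta)    # Pi_5
  )
}

centers <- population_centers(delta = 1.5)

# Euclidean distance of each center from the origin: all equal to delta
dist_to_origin <- sqrt(rowSums(centers^2))
print(dist_to_origin)

# Pairwise Euclidean distances between centers: sqrt(2) * delta or 2 * delta
print(round(as.matrix(dist(centers)), 4))
```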
We generate n_m observations from population Π_m.
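The generation step can be sketched in base R as follows. `sim_unif_sketch` is a hypothetical illustration written from the definitions above, not the package's implementation; in particular, it need not reproduce `sim_unif()` draw-for-draw for a given seed. Each population is sampled by drawing U(-1/2, 1/2) noise in every coordinate and shifting it by that population's center, and the result is returned as a list with components `x` and `y`, as in the examples.

```r
# Hypothetical re-implementation for illustration only; not clusteval's code.
sim_unif_sketch <- function(n = rep(25, 5), delta = 0, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  p <- 4  # number of features
  M <- 5  # number of populations
  stopifnot(length(n) == M)

  # Centers of the five unit hypercubes, one row per population
  centers <- rbind(
    c(0, delta, 0, 0),
    c(delta, 0, 0, 0),
    c(0, -delta, 0, 0),
    c(0, 0, -delta, 0),
    c(0, 0, 0, delta)
  )

  x <- do.call(rbind, lapply(seq_len(M), function(m) {
    # Independent U(-1/2, 1/2) draws, shifted so column j is U(c_j - 1/2, c_j + 1/2)
    u <- matrix(runif(n[m] * p, min = -0.5, max = 0.5), nrow = n[m], ncol = p)
    sweep(u, 2, centers[m, ], "+")
  }))
  y <- rep(seq_len(M), times = n)  # population label for each row
  list(x = x, y = y)
}

sim <- sim_unif_sketch(n = c(5, 10, 15, 20, 25), delta = 2, seed = 1)
dim(sim$x)  # 75 4
table(sim$y)
```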
For Δ = 0, the M = 5 populations are identical.
The support of each population is a unit hypercube in p = 4 dimensions. For Δ ≥ 1, the supports do not overlap (at Δ = 1 some supports share only a boundary face, which has probability zero), so the populations are completely separated.
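The separation claim can be checked numerically with a short sketch; the helpers `boxes_overlap` and `any_overlap` are hypothetical and not part of clusteval. Two axis-aligned boxes share a region of positive volume only if their intervals overlap in every coordinate, so it suffices to compare intervals coordinate by coordinate for each pair of populations.

```r
# Hypothetical helper: do two unit hypercubes with the given centers share a
# region of positive volume? Strict inequalities ignore boundary-only contact.
boxes_overlap <- function(center_a, center_b, half_width = 0.5) {
  lower_a <- center_a - half_width
  upper_a <- center_a + half_width
  lower_b <- center_b - half_width
  upper_b <- center_b + half_width
  all(lower_a < upper_b & lower_b < upper_a)
}

# Hypothetical helper: does any pair of the five populations overlap?
any_overlap <- function(delta) {
  centers <- rbind(
    c(0, delta, 0, 0), c(delta, 0, 0, 0), c(0, -delta, 0, 0),
    c(0, 0, -delta, 0), c(0, 0, 0, delta)
  )
  pairs <- combn(nrow(centers), 2)  # all 10 population pairs, one per column
  any(apply(pairs, 2, function(ab) {
    boxes_overlap(centers[ab[1], ], centers[ab[2], ])
  }))
}

sapply(c(0, 0.5, 0.75, 1, 1.5), any_overlap)
# → TRUE TRUE TRUE FALSE FALSE
```

Pairs such as Π_1 and Π_2, whose centers differ by Δ in two coordinates, are the last to separate; Π_1 and Π_3, which differ by 2Δ in one coordinate, already separate for Δ > 1/2. This is why the threshold is Δ = 1.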
A named list containing:
x: a matrix whose rows are the generated observations and whose columns are the p features (variables)
y: a vector denoting the population from which the observation in each row was generated
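# Default settings: 25 observations from each population, delta = 0
data_generated <- sim_unif(seed = 42)
dim(data_generated$x)
table(data_generated$y)

# Unequal sample sizes (10, 20, 30, 40, 50) and separated populations
data_generated2 <- sim_unif(n = 10 * seq_len(5), delta = 1.5)
table(data_generated2$y)

# Sample mean of each feature within each population
sample_means <- with(data_generated2, tapply(seq_along(y), y, function(i) {
  colMeans(x[i, ])
}))
(sample_means <- do.call(rbind, sample_means))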