synth {ClickClust} | R Documentation
This synthetic dataset is the illustrative example used in the
Journal of Statistical Software paper on the use of the package
(Melnykov, 2016).
There are 5 states, denoted A, B, C, D, and E. Categorical sequences have lengths varying from 10 to 50.
data(synth)
$data is a vector of 250 strings, each representing one categorical sequence as space-separated states; $id is the original classification vector.
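The layout described above can be checked with a few base R calls. The sketch below is only illustrative: `toy` is a hypothetical stand-in with made-up values that mimics the `$data` / `$id` layout, so the code runs without the package installed.

```r
# Hypothetical stand-in for `synth`: same layout, made-up values
toy <- list(
  data = c("A B C A E", "D D E A", "C B A"),
  id   = c(1, 2, 1)
)

# Split each space-separated string into its individual states
states <- strsplit(toy$data, " ")

# Length of each sequence (one integer per string)
seq_lengths <- lengths(states)

# Distinct states observed across all sequences, in sorted order
observed_states <- sort(unique(unlist(states)))

# Number of sequences in each original class
class_sizes <- table(toy$id)

print(range(seq_lengths))
print(observed_states)
print(class_sizes)
```

With the package loaded, running the same steps on `synth` instead of `toy` should, going by the description above, report 5 states and sequence lengths between 10 and 50.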
Melnykov, V. (2015)
Melnykov, V. (2016) Model-Based Biclustering of Clickstream Data, Computational Statistics and Data Analysis, 93, 31-45.
Melnykov, V. (2016) ClickClust: An R Package for Model-Based Clustering of Categorical Sequences, Journal of Statistical Software, 74, 1-34.
click.read
data(synth)
head(synth$data)

# FUNCTION THAT REPLACES CHARACTER STATES WITH NUMERIC VALUES
repl.levs <- function(x, ch.levs){
  for (j in seq_along(ch.levs)) x <- gsub(ch.levs[j], j, x)
  return(x)
}

# DETECT ALL STATES IN THE DATASET
d <- paste(synth$data, collapse = " ")
d <- strsplit(d, " ")[[1]]
ch.levs <- levels(as.factor(d))

# CONVERT DATA TO THE FORM USED BY click.read()
S <- strsplit(synth$data, " ")
S <- sapply(S, repl.levs, ch.levs)
S <- sapply(S, as.numeric)
head(S)
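The conversion above recodes states with gsub(), which does pattern matching; that works for single-letter states such as A to E, but could miscode data where one state name is a substring of another or contains regular-expression metacharacters. As a possible alternative (a sketch, not the package author's method), match() looks up each whole state in the vector of levels. The sequences in `seqs` are made-up values used only so the block runs on its own.

```r
# Made-up sequences in the same space-separated format as synth$data
seqs <- c("A B C A E", "D D E A", "C B A")

# Split each string into its states
S_chr <- strsplit(seqs, " ")

# Sorted set of distinct states (same ordering as levels(as.factor(...)))
ch.levs <- sort(unique(unlist(S_chr)))

# match() returns the integer position of each state within ch.levs
S_num <- lapply(S_chr, match, table = ch.levs)

print(S_num)
```

Note that this yields integer vectors, whereas the gsub() route followed by as.numeric() yields doubles; wrapping the result in as.numeric() would reproduce the original storage type if that matters downstream.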