KmeansPP {clusternor} | R Documentation |
A parallel and scalable implementation of the algorithm described in Ostrovsky, Rafail, et al. "The effectiveness of Lloyd-type methods for the k-means problem." Journal of the ACM (JACM) 59.6 (2012): 28.
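The seeding idea behind k-means++-style methods can be sketched in a few lines of base R. This is a minimal illustration of distance-squared weighted seeding, not the package's parallel implementation, and it does not claim to reproduce the exact procedure in the cited paper; `seed_d2` is a hypothetical helper name introduced only for this sketch.

```r
# Hypothetical helper, not part of clusternor: D^2-weighted seeding.
seed_d2 <- function(x, k) {
  n <- nrow(x)
  idx <- integer(k)
  idx[1] <- sample.int(n, 1)  # first centre: chosen uniformly at random
  # Squared distance from every row to its nearest chosen centre so far
  d2 <- rowSums((x - matrix(x[idx[1], ], n, ncol(x), byrow = TRUE))^2)
  for (j in seq_len(k)[-1]) {
    # Next centre: sampled with probability proportional to d2
    idx[j] <- sample.int(n, 1, prob = d2)
    dj <- rowSums((x - matrix(x[idx[j], ], n, ncol(x), byrow = TRUE))^2)
    d2 <- pmin(d2, dj)
  }
  x[idx, , drop = FALSE]
}

set.seed(1)
iris.mat <- as.matrix(iris[, 1:4])
init <- seed_d2(iris.mat, 3)  # a 3 x 4 matrix of initial centres
```

Rows that coincide with an already chosen centre have weight zero, so the sketch never returns the same point twice.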
KmeansPP(data, centers, nrow = -1, ncol = -1, nstart = 1, nthread = -1, dist.type = c("sqeucl", "eucl", "cos", "taxi"))
data |
Data file name on disk (NUMA optimized) or an in-memory data matrix |
centers |
The number of centers (i.e., k) |
nrow |
The number of samples in the dataset |
ncol |
The number of features in the dataset |
nstart |
The number of iterations of kmeans++ to run |
nthread |
The number of parallel threads to run |
dist.type |
The dissimilarity metric to use: one of c("sqeucl", "eucl", "cos", "taxi") |
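To make the dist.type names concrete, the sketch below computes each dissimilarity for a pair of vectors in base R. It assumes the conventional readings of the names (squared Euclidean, Euclidean, cosine, taxicab); this page does not spell out the formulas, so the cosine form in particular (one minus cosine similarity) is an assumption, and `dissim` is a hypothetical helper rather than a clusternor function.

```r
# Hypothetical helper illustrating the assumed meaning of each dist.type name.
dissim <- function(a, b, type = c("sqeucl", "eucl", "cos", "taxi")) {
  type <- match.arg(type)  # with no argument, the first entry is used
  switch(type,
    sqeucl = sum((a - b)^2),                               # squared Euclidean
    eucl   = sqrt(sum((a - b)^2)),                         # Euclidean
    cos    = 1 - sum(a * b) / sqrt(sum(a^2) * sum(b^2)),   # 1 - cosine similarity
    taxi   = sum(abs(a - b)))                              # taxicab (Manhattan)
}

a <- c(3, 0)
b <- c(0, 4)
dissim(a, b, "sqeucl")  # 25
dissim(a, b, "eucl")    # 5
dissim(a, b, "taxi")    # 7
dissim(a, b, "cos")     # 1, since a and b are orthogonal
```

The vector-valued default combined with `match.arg` is a common R idiom in which the first entry acts as the default; whether KmeansPP resolves dist.type the same way is not stated here.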
A list containing the attributes of the output.
cluster: A vector of integers (from 1:k) indicating the cluster to which each point is allocated.
centers: A matrix of cluster centres.
size: The number of points in each cluster.
energy: The sum of distances from each sample to its closest cluster.
best.start: The sum of distances from each sample to its closest cluster.
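The relationship between the cluster, size and energy fields can be illustrated on a toy dataset in base R. This is a sketch under the assumption that distances are squared Euclidean (the first dist.type option); the variable names are illustrative and the computation is not taken from the package source.

```r
# Four samples and two fixed centres (toy values chosen for easy arithmetic)
x <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 12))
centers <- rbind(c(0, 0.5), c(10, 11))

# Squared Euclidean distance from every sample (row) to every centre (column)
d <- sapply(seq_len(nrow(centers)), function(j)
  rowSums((x - matrix(centers[j, ], nrow(x), ncol(x), byrow = TRUE))^2))

# cluster: index (in 1:k) of the nearest centre for each sample
cluster <- max.col(-d, ties.method = "first")
# size: number of samples assigned to each cluster
size <- tabulate(cluster, nbins = nrow(centers))
# energy: sum over samples of the distance to the assigned centre
energy <- sum(d[cbind(seq_len(nrow(x)), cluster)])

cluster  # 1 1 2 2
size     # 2 2
energy   # 2.5
```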
Disa Mhembere <disa@cs.jhu.edu>
library(clusternor)

iris.mat <- as.matrix(iris[, 1:4])
k <- length(unique(iris[, dim(iris)[2]]))  # Number of unique classes
nstart <- 3
km <- KmeansPP(iris.mat, k, nstart = nstart)