Kmeans {knor} | R Documentation |
Partitions a dataset into k disjoint clusters using a fast, parallel, NUMA-optimized version of Lloyd's algorithm. Details are given in the paper at https://arxiv.org/pdf/1606.08905.pdf.
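For readers unfamiliar with Lloyd's algorithm, the following is a minimal serial sketch in base R. It is not knor's implementation: it has no parallelism or NUMA handling, the function name `lloyd_sketch` is hypothetical, and stopping when the largest center shift drops below `tolerance` is an assumption made for illustration, since this page does not say how the package applies its tolerance.

```r
# Serial sketch of Lloyd's algorithm (illustrative only; not knor's code).
lloyd_sketch <- function(x, k, iter.max = 100L, tolerance = 1e-6) {
  n <- nrow(x)
  # Start from k distinct rows chosen at random as the initial centers.
  centers <- x[sample.int(n, k), , drop = FALSE]
  cluster <- integer(n)
  for (iter in seq_len(iter.max)) {
    # Assignment step: squared Euclidean distance from every point to every center.
    d2 <- vapply(seq_len(k),
                 function(j) rowSums(sweep(x, 2, centers[j, ])^2),
                 numeric(n))
    new.cluster <- max.col(-d2, ties.method = "first")
    # Update step: each center becomes the mean of the points assigned to it.
    new.centers <- centers
    for (j in seq_len(k)) {
      members <- new.cluster == j
      if (any(members)) new.centers[j, ] <- colMeans(x[members, , drop = FALSE])
    }
    shift <- max(abs(new.centers - centers))
    changed <- any(new.cluster != cluster)
    centers <- new.centers
    cluster <- new.cluster
    # Assumed stopping rule: no reassignments, or centers moved less than tolerance.
    if (!changed || shift < tolerance) break
  }
  list(cluster = cluster, centers = centers,
       size = tabulate(cluster, nbins = k), iter = iter)
}

set.seed(1)
fit <- lloyd_sketch(as.matrix(iris[, 1:4]), 3)
fit$size
```

The returned list mirrors the field names documented for this function (cluster, centers, size, iter) so the two can be compared side by side.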
Kmeans(data, centers, nrow = -1, ncol = -1, iter.max = .Machine$integer.max, nthread = -1, init = c("kmeanspp", "random", "forgy", "none"), tolerance = 1e-06, dist.type = c("eucl", "cos"), omp = FALSE, numa.opt = FALSE)
data |
The name of a data file on disk, or an in-memory data matrix |
centers |
Either (i) the number of centers (i.e., k), (ii) an in-memory data matrix, or (iii) a 2-element list whose first element is a filename for precomputed centers and whose second element is the number of centroids. |
nrow |
The number of samples in the dataset |
ncol |
The number of features in the dataset |
iter.max |
The maximum number of k-means iterations to perform |
nthread |
The number of parallel threads to run |
init |
The type of initialization to use: one of "kmeanspp", "random", "forgy", or "none" |
tolerance |
The convergence tolerance |
dist.type |
The dissimilarity metric to use: "eucl" (Euclidean) or "cos" (cosine) |
omp |
Use (slower) OpenMP threads rather than pthreads |
numa.opt |
When passing data as an in-memory data matrix, you can optimize memory placement for Linux NUMA machines. NOTE: performance may degrade with very large data, and this option requires twice the memory needed without it. |
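The default `init = "kmeanspp"` refers to k-means++ seeding. As a rough illustration of that technique, here is a base R sketch of the standard k-means++ procedure: the first center is drawn uniformly, and each later center is drawn with probability proportional to its squared Euclidean distance from the nearest center chosen so far. The function name `kmeanspp_sketch` is hypothetical, and whether knor's seeding matches this sketch in every detail is not stated on this page.

```r
# Sketch of k-means++ seeding (illustrative only; not knor's code).
kmeanspp_sketch <- function(x, k) {
  n <- nrow(x)
  chosen <- integer(k)
  # First center: a row picked uniformly at random.
  chosen[1] <- sample.int(n, 1)
  # d2[i] is the squared distance from point i to its nearest chosen center.
  d2 <- rowSums(sweep(x, 2, x[chosen[1], ])^2)
  for (j in seq_len(k)[-1]) {
    # Next center: sampled with probability proportional to d2.
    # (Fails if every point is identical, because all weights are then zero.)
    chosen[j] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(x, 2, x[chosen[j], ])^2))
  }
  x[chosen, , drop = FALSE]
}

set.seed(42)
seeds <- kmeanspp_sketch(as.matrix(iris[, 1:4]), 3)
seeds
```

A matrix like `seeds` has the shape described for the in-memory form of `centers`. Pairing it with `init = "none"` seems the natural combination, but that pairing is an assumption here and should be checked against the package.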
A list containing the attributes of the output of kmeans:
cluster: A vector of integers (from 1:k) indicating the cluster to which each point is allocated.
centers: A matrix of cluster centres.
size: The number of points in each cluster.
iter: The number of (outer) iterations.
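To show how these fields are typically inspected, the sketch below uses base R's `stats::kmeans` as a stand-in, since it returns components with the same four names and runs without knor installed. It is only an analogy for the shape of the result; values produced by `Kmeans` itself will generally differ.

```r
# stats::kmeans is used here only as a stand-in with matching field names.
set.seed(1)
x <- as.matrix(iris[, 1:4])
fit <- stats::kmeans(x, centers = 3)

head(fit$cluster)   # cluster id (1..k) for the first few points
dim(fit$centers)    # k rows, one column per feature
fit$size            # number of points in each cluster
fit$iter            # number of (outer) iterations performed
```

A quick sanity check on any such result is that `size` sums to the number of rows in the data and agrees with a tabulation of `cluster`.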
Disa Mhembere <disa@jhu.edu>
iris.mat <- as.matrix(iris[,1:4])
k <- length(unique(iris[, dim(iris)[2]])) # Number of unique classes
kms <- Kmeans(iris.mat, k)
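A slightly longer variant of the example, using only arguments from the usage line above, might look as follows. The specific values chosen (`iter.max = 100`, `nthread = 2`) are arbitrary illustrations, and the call is guarded so the snippet still runs when knor is not installed.

```r
iris.mat <- as.matrix(iris[, 1:4])
k <- length(unique(iris$Species))  # number of unique classes

# Only call into knor if the package is available on this machine.
if (requireNamespace("knor", quietly = TRUE)) {
  kms <- knor::Kmeans(iris.mat, k,
                      iter.max = 100,      # arbitrary cap for illustration
                      nthread = 2,         # arbitrary thread count
                      init = "kmeanspp",
                      dist.type = "eucl")
  # Cross-tabulate assigned clusters against the known species labels.
  print(table(cluster = kms$cluster, species = iris$Species))
}
```

Cluster numbers are arbitrary labels, so the cross-tabulation is read by looking for one dominant species per row, not by expecting cluster 1 to match the first species.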