kmeans_tidiers {broom} | R Documentation |
These methods summarize the results of k-means clustering into three tidy forms. tidy describes the center and size of each cluster, augment adds the cluster assignments to the original data, and glance summarizes the total within-cluster and between-cluster sums of squares of the clustering.
## S3 method for class 'kmeans'
tidy(x, col.names = paste0("x", 1:ncol(x$centers)), ...)

## S3 method for class 'kmeans'
augment(x, data, ...)

## S3 method for class 'kmeans'
glance(x, ...)
x | kmeans object |
col.names | The names to give each dimension of the data |
... | extra arguments, not used |
data | Original data (required for augment) |
All tidying methods return a data.frame without rownames. The structure depends on the method chosen.
tidy returns one row per cluster, with one column for each dimension in the data describing the center, followed by
size | The size of each cluster |
withinss | The within-cluster sum of squares |
cluster | A factor describing the cluster from 1:k |
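The layout described above can be reproduced by hand from the fields of a kmeans object. The following is a minimal sketch in base R, not broom's actual implementation; the simulated data and the names dat and tidy_by_hand are illustrative assumptions.

```r
# Hand-built approximation of the tidy() layout, using only base R.
set.seed(2014)
dat <- data.frame(x1 = c(rnorm(50, 5), rnorm(50, -3)),
                  x2 = c(rnorm(50, -1), rnorm(50, 2)))
k <- kmeans(dat, centers = 2)

# One row per cluster: the center coordinates come first ...
tidy_by_hand <- as.data.frame(k$centers)
names(tidy_by_hand) <- paste0("x", seq_len(ncol(k$centers)))  # same form as the default col.names

# ... followed by size, withinss, and a factor labelling the cluster.
tidy_by_hand$size <- k$size
tidy_by_hand$withinss <- k$withinss
tidy_by_hand$cluster <- factor(seq_len(nrow(k$centers)))

# Tidying methods return a data.frame without rownames.
rownames(tidy_by_hand) <- NULL

print(tidy_by_hand)
```

The cluster sizes in the size column add up to the number of rows passed to kmeans.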
augment returns the original data with one extra column:
.cluster | The cluster assigned by the k-means algorithm |
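As a rough illustration of what the extra column contains, the sketch below attaches the assignments stored in a kmeans object to the data it was fitted on. It uses only base R and is not broom's implementation; dat and augmented_by_hand are illustrative names, and storing the assignment as a factor is an assumption of this sketch.

```r
# Hand-built approximation of the augment() layout, using only base R.
set.seed(2014)
dat <- data.frame(x1 = c(rnorm(50, 5), rnorm(50, -3)),
                  x2 = c(rnorm(50, -1), rnorm(50, 2)))
k <- kmeans(dat, centers = 2)

# k$cluster holds one integer label per input row, in the original row order,
# so it can be added to the data as a new column.
augmented_by_hand <- dat
augmented_by_hand$.cluster <- factor(unname(k$cluster))

head(augmented_by_hand)
```

Because the labels follow the row order of the data given to kmeans, the data argument should be the same rows, in the same order, that were clustered.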
glance returns a one-row data.frame with the columns
totss | The total sum of squares |
tot.withinss | The total within-cluster sum of squares |
betweenss | The total between-cluster sum of squares |
iter | The number of (outer) iterations |
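The four columns correspond to fields of the same name on the kmeans object, and the sums of squares are related to one another. The following base R sketch (not broom's implementation; dat and glance_by_hand are illustrative names) builds the one-row summary by hand and checks those relationships.

```r
# Hand-built approximation of the glance() layout, using only base R.
set.seed(2014)
dat <- data.frame(x1 = c(rnorm(50, 5), rnorm(50, -3)),
                  x2 = c(rnorm(50, -1), rnorm(50, 2)))
k <- kmeans(dat, centers = 2)

glance_by_hand <- data.frame(totss = k$totss,
                             tot.withinss = k$tot.withinss,
                             betweenss = k$betweenss,
                             iter = k$iter)
print(glance_by_hand)

# The total sum of squares splits into within- and between-cluster parts.
decomposition_holds <- isTRUE(all.equal(k$totss, k$tot.withinss + k$betweenss))

# tot.withinss is the sum of the per-cluster withinss values.
withinss_adds_up <- isTRUE(all.equal(k$tot.withinss, sum(k$withinss)))
```

The ratio betweenss / totss is a common quick measure of how much of the variation the clustering accounts for.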
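library(dplyr)
library(ggplot2)
library(broom)

set.seed(2014)

# True cluster centers and sizes used to simulate the data
centers <- data.frame(cluster = factor(1:3),
                      size = c(100, 150, 50),
                      x1 = c(5, 0, -3),
                      x2 = c(-1, 1, -2))

# Draw normally distributed points around each center
points <- centers %>%
  group_by(cluster) %>%
  do(data.frame(x1 = rnorm(.$size[1], .$x1[1]),
                x2 = rnorm(.$size[1], .$x2[1]))) %>%
  ungroup()  # drop the grouping so select() below returns only x1 and x2

k <- kmeans(points %>% dplyr::select(x1, x2), 3)

tidy(k)
head(augment(k, points))
glance(k)

# Plot the points colored by assigned cluster, labeling each fitted center
ggplot(augment(k, points), aes(x1, x2)) +
  geom_point(aes(color = .cluster)) +
  geom_text(aes(label = cluster), data = tidy(k), size = 10)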