cvxclust {cvxclustr} | R Documentation |
cvxclust estimates the convex clustering path via variable splitting methods: ADMM and AMA. This function is a wrapper that calls either cvxclust_path_admm or cvxclust_path_ama (the default) to perform the computation.
Required inputs include a data matrix X (rows are features; columns are samples), a vector of weights w, and a sequence of regularization parameters gamma.
Two penalty norms are currently supported: the 1-norm and the 2-norm. Both ADMM and AMA admit acceleration schemes at little additional computational cost.
Acceleration is turned on by default.
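As a rough illustration of what the clustering path minimizes at each gamma, the sketch below evaluates the standard convex clustering objective, 0.5 * sum_i ||x_i - u_i||_2^2 + gamma * sum_{i<j} w_ij * ||u_i - u_j||, in base R. The helper name cvx_objective and the pair ordering (all pairs i < j in dictionary order) are assumptions made for this example; it is not a function of the package.

```r
# Hypothetical helper (not part of cvxclustr): evaluates
# 0.5 * sum_i ||x_i - u_i||_2^2 + gamma * sum_{i<j} w_ij * ||u_i - u_j||_type
# X, U: features-by-samples matrices; w: one weight per pair (i, j), i < j.
cvx_objective <- function(X, U, w, gamma, type = 2) {
  n <- ncol(X)
  pairs <- t(combn(n, 2))  # rows (i, j) with i < j, in dictionary order
  stopifnot(length(w) == nrow(pairs), all(w >= 0), type %in% c(1, 2))
  # Centroid differences u_i - u_j, one column per pair
  D <- U[, pairs[, 1], drop = FALSE] - U[, pairs[, 2], drop = FALSE]
  pen <- if (type == 1) colSums(abs(D)) else sqrt(colSums(D^2))
  0.5 * sum((X - U)^2) + gamma * sum(w * pen)
}

X <- matrix(c(0, 0, 3, 4, 6, 8), nrow = 2)  # 2 features, 3 samples
w <- c(1, 1, 1)
U_mean <- matrix(rowMeans(X), nrow = 2, ncol = 3)  # all centroids fused at the mean
obj_unfused <- cvx_objective(X, X, w, gamma = 1, type = 2)
obj_fused   <- cvx_objective(X, U_mean, w, gamma = 1, type = 2)
```

With U equal to X the fit term vanishes and only the penalty remains; with every centroid at the sample mean the penalty vanishes and only the fit term remains. The path traces how the minimizer moves between these two extremes as gamma grows.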
cvxclust(X, w, gamma, method = "ama", nu = 1, tol = 0.001, max_iter = 10000, type = 2, accelerate = TRUE)
X |
The data matrix to be clustered. The rows are the features, and the columns are the samples. |
w |
A vector of nonnegative weights. The ith entry |
method |
Algorithm to use: "admm" or "ama" |
gamma |
A sequence of regularization parameters. |
nu |
A positive penalty parameter for the quadratic deviation term. |
tol |
The convergence tolerance. |
max_iter |
The maximum number of iterations. |
type |
An integer indicating the norm used: 1 = 1-norm, 2 = 2-norm. |
accelerate |
If |
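To make the shape of the weight vector concrete, here is a minimal base R sketch that builds one weight per pair of samples. It assumes Gaussian kernel weights of the form exp(-phi * ||x_i - x_j||^2) with pairs (i, j), i < j, listed in dictionary order. Both the formula and the ordering are assumptions for illustration; the helper gaussian_pair_weights is hypothetical and is not the package's kernel_weights.

```r
# Hypothetical stand-in for a pairwise weight vector (not part of cvxclustr).
# Assumes w_ij = exp(-phi * ||x_i - x_j||^2), pairs ordered (1,2), (1,3), ..., (n-1,n).
gaussian_pair_weights <- function(X, phi) {
  n <- ncol(X)
  pairs <- t(combn(n, 2))  # rows (i, j) with i < j
  diffs <- X[, pairs[, 1], drop = FALSE] - X[, pairs[, 2], drop = FALSE]
  exp(-phi * colSums(diffs^2))
}

X <- matrix(c(0, 0, 1, 0, 0, 2), nrow = 2)  # 2 features, 3 samples
w <- gaussian_pair_weights(X, phi = 0.5)    # one weight per pair: n * (n - 1) / 2 entries
```

Nearby samples receive weights close to 1 and distant samples receive weights close to 0, so the penalty pulls together mainly those centroids whose samples start out close.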
U
A list of centroid matrices.
V
A list of centroid difference matrices.
Lambda
A list of Lagrange multiplier matrices.
Eric C. Chi, Kenneth Lange
cvxclust_path_ama and cvxclust_path_admm for estimating the clustering path with AMA or ADMM.
kernel_weights and knn_weights compute useful weights.
To extract cluster assignments from the clustering path use create_adjacency and find_clusters.
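The idea behind extracting cluster assignments can be sketched in base R: two samples belong to the same cluster when their centroid difference is zero, and clusters are the connected components of the resulting graph. The helper clusters_from_differences below is hypothetical and is not the package's implementation. It assumes that column l of V is the centroid difference for the l-th row of a pairs matrix, and it uses a small tolerance eps in place of an exact zero test.

```r
# Hypothetical helper (not part of cvxclustr): groups samples whose centroid
# differences are numerically zero. Assumes column l of V is u_i - u_j for
# the pair (i, j) stored in row l of `pairs`.
clusters_from_differences <- function(V, pairs, n, eps = 1e-8) {
  fused <- sqrt(colSums(V^2)) <= eps
  label <- seq_len(n)  # start with every sample in its own cluster
  for (l in which(fused)) {
    a <- label[pairs[l, 1]]
    b <- label[pairs[l, 2]]
    if (a != b) label[label == max(a, b)] <- min(a, b)  # merge the two components
  }
  match(label, unique(label))  # relabel clusters as 1, 2, ...
}

# Four samples whose centroids have fused into two groups
U <- matrix(c(0, 0, 0, 0, 1, 1, 1, 1), nrow = 2)
pairs <- t(combn(4, 2))
V <- U[, pairs[, 1], drop = FALSE] - U[, pairs[, 2], drop = FALSE]
cl <- clusters_from_differences(V, pairs, n = 4)
```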
## Clusterpaths for Mammal Dentition
library(cvxclustr)
data(mammals)
X <- as.matrix(mammals[,-1])
X <- t(scale(X,center=TRUE,scale=FALSE))
n <- ncol(X)

## Pick some weights and a sequence of regularization parameters.
k <- 5
phi <- 0.5
w <- kernel_weights(X,phi)
w <- knn_weights(w,k,n)
gamma <- seq(0.0,43, length.out=100)

## Perform clustering
sol <- cvxclust(X,w,gamma)

## Plot the cluster path
library(ggplot2)
svdX <- svd(X)
pc <- svdX$u[,1:2,drop=FALSE]
pc.df <- as.data.frame(t(pc)%*%X)
nGamma <- sol$nGamma
df.paths <- data.frame(x=c(),y=c(), group=c())
## Project each centroid matrix onto the first two principal components
for (j in seq_len(nGamma)) {
  pcs <- t(pc)%*%sol$U[[j]]
  x <- pcs[1,]
  y <- pcs[2,]
  df <- data.frame(x=x, y=y, group=1:n)
  df.paths <- rbind(df.paths,df)
}
X_data <- as.data.frame(t(X)%*%pc)
colnames(X_data) <- c("x","y")
X_data$Name <- mammals[,1]
data_plot <- ggplot(data=df.paths,aes(x=x,y=y))
data_plot <- data_plot + geom_path(aes(group=group),colour='grey30',alpha=0.5)
data_plot <- data_plot + geom_text(data=X_data,aes(x=x,y=y,label=Name),
  position=position_jitter(width=0.125,height=0.125))
data_plot <- data_plot + geom_point(data=X_data,aes(x=x,y=y),size=1.5)
data_plot <- data_plot + xlab('Principal Component 1') + ylab('Principal Component 2')
data_plot + theme_bw()

## Output Cluster Assignment at 10th gamma
A <- create_adjacency(sol$V[[10]],w,n)
find_clusters(A)

## Visualize Cluster Assignment
library(igraph)  # provides graph.adjacency
G <- graph.adjacency(A, mode = 'upper')
plot(G,vertex.label=as.character(mammals[,1]),vertex.label.cex=0.65,vertex.label.font=2)