PRclust {prclust}    R Documentation
Clustering is unsupervised and exploratory in nature, yet it can be performed through penalized regression with grouping pursuit. PRclust performs penalized regression-based clustering with various loss functions and grouping penalties via two algorithms (DC-ADMM and quadratic penalty).
PRclust(data, lambda1, lambda2, tau, loss.method = c("quadratic","lasso"), grouping.penalty = c("gtlp","L1","SCAD","MCP"), algorithm = c("ADMM","Quadratic"), epsilon=0.001)
data
    Input matrix, of dimension nvars x nobs; each column is an observation vector.
lambda1
    Tuning parameter or step size; typically set to 1 for the quadratic-penalty-based algorithm and 0.4 for the revised ADMM.
lambda2
    Tuning parameter: the magnitude of the grouping penalty.
tau
    Tuning parameter related to the grouping penalty.
loss.method
    The loss function. "lasso" stands for the L_1 loss; "quadratic" stands for the quadratic loss.
grouping.penalty
    Grouping penalty. Character; may be abbreviated. "gtlp" means the generalized group lasso is used for the grouping penalty; "L1" means the lasso penalty is used; "SCAD" and "MCP" are two other non-convex penalties.
algorithm
    Character; may be abbreviated. The algorithm used to find the solution. The default, "ADMM", is the DC-ADMM algorithm developed by the package authors.
epsilon
    The stopping criterion parameter. The default is 0.001.
Cluster analysis has been widely used in many fields; in the absence of class labels it is also called unsupervised learning. Penalized regression-based clustering adopts a novel framework that views clustering as a regression problem: a non-convex penalty for grouping pursuit data-adaptively encourages equality among some unknown subsets of the parameter estimates. This method can handle complex clustering situations, for example non-convex clusters, where K-means fails but PRclust may perform much better.
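As an illustration of the non-convex structure mentioned above, the following sketch (base R only, no prclust calls; the data layout is hypothetical) generates two concentric rings. K-means assumes roughly spherical clusters and typically cuts straight through both rings rather than separating them:

```r
set.seed(1)
n <- 100
# two rings: inner radius about 1, outer radius about 3, with small noise
angle <- runif(2 * n, 0, 2 * pi)
radius <- c(rep(1, n), rep(3, n)) + rnorm(2 * n, 0, 0.1)
# follow the package convention: columns are observations (2 x 200)
data <- rbind(radius * cos(angle), radius * sin(angle))
truth <- rep(1:2, each = n)

km <- kmeans(t(data), centers = 2)
# agreement with the ring labels is typically near chance,
# because each k-means cluster contains parts of both rings
mean(km$cluster == truth)
```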
The return value is a list containing the following components.
mu
    The estimated centroid for each observation.
theta
    The theta values for the data set; typically not of direct interest.
group
    The cluster membership of each observation.
count
    The number of iterations used.
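As a sketch of how these components might be inspected after a fit (the PRclust call is guarded so the snippet runs even when the prclust package is not installed; the data setup mirrors the examples below):

```r
set.seed(1)
# two Gaussian clusters; columns are observations, as in the examples
data <- rbind(c(rnorm(50, 0, 0.33), rnorm(50, 1, 0.33)),
              c(rnorm(50, 0, 0.33), rnorm(50, 1, 0.33)))

if (requireNamespace("prclust", quietly = TRUE)) {
  fit <- prclust::PRclust(data, lambda1 = 1, lambda2 = 3, tau = 0.5)
  table(fit$group)   # cluster sizes (membership of each observation)
  dim(fit$mu)        # centroids, same dimension as the input
  fit$count          # number of iterations used
}
```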
Choosing the tuning parameters can be time consuming; in practice it is usually done by trial and error.
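One way to organize that trial-and-error search is a small grid over lambda2 and tau, recording how many clusters each setting produces. The grid values below are illustrative, not recommendations, and the PRclust call is guarded so the sketch runs even without the package installed:

```r
# illustrative tuning grid; pick ranges appropriate for your data
grid <- expand.grid(lambda2 = c(1, 2, 3), tau = c(0.3, 0.5))
grid$n.clusters <- NA_integer_

if (requireNamespace("prclust", quietly = TRUE)) {
  set.seed(1)
  data <- rbind(c(rnorm(50, 0, 0.33), rnorm(50, 1, 0.33)),
                c(rnorm(50, 0, 0.33), rnorm(50, 1, 0.33)))
  for (i in seq_len(nrow(grid))) {
    fit <- prclust::PRclust(data, lambda1 = 1,
                            lambda2 = grid$lambda2[i], tau = grid$tau[i])
    # number of distinct clusters found at this setting
    grid$n.clusters[i] <- length(unique(fit$group))
  }
}
grid  # inspect how the cluster count changes across the grid
```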
Chong Wu, Wei Pan
Pan, W., Shen, X., & Liu, B. (2013). Cluster analysis: unsupervised learning via supervised learning with a non-convex penalty. Journal of Machine Learning Research, 14(1), 1865-1889.
Wu, C., Kwon, S., Shen, X., & Pan, W. (2016). A New Algorithm and Theory for Penalized Regression-based Clustering. Journal of Machine Learning Research, 17(188), 1-25.
library("prclust")
# To give a better sense of the power and strength of the PRclust
# method, the 6 examples in the original prclust paper are provided.
################################################
### case 1
################################################
## generate the data
data <- matrix(NA, 2, 100)
data[1, 1:50] <- rnorm(50, 0, 0.33)
data[2, 1:50] <- rnorm(50, 0, 0.33)
data[1, 51:100] <- rnorm(50, 1, 0.33)
data[2, 51:100] <- rnorm(50, 1, 0.33)
## set the tuning parameters
lambda1 <- 1
lambda2 <- 3
tau <- 0.5
a <- PRclust(data, lambda1, lambda2, tau)
a