adjClust {adjclust} | R Documentation
Adjacency-constrained hierarchical agglomerative clustering
adjClust(mat, type = c("similarity", "dissimilarity"), h = ncol(mat) - 1)
mat | A similarity matrix or a dist object.
type | Type of matrix: "similarity" or "dissimilarity". Defaults to "similarity".
h | Band width. It is assumed that the similarity between two items is 0 when these items are at a distance of more than band width h. Default value is ncol(mat) - 1.
Adjacency-constrained hierarchical agglomerative clustering (HAC) is HAC in which each observation is associated with a position, and the clustering is constrained so that only adjacent clusters are merged. These methods are useful in various application fields, including ecology (Quaternary data) and bioinformatics (e.g., in Genome-Wide Association Studies (GWAS)).
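To make the adjacency constraint concrete, here is a naive base R sketch. It is an illustration only and not the algorithm implemented in adjclust: the function name constrained_hac is hypothetical, and average linkage on a dissimilarity matrix is assumed purely for simplicity. At each step, only neighbouring clusters (in positional order) are candidates for a merge.

```r
# Naive adjacency-constrained agglomerative clustering (illustration only).
# Assumptions: average linkage, items already sorted by position.
constrained_hac <- function(d) {
  d <- as.matrix(d)
  # Clusters are kept in positional order; each is a vector of item indices.
  clusters <- as.list(seq_len(nrow(d)))
  merges <- list()
  while (length(clusters) > 1) {
    # Only neighbouring clusters (k, k + 1) are candidates for a merge.
    cost <- vapply(seq_len(length(clusters) - 1), function(k) {
      mean(d[clusters[[k]], clusters[[k + 1]]])
    }, numeric(1))
    k <- which.min(cost)
    merges[[length(merges) + 1]] <- list(
      left = clusters[[k]], right = clusters[[k + 1]], height = cost[k]
    )
    # Fuse cluster k + 1 into cluster k, preserving positional order.
    clusters[[k]] <- c(clusters[[k]], clusters[[k + 1]])
    clusters[[k + 1]] <- NULL
  }
  merges
}

sim_demo <- matrix(c(1.0, 0.1, 0.2, 0.3,
                     0.1, 1.0, 0.4, 0.5,
                     0.2, 0.4, 1.0, 0.6,
                     0.3, 0.5, 0.6, 1.0), nrow = 4)
merges_demo <- constrained_hac(sqrt(2 - 2 * sim_demo))
```

Because every candidate pair is adjacent, each cluster is always a contiguous run of positions, which is the defining property of the constrained tree.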
This function is a fast implementation of the method that takes advantage of sparse similarity matrices (i.e., matrices whose entries are 0 outside of a diagonal band of width h). The method is fully described in Dehman (2015) and is based on a kernel version of the algorithm. The different options for the implementation are described in the package vignette entitled "Notes on CHAC implementation in adjclust".
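The band assumption can be illustrated with a small base R sketch. The helper band_similarity below is hypothetical (it is not part of adjclust) and only shows which entries are treated as 0 for a given h; it says nothing about how the package stores or processes the band internally.

```r
# Hypothetical helper: keep only entries within h positions of the diagonal.
band_similarity <- function(sim, h) {
  stopifnot(is.matrix(sim), nrow(sim) == ncol(sim), h >= 0)
  offset <- abs(row(sim) - col(sim))  # |i - j| for every cell
  sim[offset > h] <- 0                # similarity assumed 0 beyond the band
  sim
}

sim_band <- matrix(c(1.0, 0.1, 0.2, 0.3,
                     0.1, 1.0, 0.4, 0.5,
                     0.2, 0.4, 1.0, 0.6,
                     0.3, 0.5, 0.6, 1.0), nrow = 4)

# With h = 2, only the pair (1, 4), at distance 3, falls outside the band.
banded <- band_similarity(sim_band, h = 2)
```

With the default h = ncol(mat) - 1, the band covers the whole matrix and no entry is treated as 0.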
An object of class chac, which describes the tree produced by the clustering process. The object is a list with the same elements as an object of class hclust (merge, height, order, labels, call, method, dist.method), and an extra element mat: the data on which the clustering is performed, possibly after the pre-transformations described in the vignette entitled "Notes on CHAC implementation in adjclust".
Dehman A. (2015) Spatial Clustering of Linkage Disequilibrium Blocks for Genome-Wide Association Studies, PhD thesis, Université Paris-Saclay.
snpClust to cluster SNPs based on linkage disequilibrium
hicClust to cluster Hi-C data
library(adjclust)

sim <- matrix(c(1.0, 0.1, 0.2, 0.3,
                0.1, 1.0, 0.4, 0.5,
                0.2, 0.4, 1.0, 0.6,
                0.3, 0.5, 0.6, 1.0), nrow = 4)

## similarity, full width
fit1 <- adjClust(sim, "similarity")
plot(fit1)

## similarity, h < p-1
fit2 <- adjClust(sim, "similarity", h = 2)
plot(fit2)

## dissimilarity
dist <- as.dist(sqrt(2-(2*sim)))

## dissimilarity, full width
fit3 <- adjClust(dist, "dissimilarity")
plot(fit3)

## dissimilarity, h < p-1
fit4 <- adjClust(dist, "dissimilarity", h = 2)
plot(fit4)
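The dissimilarity used in the example, sqrt(2 - 2 * sim), is the standard kernel-induced distance d(i, j)^2 = s(i, i) + s(j, j) - 2 s(i, j) for a similarity with unit diagonal. The sketch below generalises it to an arbitrary diagonal. The helper sim_to_dist is hypothetical and shown only to explain the example; the transformations that adjclust itself applies are described in the vignette, not here.

```r
# Hypothetical helper: kernel-induced distance from a similarity matrix.
sim_to_dist <- function(sim) {
  s_diag <- diag(sim)
  # d(i, j)^2 = s(i, i) + s(j, j) - 2 * s(i, j)
  d2 <- outer(s_diag, s_diag, "+") - 2 * sim
  as.dist(sqrt(pmax(d2, 0)))  # clamp tiny negative values caused by rounding
}

sim_k <- matrix(c(1.0, 0.1, 0.2, 0.3,
                  0.1, 1.0, 0.4, 0.5,
                  0.2, 0.4, 1.0, 0.6,
                  0.3, 0.5, 0.6, 1.0), nrow = 4)

# With a unit diagonal this reduces to sqrt(2 - 2 * sim).
d_general <- sim_to_dist(sim_k)
d_example <- as.dist(sqrt(2 - 2 * sim_k))
```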