【Clustering】ConsensusClusterPlus package

The ConsensusClusterPlus package implements consensus clustering in R.

There are three main steps: (1) preparing the input data; (2) running the consensus clustering; (3) calculating cluster consensus and item consensus.

1-Input data

The input has no special requirements. It is a normalized expression matrix with genes in rows and samples in columns.

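Note that the package's example workflow first keeps the 5,000 most variable genes, ranked by median absolute deviation (MAD), so that the clusters come out cleaner. This is much like selecting highly variable genes in single-cell analysis. You choose both the number of genes and the selection method, because this step uses ordinary R statistics rather than a command built into the package. The example then median-centers each gene.

library(ALL)                  # example data (Bioconductor); also attaches Biobase for exprs()
data(ALL)
d = exprs(ALL)                # expression matrix: genes in rows, samples in columns
d[1:5,1:5]

# rank genes by MAD and keep the top 5000
mads = apply(d,1,mad)
d = d[rev(order(mads))[1:5000],]

# median-center each gene (subtract the row median)
d = sweep(d,1, apply(d,1,median,na.rm=T))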
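Gene selection is plain R, so it can be wrapped in a small function that makes the number of genes and the variability measure easy to change. The sketch below uses a hypothetical helper, select_variable_genes, which is not part of ConsensusClusterPlus, and runs it on simulated data. It assumes genes in rows and samples in columns, as above. Variance is offered as an alternative to MAD purely for illustration.

```r
# Hypothetical helper (not part of ConsensusClusterPlus): keep the n most
# variable genes (rows) by MAD or variance, then median-center each gene.
select_variable_genes <- function(expr, n = 5000, measure = c("mad", "var")) {
  measure <- match.arg(measure)
  stat_fun <- if (measure == "mad") mad else var
  score <- apply(expr, 1, stat_fun)              # one variability score per gene
  n <- min(n, nrow(expr))                        # cannot keep more genes than exist
  keep <- order(score, decreasing = TRUE)[seq_len(n)]
  sel <- expr[keep, , drop = FALSE]
  sweep(sel, 1, apply(sel, 1, median, na.rm = TRUE))  # subtract each row's median
}

# Simulated data: 200 genes x 20 samples; the first 10 genes are made more variable
set.seed(1)
expr <- matrix(rnorm(200 * 20), nrow = 200,
               dimnames = list(paste0("g", 1:200), paste0("s", 1:20)))
expr[1:10, ] <- expr[1:10, ] * 10
top <- select_variable_genes(expr, n = 10, measure = "mad")
rownames(top)
```

The returned matrix can be passed to ConsensusClusterPlus in place of d.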

2-Clustering

Several important parameters:

pItem: proportion of items (columns, i.e. samples) resampled in each iteration

pFeature: proportion of features (rows, i.e. genes) resampled in each iteration

maxK: maximum number of clusters to evaluate

reps: number of resampling iterations

clusterAlg: clustering algorithm; "hc" is agglomerative hierarchical clustering

distance: distance measure; "pearson" means 1 - Pearson correlation

Note: in practice, maxK and reps can be set higher, e.g. 20 and 1000.

library(ConsensusClusterPlus)
title = tempdir()             # directory where the output plots are written
results = ConsensusClusterPlus(d, maxK=6, reps=50, pItem=0.8, pFeature=1,
                               title=title, clusterAlg="hc", distance="pearson",
                               seed=1262118388.71279, plot="png")
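The sketch below makes pItem, pFeature, reps, clusterAlg="hc" and distance="pearson" concrete by building a consensus matrix for a single k in base R. In each repetition it does three things:

- draws pItem of the samples and pFeature of the genes;
- clusters the drawn samples hierarchically on 1 - Pearson correlation;
- records which pairs of samples land in the same cluster.

A pair's consensus value is the number of times the two samples were clustered together, divided by the number of times both were drawn. This follows the general resampling-based consensus clustering idea of Monti et al. (2003). It is not the package's internal code. The function name consensus_matrix and the use of average linkage are assumptions.

```r
# Minimal consensus-matrix sketch for a single k (illustrative, not the package code).
consensus_matrix <- function(d, k, reps = 50, pItem = 0.8, pFeature = 1, seed = 1) {
  set.seed(seed)
  n <- ncol(d)                                             # items = samples (columns)
  together <- matrix(0, n, n, dimnames = list(colnames(d), colnames(d)))
  sampled  <- together
  for (r in seq_len(reps)) {
    items <- sort(sample(n, round(pItem * n)))             # resample samples
    feats <- sample(nrow(d), round(pFeature * nrow(d)))    # resample genes
    sub <- d[feats, items, drop = FALSE]
    dist_pearson <- as.dist(1 - cor(sub))                  # 1 - Pearson correlation
    cl <- cutree(hclust(dist_pearson, method = "average"), k = k)
    same <- outer(cl, cl, "==")                            # pairs co-clustered this round
    together[items, items] <- together[items, items] + same
    sampled[items, items]  <- sampled[items, items] + 1
  }
  # consensus = times clustered together / times sampled together (0 if never co-sampled)
  ifelse(sampled > 0, together / sampled, 0)
}

# Toy data: 50 genes x 20 samples built from two underlying profiles
set.seed(42)
b1 <- rnorm(50); b2 <- rnorm(50)
toy <- cbind(sapply(1:10, function(i) b1 + rnorm(50, sd = 0.3)),
             sapply(1:10, function(i) b2 + rnorm(50, sd = 0.3)))
colnames(toy) <- paste0("s", 1:20)
M <- consensus_matrix(toy, k = 2, reps = 50, pItem = 0.8, pFeature = 1)
round(M[c(1, 2, 11), c(1, 2, 11)], 2)
```

Samples from the same simulated group should have values near 1 with each other and near 0 with samples from the other group.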

The result is a list indexed by k: results[[k]] holds the output for k clusters, so results[[2]] is the solution with k = 2.

# View the key results
# consensusMatrix: the consensus matrix, e.g. the first five rows and columns for k = 2
results[[2]][["consensusMatrix"]][1:5,1:5]

# consensusTree: an hclust object
results[[2]][["consensusTree"]]

# consensusClass: the sample classifications
results[[2]][["consensusClass"]][1:5]

# ml: the consensus matrix result
# clrs: the colors for the clusters
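Downstream analyses usually need the class labels for several k side by side. The sketch below uses a hypothetical helper, class_table, that gathers consensusClass across k into one data frame. It assumes results[[k]][["consensusClass"]] is a named integer vector, as printed above. A small mock object with the same shape is used so the helper can be tried without running the package.

```r
# Hypothetical helper: collect consensusClass for several k into one table
# (rows = samples, columns = k).
class_table <- function(results, ks = 2:length(results)) {
  tab <- sapply(ks, function(k) results[[k]][["consensusClass"]])
  colnames(tab) <- paste0("k", ks)
  as.data.frame(tab)
}

# Mock object shaped like the ConsensusClusterPlus result (element 1 unused here)
mock <- list(NULL,
             list(consensusClass = c(a = 1L, b = 1L, c = 2L)),
             list(consensusClass = c(a = 1L, b = 3L, c = 2L)))
cls <- class_table(mock)
table(cls$k2, cls$k3)   # how the k = 2 clusters split when k = 3
```

With the real object, class_table(results, 2:6) gives one column per k. Cross-tabulating adjacent columns shows how samples move between clusters as k grows.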

3-Calculating cluster consensus and item consensus

Cluster consensus is the average consensus value between all pairs of samples within a cluster. Item consensus is the average consensus value between one sample and all members of a given cluster. They are roughly analogous to within-cluster homogeneity and to module membership in WGCNA, respectively.

icl = calcICL(results, title=title, plot="png")

icl[["clusterConsensus"]]
#      k cluster clusterConsensus
# [1,] 2       1        0.7681668
# [2,] 2       2        0.9788274
# [3,] 3       1        0.6176820
# [4,] 3       2        0.9190744
# [5,] 3       3        1.0000000
# [6,] 4       1        0.8446083
# ...

icl[["itemConsensus"]][1:5,]
#   k cluster  item itemConsensus
# 1 2       1 28031     0.6173782
# 2 2       1 28023     0.5797202
# 3 2       1 43012     0.5961974
# 4 2       1 28042     0.5644619
# 5 2       1 28047     0.6259350
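Both statistics can be computed directly from a consensus matrix and a vector of class labels. Cluster consensus is the mean consensus over all distinct pairs inside a cluster. Item consensus is the mean consensus between one sample and the members of a cluster, leaving the sample itself out. The function names below are hypothetical, and calcICL's own calculation may differ in details, so treat this as an illustration rather than a reimplementation. The numbers come from a made-up 3 x 3 matrix, not from the ALL data.

```r
# Cluster consensus: mean consensus over all distinct pairs inside each cluster.
cluster_consensus <- function(M, cls) {
  groups <- sort(unique(cls))
  vals <- sapply(groups, function(g) {
    idx <- which(cls == g)
    if (length(idx) < 2) return(NA_real_)      # undefined for a singleton cluster
    sub <- M[idx, idx]
    mean(sub[upper.tri(sub)])
  })
  setNames(vals, paste0("cluster", groups))
}

# Item consensus: mean consensus between item i and the members of cluster g,
# leaving i itself out when it belongs to g.
item_consensus <- function(M, cls) {
  groups <- sort(unique(cls))
  out <- sapply(groups, function(g) {
    vapply(seq_len(nrow(M)), function(i) {
      members <- setdiff(which(cls == g), i)
      if (length(members) == 0) NA_real_ else mean(M[i, members])
    }, numeric(1))
  })
  dimnames(out) <- list(rownames(M), paste0("cluster", groups))
  out
}

# Toy consensus matrix: a and b form cluster 1, c is cluster 2
M <- matrix(c(1.0, 0.8, 0.2,
              0.8, 1.0, 0.4,
              0.2, 0.4, 1.0), nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
cls <- c(a = 1, b = 1, c = 2)
cluster_consensus(M, cls)
item_consensus(M, cls)
```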

4-Graphical presentation

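For further details, refer to the ConsensusClusterPlus page on Bioconductor and to "R Language | ConsensusClusterPlus Package for Consensus Clustering". With plot="png", ConsensusClusterPlus writes the consensus matrix heatmaps, the consensus CDF and delta-area plots, and the tracking plot as PNG files to the title directory. calcICL adds the cluster-consensus and item-consensus plots.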
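ConsensusClusterPlus draws a consensus CDF plot and a delta-area plot, which are the usual aids for choosing k. The sketch below reproduces the idea in base R. For each k it takes the upper-triangle consensus values, evaluates their empirical CDF, and computes the area under it. The delta value is that area for the smallest k, and the relative change (A(k) - A(k-1)) / A(k-1) for each k after that. The function names and the 100-bin left Riemann sum are assumptions, so the package's own numbers may differ slightly.

```r
# Area under the empirical CDF of the consensus values (upper triangle) on [0, 1],
# approximated with a left Riemann sum over `bins` equal-width bins.
cdf_area <- function(M, bins = 100) {
  v <- M[upper.tri(M)]
  grid <- seq(0, 1, length.out = bins + 1)
  Fv <- ecdf(v)(grid)
  sum(diff(grid) * Fv[-length(Fv)])
}

# Delta area: the area itself for the smallest k, then (A(k) - A(k-1)) / A(k-1).
# mats[[k]] is the consensus matrix for k; the ks present are assumed consecutive.
delta_area <- function(mats) {
  ks <- which(!vapply(mats, is.null, logical(1)))
  A <- vapply(mats[ks], cdf_area, numeric(1))
  setNames(c(A[1], diff(A) / A[-length(A)]), paste0("k", ks))
}

# Artificial matrices: a clean 0/1 consensus for k = 2, an ambiguous one for k = 3
M2 <- matrix(c(1, 1, 0, 0,
               1, 1, 0, 0,
               0, 0, 1, 1,
               0, 0, 1, 1), nrow = 4, byrow = TRUE)
M3 <- matrix(0.5, nrow = 4, ncol = 4); diag(M3) <- 1
delta_area(list(NULL, M2, M3))
```

The toy matrices only check the arithmetic. With real data, a CDF that is flat in the middle (consensus values piled up near 0 and 1) indicates a stable partition. One typically looks for the k after which the relative increase in area becomes small.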

In practice, consensus clustering is often tied to a specific biological process, such as angiogenesis or hypoxia. First collect the related gene set, take the corresponding subset of the expression matrix, and then cluster the samples into subtypes. Once the subtypes are defined, the downstream analysis can go in many directions. You can dig into the molecular mechanisms and co-expression networks that distinguish the subtypes, or explore their diagnostic or prognostic value.
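The gene-set subsetting step can be sketched as a small hypothetical helper, subset_by_geneset. It keeps only the genes of the set that are present in the matrix, reports the ones that are missing, and stops if too few remain. The gene symbols below are illustrative only and are not a curated hypoxia or angiogenesis gene set.

```r
# Hypothetical helper: restrict an expression matrix (genes x samples) to a gene
# set before consensus clustering, reporting genes that are not in the matrix.
subset_by_geneset <- function(expr, genes, min_genes = 10) {
  found <- intersect(genes, rownames(expr))      # keeps the order of `genes`
  missing <- setdiff(genes, found)
  if (length(missing) > 0)
    message(length(missing), " of ", length(genes), " genes not found in the matrix")
  if (length(found) < min_genes)
    stop("only ", length(found), " genes of the set are present")
  expr[found, , drop = FALSE]
}

set.seed(7)
expr <- matrix(rnorm(5 * 3), nrow = 5,
               dimnames = list(c("VEGFA", "CA9", "GAPDH", "ACTB", "SLC2A1"),
                               paste0("s", 1:3)))
geneset <- c("VEGFA", "CA9", "SLC2A1", "PGK1")   # illustrative symbols only
sub <- subset_by_geneset(expr, geneset, min_genes = 2)
rownames(sub)
```

The subset would then go through the same MAD filtering and ConsensusClusterPlus call as the full matrix.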

Alternatively, first estimate an immune-infiltration matrix with an algorithm such as CIBERSORT, and then cluster the samples on the infiltration estimates.
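Infiltration tools usually return one row per sample, whereas ConsensusClusterPlus clusters the columns of its input. The table therefore has to be transposed before clustering. The hypothetical helper below makes three assumptions about a samples-by-cell-types data frame:

- any non-fraction columns are named "P-value", "Correlation" and "RMSE"; adjust this to your own output;
- constant cell types should be dropped;
- each cell type should be median-centered, mirroring the expression workflow.

```r
# Hypothetical helper: turn a samples x cell-types infiltration table into the
# features-in-rows, samples-in-columns layout that ConsensusClusterPlus clusters.
prepare_infiltration <- function(frac, drop_cols = c("P-value", "Correlation", "RMSE")) {
  keep <- setdiff(colnames(frac), drop_cols)       # drop non-fraction columns if present
  m <- t(as.matrix(frac[, keep, drop = FALSE]))    # now cell types x samples
  m <- m[apply(m, 1, sd) > 0, , drop = FALSE]      # remove constant cell types
  sweep(m, 1, apply(m, 1, median))                 # median-center each cell type
}

frac <- data.frame(B_cells = c(0.1, 0.3, 0.2), T_cells = c(0.5, 0.4, 0.6),
                   NK = c(0, 0, 0), "P-value" = c(0.01, 0.02, 0.03),
                   check.names = FALSE, row.names = c("s1", "s2", "s3"))
m <- prepare_infiltration(frac)
m
```

With only a few dozen cell types as features, a distance other than "pearson" (ConsensusClusterPlus also accepts "euclidean", for example) may be worth comparing.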

This idea is not very different from scoring a gene set with ssGSEA or GSVA and splitting the samples into high and low groups at the median score. The GSVA-score approach is more intuitive, and its prognostic or diagnostic separation tends to be more pronounced. Distance-based clustering, on the other hand, may be more advantageous for co-expression network analysis.
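For comparison, the median-split alternative takes only a few lines. GSVA and ssGSEA are separate packages, so the sketch below substitutes a simple stand-in score: the mean of row-standardized expression over the gene set. This is explicitly not ssGSEA or GSVA. The function names are hypothetical.

```r
# Stand-in gene-set score: mean z-score of the set's genes per sample
# (a simple average, NOT ssGSEA/GSVA; used only to illustrate the split).
zscore_score <- function(expr, genes) {
  sub <- expr[intersect(genes, rownames(expr)), , drop = FALSE]
  z <- t(scale(t(sub)))            # standardize each gene across samples
  colMeans(z)
}

# Median split: samples strictly above the median score are "high", the rest "low".
median_split <- function(score) {
  factor(ifelse(score > median(score), "high", "low"), levels = c("low", "high"))
}

expr <- matrix(c(1, 2, 3, 4,
                 2, 4, 6, 8), nrow = 2, byrow = TRUE,
               dimnames = list(c("g1", "g2"), paste0("s", 1:4)))
score <- zscore_score(expr, c("g1", "g2", "g3"))
groups <- median_split(score)
table(groups)
```

Unlike consensus clustering, this always yields exactly two groups of roughly equal size, whatever the structure of the data.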