
Create clusters using correlation matrix in Python


All, I have a correlation matrix of 21 industry sectors. I want to split these 21 sectors into 4 or 5 groups, so that sectors that behave similarly end up in the same group.

Could anyone shed some light on how to do this in Python? Thanks in advance!


Solution

  • UPDATE: This answer is wrong, and your clustering will not work correctly. Do not use it and read the explanation in Martijn Courteaux's answer below.


    You might explore pandas' DataFrame.corr together with SciPy's hierarchical clustering module, scipy.cluster.hierarchy:

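    import pandas as pd
    import scipy.cluster.hierarchy as spc
    from scipy.spatial.distance import pdist

    # my_data: the observations from the question, one column per sector
    df = pd.DataFrame(my_data)
    corr = df.corr().values

    # Euclidean distances between the *rows* of the correlation matrix
    # (each row is treated as a feature vector, not as a similarity)
    dists = pdist(corr)
    linkage = spc.linkage(dists, method='complete')
    # Cut the tree at half of the largest pairwise distance
    idx = spc.fcluster(linkage, 0.5 * dists.max(), 'distance')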
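The update above warns that this answer is flawed: pdist(corr) measures Euclidean distance between rows of the correlation matrix instead of using the correlations themselves as similarities, and the 0.5 * max threshold does not control how many groups you get. A common alternative, sketched here under my own assumptions (not taken from the answer the update refers to), is to convert correlation into a dissimilarity 1 - corr, pass its condensed form to linkage, and ask fcluster for a fixed number of clusters with criterion="maxclust", since the question wants 4 or 5 groups. The helper name cluster_from_corr, the choice of average linkage, and the synthetic factor data below are all hypothetical.

```python
import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import squareform


def cluster_from_corr(corr: pd.DataFrame, n_clusters: int,
                      method: str = "average") -> pd.Series:
    """Hypothetical helper: group the columns of a correlation matrix.

    Assumes corr is a symmetric correlation DataFrame (e.g. df.corr())
    whose columns are the sector names.
    """
    # Correlation -> dissimilarity: 0 for perfectly correlated sectors,
    # 2 for perfectly anti-correlated ones. Clip tiny negative round-off.
    dist = np.clip(1.0 - corr.values, 0.0, None)
    # Exact zero diagonal and exact symmetry, as squareform expects.
    np.fill_diagonal(dist, 0.0)
    dist = (dist + dist.T) / 2.0
    # Condensed (upper-triangle) vector, the input format linkage expects.
    condensed = squareform(dist, checks=False)
    Z = sch.linkage(condensed, method=method)
    # Cut the tree so that at most n_clusters flat clusters remain.
    labels = sch.fcluster(Z, t=n_clusters, criterion="maxclust")
    return pd.Series(labels, index=corr.columns, name="cluster")


# Hypothetical data: 3 independent latent factors, each driving 3 sectors.
rng = np.random.default_rng(0)
n_obs = 250
factors = rng.normal(size=(n_obs, 3))
cols = {}
for f in range(3):
    for k in range(3):
        cols[f"sector_{f}_{k}"] = factors[:, f] + 0.3 * rng.normal(size=n_obs)
returns = pd.DataFrame(cols)

corr = returns.corr()
groups = cluster_from_corr(corr, n_clusters=3)
print(groups.sort_values())

# Reorder the correlation matrix by cluster to inspect its block structure.
order = groups.sort_values().index
blocked = corr.loc[order, order]
```

For the 21-sector case you would call cluster_from_corr(df.corr(), n_clusters=4) or n_clusters=5. If strongly anti-correlated sectors should count as "similar", 1 - |corr| is a common variant of the dissimilarity; average linkage is used because, unlike Ward, it does not assume Euclidean distances.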