
How to set distfun of create_dendrogram in plotly.figure_factory


I am having trouble drawing a dendrogram with create_dendrogram from plotly.figure_factory.

By default, linkagefun uses complete linkage (lambda x: sch.linkage(x, 'complete')), and distfun is scs.distance.pdist, which uses the Euclidean metric.
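For reference, those defaults amount to the two-step pipeline below: distfun turns the data into condensed distances, and linkagefun clusters them. This is a sketch only; the array X is made-up data, and default_distfun / default_linkagefun are hypothetical names standing in for plotly's defaults.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

# Made-up 0/1 data for illustration only (4 observations, 4 features)
X = np.array([[0, 1, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 1]])

# Hypothetical stand-ins for create_dendrogram's defaults
default_distfun = pdist                                    # metric defaults to 'euclidean'
default_linkagefun = lambda x: sch.linkage(x, "complete")

# Step 1: data -> condensed distances; step 2: distances -> linkage matrix
Z_default = default_linkagefun(default_distfun(X))

# Spelling the defaults out explicitly gives the same linkage matrix
Z_explicit = sch.linkage(pdist(X, metric="euclidean"), method="complete")
print(np.allclose(Z_default, Z_explicit))  # True
```

Changing the behaviour therefore means supplying a different callable for one or both steps.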

However, I want Jaccard distance for distfun and average linkage for linkagefun. With SciPy and Matplotlib, the setting I want looks like this:

import pandas as pd
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

# df is the example DataFrame defined below
plt.figure(figsize=(10, 10))
# Condensed (1-D) Jaccard distances between the rows of df
disMat = pdist(df, metric='jaccard')
# Pass the condensed distances directly: linkage() reads a 2-D squareform
# matrix as observation vectors, not as precomputed distances
Z = sch.linkage(disMat, method='average')
Dend = sch.dendrogram(Z, orientation='right')
plt.tick_params(
        axis='y',
        which='both',
        direction='in',
        left=False,
        right=False,
        labelleft=False)
plt.show()
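When building the linkage by hand, note that scipy.cluster.hierarchy.linkage accepts either a 1-D condensed distance matrix or a 2-D array of observation vectors. Passing the output of squareform therefore makes it cluster the rows of the distance matrix (with Euclidean distance) rather than the Jaccard distances themselves. A minimal comparison on the question's example data, redefined here so the block runs on its own:

```python
import warnings

import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist, squareform

# Example DataFrame from the question
df = pd.DataFrame({'1-7':[0,0,1,1,0,1,1],'1-2':[1,0,1,0,0,1,1],'2-3':[1,0,0,0,1,1,0],'2-2':[0,1,0,1,0,1,1],'1-1':[1,0,0,1,0,1,0],'1-3':[0,1,1,1,0,0,0],'1-5':[0,1,0,1,1,0,1]},index=['a','b','c','d','e','f','g'])

condensed = pdist(df, metric="jaccard")   # 1-D: one entry per pair of rows
square = squareform(condensed)            # 2-D: 7x7 symmetric matrix

# 1-D input is treated as precomputed distances
Z_condensed = sch.linkage(condensed, method="average")

# 2-D input is treated as 7 observations with 7 features each, so the
# rows of `square` are compared with Euclidean distance instead
with warnings.catch_warnings():
    # SciPy may warn that the input looks like an uncondensed distance matrix
    warnings.simplefilter("ignore")
    Z_square = sch.linkage(square, method="average")

print(Z_condensed[0, 2])                   # 0.4, the smallest Jaccard distance
print(np.allclose(Z_condensed, Z_square))  # False
```

create_dendrogram passes the condensed output of distfun straight to linkagefun, so the condensed form is the one that matches it.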

I found that linkagefun can be set with linkagefun=lambda x: sch.linkage(x, 'average'), but distfun cannot be set with distfun='jaccard', and I don't know how to set it. This is what I tried:


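from plotly.figure_factory import create_dendrogram
import scipy.cluster.hierarchy as sch

# Fails: distfun expects a callable, not a metric name
fig = create_dendrogram(df, orientation='left',
                        labels=df.index,
                        distfun='jaccard',
                        linkagefun=lambda x: sch.linkage(x, 'average'))
fig.show()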
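The string fails because, as far as I can tell from plotly's source, create_dendrogram calls distfun on the data and then passes the result to linkagefun, so both arguments must be callables. The sketch below mimics that flow with a hypothetical helper, build_linkage (not part of plotly), on made-up data:

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist


def build_linkage(X, distfun, linkagefun):
    """Hypothetical helper: call distfun on the data, then linkagefun on the result."""
    return linkagefun(distfun(X))


# Made-up 0/1 data for illustration only
X = np.array([[0, 1, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 1]])
average = lambda d: sch.linkage(d, "average")

# A metric name is a str, which cannot be called like a function
try:
    build_linkage(X, "jaccard", average)
    string_failed = False
except TypeError as err:
    print(err)  # 'str' object is not callable
    string_failed = True

# Any callable that returns condensed distances works, e.g. a lambda around pdist
Z = build_linkage(X, lambda x: pdist(x, metric="jaccard"), average)
print(Z.shape)  # (3, 4)
```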

The example df is defined as follows:

import pandas as pd
df = pd.DataFrame({'1-7':[0,0,1,1,0,1,1],'1-2':[1,0,1,0,0,1,1],'2-3':[1,0,0,0,1,1,0],'2-2':[0,1,0,1,0,1,1],'1-1':[1,0,0,1,0,1,0],'1-3':[0,1,1,1,0,0,0],'1-5':[0,1,0,1,1,0,1]},index=['a','b','c','d','e','f','g'])
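To see what the Jaccard setting measures on this data, the pairwise distances between rows can be laid out as a labeled table. A short sketch using pdist and squareform; the name jaccard is only illustrative:

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

df = pd.DataFrame({'1-7':[0,0,1,1,0,1,1],'1-2':[1,0,1,0,0,1,1],'2-3':[1,0,0,0,1,1,0],'2-2':[0,1,0,1,0,1,1],'1-1':[1,0,0,1,0,1,0],'1-3':[0,1,1,1,0,0,0],'1-5':[0,1,0,1,1,0,1]},index=['a','b','c','d','e','f','g'])

# Jaccard distance = 1 - (columns set in both rows) / (columns set in either row)
jaccard = pd.DataFrame(
    squareform(pdist(df, metric="jaccard")),
    index=df.index,
    columns=df.index,
)
print(jaccard.round(3))
```

For example, rows a and f share 3 of the 5 columns set in either row, giving a distance of 1 - 3/5 = 0.4, while a and b share none, giving 1.0.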

Since I need Dash to display the figure on a web page, it seems I have to use create_dendrogram from plotly.


Solution

  • distfun must be a callable that takes the data and returns a condensed distance matrix, so a metric name string won't work. You can use partial from functools to "freeze" the metric parameter of scipy.spatial.distance.pdist:

        from functools import partial
        from scipy.spatial.distance import pdist
        pw_jaccard_func = partial(pdist, metric='jaccard')
    

    Then use the partial function as the input for distfun:

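    from plotly.figure_factory import create_dendrogram
    import scipy.cluster.hierarchy as sch

    fig = create_dendrogram(df, orientation='left',
                            labels=df.index,
                            distfun=pw_jaccard_func,
                            linkagefun=lambda x: sch.linkage(x, 'average'))
    fig.show()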