Tags: python, numpy, cluster-analysis, plotly, dendrogram

Plotting a dendrogram using Plotly Python


I am trying to plot a dendrogram in Python, preferably with Plotly. I have a dataset containing a clustering of various objects, and I can use it to generate the required data, or at least extrapolate it. However, I don't understand what the input to create_dendrogram actually is. The documentation only says that it is an ndarray, a "matrix of observations as array of arrays". I am familiar with NumPy ndarrays, but I would like to know what the array must contain.

More specifically, what is the significance of the value X[i][j]? In the example it just seems to be a float between 0 and 1. I looked at the Plotly API documentation for Python here: https://plot.ly/python/dendrogram/

import plotly.plotly as py
from plotly.tools import FigureFactory as FF

import numpy as np

X = np.random.rand(10, 10)
names = ['item_%d' % i for i in range(X.shape[0])]  # placeholder labels, one per row of X
fig = FF.create_dendrogram(X, orientation='left', labels=names)
py.iplot(fig, filename='dendrogram_with_labels')

If there is an alternative, more intuitive way to get a dendrogram in Python, I would also like to know about it. I am new to this, and any help would be appreciated. (Please let me know if I need to rephrase the question!)


Solution

  • create_dendrogram accepts a linkagefun argument, so you can pass your own linkage function (here SciPy's linkage with Ward's method). For example:

    from scipy.cluster.hierarchy import linkage
    
    ...
    
    figure = FF.create_dendrogram(
        data_array, orientation='bottom', labels=id_label_list,
        linkagefun=lambda x: linkage(data_array, 'ward', metric='euclidean')
    )
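
On what the array must contain: as far as I can tell from the docstring ("matrix of observations"), each row of X is one object and each column is one feature, so X[i][j] would be the value of feature j for object i. The random floats in the documentation example look like placeholder data only. The sketch below uses SciPy alone to show the steps a dendrogram is built from (pairwise distances, then a linkage matrix, then the tree layout). The toy data, the names list, and the variable names are assumptions made for illustration; they are not from the question or from Plotly's documentation.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist

# Hypothetical toy data: 5 observations (rows) x 3 features (columns).
rng = np.random.default_rng(0)
X = rng.random((5, 3))
names = ["obj_%d" % i for i in range(X.shape[0])]  # one label per row

# Pairwise Euclidean distances between rows, in condensed form:
# n * (n - 1) / 2 entries for n observations.
distances = pdist(X, metric="euclidean")

# Ward linkage: each of the n - 1 rows records one merge as
# [cluster_a, cluster_b, merge_distance, size_of_new_cluster].
Z = linkage(distances, method="ward")

# no_plot=True returns the tree layout as a dict instead of drawing it.
tree = dendrogram(Z, labels=names, orientation="left", no_plot=True)
print(tree["ivl"])  # leaf labels in the order they would be drawn
```

Dropping no_plot=True should draw the same tree with matplotlib, which may be the simpler route if Plotly is not a hard requirement.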