I'm using linkage to generate an agglomerative hierarchical clustering for a dataset of around 5000 instances. I want to visualize the 'bottom' merges in the hierarchy, that is, the nodes close to the leaves with the smallest distance measures.
Unfortunately, the dendrogram visualization prefers to show the 'top' nodes from the last merges in the algorithm. By default it shows the top 30 nodes, collapsing the bottom of the tree. I can change the P value to show more nodes, but I would have to show all 5000+ to see the lowest levels of the clustering, at which point the plot is no longer readable.
For example, starting from the linkage documentation example:
openExample('stats/CompareClusterAssignmentsToClustersExample')
run CompareClusterAssignmentsToClustersExample
dendrogram(Z, 'Orient', 'Left', 'Labels', species);
This produces a dendrogram with the top 30 nodes visible. The nodes with numerical labels stand for collapsed lower levels of the tree.
I can increase the number of visible nodes to include all leaves, at the expense of readability:
dendrogram(Z, size(Z,1), 'Orient', 'Left', 'Labels', species);
What I'd really like is a zoomed-in version of the above, like the example below, but showing the 30 closest clusters.
I tried providing the function with only the first 30 rows of Z,
dendrogram(Z(1:30, :), 'Orient', 'Left');
but that throws an "Index exceeds matrix dimensions." error when one of the rows references a cluster in a row > 30.
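One possible explanation for the error, sketched under assumptions: dendrogram appears to infer the number of leaves from the number of rows in Z (rows + 1), so a 30-row matrix implies 31 leaves, while its first two columns still hold indices from the full 150-observation tree. The linkage call is assumed to match the documentation example, and Zsub, impliedLeaves and maxIdx are illustrative names.

```matlab
load fisheriris                              % ships with Statistics and Machine Learning Toolbox
Z = linkage(meas, 'average', 'chebychev');   % assumed to match the documentation example
m = size(Z, 1) + 1;                          % number of original observations

Zsub = Z(1:30, :);                           % the first 30 merges only
impliedLeaves = size(Zsub, 1) + 1;           % leaf count a 30-row tree would imply
idx = Zsub(:, 1:2);                          % leaf/cluster indices used by those merges
maxIdx = max(idx(:));                        % largest index that is referenced

fprintf('observations: %d, implied leaves: %d, largest index used: %d\n', ...
        m, impliedLeaves, maxIdx);
```

If the largest index exceeds the implied leaf count, the truncated matrix is not a self-consistent tree, which would be consistent with the indexing error.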
I also tried using the dendrogram Reorder property, but I am having difficulty finding a valid ordering that orders the clusters from closest to farthest.
% The Z matrix is ordered from the closest cluster to the furthest,
% so I can use it to create an ordering
Y = reshape(Z(:, 1:2)', 1, []);
Y = Y(Y<151);   % keep only leaf indices (the example has 150 observations)
dendrogram(Z, 30, 'Orient', 'Left', 'Labels', species, 'Reorder', Y);
I get the error
In the requested ordering of the nodes, some data points belonging to the same leaf in the plot are separated by the points belonging to other leaves. Try to use a different ordering.
It may be the case that such an ordering is not possible if the entire tree is calculated because there would be branch crossings, but I'm hoping that there is a better ordering if I am only looking at a portion of the tree, and clusters at higher levels are not considered.
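For reference, one way to get an ordering that dendrogram accepts is optimalleaforder from the Statistics and Machine Learning Toolbox. This is only a sketch: it assumes the same metric and linkage method as the documentation example, and the resulting order keeps every cluster contiguous but is not sorted from closest to farthest, so it may not give the view asked for here. The name leafOrder is illustrative.

```matlab
load fisheriris
D = pdist(meas, 'chebychev');         % pairwise distances, same metric as the example (assumed)
Z = linkage(D, 'average');            % same tree as linkage(meas, 'average', 'chebychev')
leafOrder = optimalleaforder(Z, D);   % crossing-free permutation of the leaves

figure
dendrogram(Z, 0, 'Orientation', 'left', 'Labels', species, ...
           'Reorder', leafOrder);     % P = 0 draws every leaf
```

Because the order comes from the tree itself, leaves that belong to the same cluster stay adjacent, which is the property the error message complains about.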
How can I improve my visualization to show the lowest level clusters in the dendrogram?
Emmm...like ylim()?
dendrogram(Z, size(Z,1), 'Orient', 'Left', 'Labels', species);
ylim(max(ylim())-[30,0]);
yields a view restricted to a 30-unit window at the upper end of the leaf axis.
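Building on the same idea, the distance axis can be limited too. A minimal sketch, assuming the merge heights in Z(:,3) are non-decreasing (true for 'average' linkage, not necessarily for 'centroid' or 'median'), so that Z(k,3) is the height of the k-th lowest merge. The names k and dmax are illustrative.

```matlab
load fisheriris
Z = linkage(meas, 'average', 'chebychev');   % assumed to match the documentation example
k = 30;                                      % number of lowest merges to keep in view

figure
dendrogram(Z, 0, 'Orientation', 'left', 'Labels', species);  % P = 0 draws every leaf

dmax = Z(k, 3);                  % height of the k-th merge
if dmax > 0
    xlim([0, dmax]);             % with 'left' orientation the distance is on the x-axis
end
ylim(max(ylim()) - [30, 0]);     % 30-unit window on the leaf axis
```

Merges higher than dmax run off the edge of the axes, so only the lowest merges are fully drawn. Panning along the leaf axis shows other groups of leaves at the same scale.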