The only descriptions I can find about "adjusted complete" linkage say something like: "same as complete linkage, but with largest within cluster distance"
What is meant by "within cluster distance"?
How is the distance between two clusters finally calculated using this linkage approach?
Thanks for your replies!
One of the great things about open-source software is that you can find out exactly how the software works. The code below shows Weka's source code of the HierarchicalClusterer
algorithm, more specifically it shows the part which implements the COMPLETE
and ADJCOMPLETE
functionality. The difference is as follows:
COMPLETE
linkage method, compute the maximum distance between one node from cluster 1 and one node from cluster 2 and store this in fBestDist
fMaxDist
fMaxDist
from fBestDist
So the distance between two clusters calculated using ADJCOMPLETE
as linkType
corresponds to the COMPLETE
distance minus the largest distance between 2 nodes within either cluster 1 or cluster 2.
Adjusted Complete-Link
was proposed in the following paper:
Sepandar Kamvar, Dan Klein and Christopher Manning (2002). Interpreting and Extending Classical Agglomerative Clustering Algorithms Using a Model-Based Approach. In Proceedings of 19th International Conference on Machine Learning (ICML-2002)
According to it (section 4.2), Adjusted Complete-Link
is a version of Complete-Link
which should be used if the clusters having varying radii (see Figure 10).
case COMPLETE:
case ADJCOMLPETE:
// find complete link distance aka maximum link, which is the largest distance between
// any item in cluster1 and any item in cluster2
fBestDist = 0;
for (int i = 0; i < cluster1.size(); i++) {
int i1 = cluster1.elementAt(i);
for (int j = 0; j < cluster2.size(); j++) {
int i2 = cluster2.elementAt(j);
double fDist = fDistance[i1][i2];
if (fBestDist < fDist) {
fBestDist = fDist;
}
}
}
if (m_nLinkType == COMPLETE) {
break;
}
// calculate adjustment, which is the largest within cluster distance
double fMaxDist = 0;
for (int i = 0; i < cluster1.size(); i++) {
int i1 = cluster1.elementAt(i);
for (int j = i+1; j < cluster1.size(); j++) {
int i2 = cluster1.elementAt(j);
double fDist = fDistance[i1][i2];
if (fMaxDist < fDist) {
fMaxDist = fDist;
}
}
}
for (int i = 0; i < cluster2.size(); i++) {
int i1 = cluster2.elementAt(i);
for (int j = i+1; j < cluster2.size(); j++) {
int i2 = cluster2.elementAt(j);
double fDist = fDistance[i1][i2];
if (fMaxDist < fDist) {
fMaxDist = fDist;
}
}
}
fBestDist -= fMaxDist;
break;