Search code examples
cluster-analysiswekahierarchical-clustering

How does the link type "adjusted complete" work for agglomerative hierachical clustering in WEKA?


The only descriptions I can find about "adjusted complete" linkage say something like: "same as complete linkage, but with largest within cluster distance"

What is meant by "within cluster distance"?

How is the distance between two clusters finally calculated using this linkage approach?

Thanks for your replies!


Solution

  • One of the great things about open-source software is that you can find out exactly how the software works. The code below shows Weka's source code of the HierarchicalClusterer algorithm, more specifically it shows the part which implements the COMPLETE and ADJCOMPLETE functionality. The difference is as follows:

    • Just like the COMPLETE linkage method, compute the maximum distance between one node from cluster 1 and one node from cluster 2 and store this in fBestDist
    • Then, find the largest distance between nodes within cluster 1 or cluster 2 and store this in fMaxDist
    • Finally subtract fMaxDist from fBestDist

    So the distance between two clusters calculated using ADJCOMPLETE as linkType corresponds to the COMPLETE distance minus the largest distance between 2 nodes within either cluster 1 or cluster 2.

    Adjusted Complete-Link was proposed in the following paper:

    Sepandar Kamvar, Dan Klein and Christopher Manning (2002). Interpreting and Extending Classical Agglomerative Clustering Algorithms Using a Model-Based Approach. In Proceedings of 19th International Conference on Machine Learning (ICML-2002)

    According to it (section 4.2), Adjusted Complete-Link is a version of Complete-Link which should be used if the clusters having varying radii (see Figure 10).

    case COMPLETE:
    case ADJCOMLPETE:
      // find complete link distance aka maximum link, which is the largest distance between
      // any item in cluster1 and any item in cluster2
      fBestDist = 0;
      for (int i = 0; i < cluster1.size(); i++) {
        int i1 = cluster1.elementAt(i);
        for (int j = 0; j < cluster2.size(); j++) {
          int i2 = cluster2.elementAt(j);
          double fDist = fDistance[i1][i2];
          if (fBestDist < fDist) {
            fBestDist = fDist;
          }
        }
      }
      if (m_nLinkType == COMPLETE) {
        break;
      }
      // calculate adjustment, which is the largest within cluster distance
      double fMaxDist = 0;
      for (int i = 0; i < cluster1.size(); i++) {
        int i1 = cluster1.elementAt(i);
        for (int j = i+1; j < cluster1.size(); j++) {
          int i2 = cluster1.elementAt(j);
          double fDist = fDistance[i1][i2];
          if (fMaxDist < fDist) {
            fMaxDist = fDist;
          }
        }
      }
      for (int i = 0; i < cluster2.size(); i++) {
        int i1 = cluster2.elementAt(i);
        for (int j = i+1; j < cluster2.size(); j++) {
          int i2 = cluster2.elementAt(j);
          double fDist = fDistance[i1][i2];
          if (fMaxDist < fDist) {
            fMaxDist = fDist;
          }
        }
      }
      fBestDist -= fMaxDist;
      break;