Search code examples
javaedit-distance

Edit distance in Java: How arrange the code?


I am working on a Java project about Edit Distance, i.e. the minimum number of operations (out of three operations that are defined, see here for more info!). I am completely new to Java, and it seems like a great object oriented language, but maybe less numerical oriented like Matlab for instance. The problem is I do not know what all the corresponding functions in Matlab or Python are in Java that would realize my solution for this project, so all I need is a little constructive help on how to this.

The code is below (don't worry, I don't expect anyone to understand the code/algorithm, but it works!)

CODE

import java.util.LinkedList;
import java.util.List;

public class ClosestWords {
  LinkedList<String> closestWords = null;
  int closestDistance = -1;

  int[][] partDist(String w1, String w2, int w1len, int w2len) {
      int[][] M = new int[w1len+1][w2len+1];
      for(int i=0;i<=w1len;i++) {
          for(int j=0;j<=w2len;j++) {
              if( i == 0) {
                  M[i][j] = j;
                  }
              else if(j==0) {
                  M[i][j] = i;
                  }
              else {
                  char a = w1.charAt(i-1);
                  char b = w2.charAt(j-1);
                  int I = (a == b ? 0:1);
                  M[i][j] = Math.min(Math.min(M[i-1][j]+1,M[i][j-1]+1),M[i-1][j-1]+I);
              }
          }
      }
  return M;
  }

  int[][] Distance(String w1, String w2) {
    return partDist(w1, w2, w1.length(), w2.length());
  }

  public ClosestWords(String w, List<String> wordList) {
      for (String s : wordList) {
          int[][] M = Distance(w, s);
          int dist = M[w.length()-1][s.length()-1];
          // int dist = Distance(w, s);
          // System.out.println("d(" + w + "," + s + ")=" + dist);
          if (dist < closestDistance || closestDistance == -1) {
              closestDistance = dist;
              closestWords = new LinkedList<String>();
              closestWords.add(s);
              }
          else if (dist == closestDistance)
              closestWords.add(s);
          }
      }

  int getMinDistance() {
    return closestDistance;
  }

  List<String> getClosestWords() {
    return closestWords;
  }
}

Now, what I would like to do (but I don'tknow how to do), is to update the matrix M inside the for loop in ClosestWords. In Matlab this would be easy: I'd just set the matrix to some initial form, then for every loop we would obtain a new matrix from the function call Distance(w, s). This new matrix, in turn, I would like to modify, that is, remove a number of the last rows from it. How do I do this? For example, I have a M matrix that is 4 by 4, then I remove the last row so I get M_new that is 3 by 4. Is it possible?

Also, if I have to strings of possible different lengths, how do I check (in the easiest way) how many of their first letters that are the same? That is, the maximum length of the substrings of the strings starting at the left and that are equal to one another? For example, compute and commute would have three first letters (starting from the left) in common, thus three of the first letters are the same.

Best regards,


Solution

  • Java is not very well suited for this type of work (APL comes to mind here). If this is not an exercise, I would use existing libairies to do this. If this is an exercise, I would check how an open source library does this.

    In the end, you could:

    1) Copy the original content into a newly allocated matrix of smaller size.

    2) Shift values in your current matrix and having external data to keep track of matrix's logical size.

    3) ...

    For your second question, I would add the words to a tree structure and find the longest sub-branch starting from the root having at least two sub-branches.

    Or simply sort alphabetically and compare every adjacent string.