Search code examples
javascriptperformanceionic-frameworknlplevenshtein-distance

Speeding up Levenshtein distance calculation in Ionic app


  • What I'm doing: I'm developing a mobile dictionary app for a number of languages

  • How I'm doing it: Using ionic framework with combination of some angular and some pure js (imported from a working online dictionary site of the same languages)

  • The problem: Our search function is an approximate search that uses a Levenstein distance calculator to rank all entries in the dictionary with respect to the query form. When the dictionary has up to 1,500 words, this isn't a problem at all on phones, but when the dictionary has around 10,000 words, there is about a 5-8 second delay before results are shown, despite it being instantaneous on a web browser using "ionic serve". When I run firebug, the javascript that takes the longest time to process are the distance calculations, so my working assumption is that this is where I should start, but I'm open to any suggestions at all.

Here's the distance calculator:

/**
 * editDistance.js
 * 
 * A simple Levenshtein distance calculator, except weighted such
 * that insertions at the beginning and deletions at the end cost less.
 *
 * AUTHOR: Pat Littell
 * LAST UPDATED: 2015-05-16
 */

var distanceCalculator = {

insertionCost : 1.0,
deletionCost : 1.0,
insertionAtBeginningCost : 0.11,
deletionAtEndCost : 0.1,
substitutionCost : 1.0,

getEditDistance : function(a, b) {
  if(a.length === 0) return b.length; 
  if(b.length === 0) return a.length; 

  var matrix = [];
 // var currentInsertionCost, currentDeletionCost, currentSubstitutionCost = 0;

  // increment along the first column of each row
  var i;
  for(i = 0; i <= b.length; i++){
    matrix[i] = [i * this.insertionAtBeginningCost];
  }

  // increment each column in the first row
  var j;
  for(j = 0; j <= a.length; j++){
    matrix[0][j] = j;
  }

  // Fill in the rest of the matrix
  for(i = 1; i <= b.length; i++){
    for(j = 1; j <= a.length; j++){
        currentInsertionCost = matrix[i][j-1] + this.insertionCost;
        currentSubstitutionCost = matrix[i-1][j-1] + (b.charAt(i-1) != a.charAt(j-1) ? this.substitutionCost : 0);
        currentDeletionCost = matrix[i-1][j] + (j==a.length ? this.deletionAtEndCost : this.deletionCost);            
        matrix[i][j] = Math.min(currentSubstitutionCost, Math.min(currentInsertionCost, currentDeletionCost));

    }
  }

  return matrix[b.length][a.length];
},


// Given a query <a> and a series of targets <bs>, return the least distance to any target
getLeastEditDistance : function(a, bs) {
    var that = this;
    return Math.min.apply(null, bs.map(function(b) {
        return that.getEditDistance(a,b);
    }));
}
}

Solution

  • First of all, if you have a known dictionary you will get the fastest solution with something like a Levenshtein Automata, which will solve this in linear time to get all candidates. You can't beat this with a general purpose implementation.

    With that said, this implementation of levenshtein distance is a few times faster than yours.

    function distance(s, t) {
        if (s === t) {
            return 0;
        }
        var n = s.length, m = t.length;
        if (n === 0 || m === 0) {
            return n + m;
        }
        var x = 0, y, py, a, b, c, d, e, f, k;
    
        var p = new Array(n);
    
        for (y = 0; y < n;) {
            p[y] = ++y;
        }
    
        for (; (x + 3) < m; x += 4) {
            var tx0 = t.charCodeAt(x);
            var tx1 = t.charCodeAt(x + 1);
            var tx2 = t.charCodeAt(x + 2);
            var tx3 = t.charCodeAt(x + 3);
            a = x;
            b = x + 1;
            c = x + 2;
            d = x + 3;
            e = x + 4;
            for (y = 0; y < n; y++) {
                k = s.charCodeAt(y);
                py = p[y];
                if (py < a || b < a) {
                    a = (py > b ? b + 1 : py + 1);
                }
                else {
                    if (tx0 !== k) {
                        a++;
                    }
                }
    
                if (a < b || c < b) {
                    b = (a > c ? c + 1 : a + 1);
                }
                else {
                    if (tx1 !== k) {
                        b++;
                    }
                }
    
                if (b < c || d < c) {
                    c = (b > d ? d + 1 : b + 1);
                }
                else {
                    if (tx2 !== k) {
                        c++;
                    }
                }
    
                if (c < d || e < d) {
                    d = (c > e ? e + 1 : c + 1);
                }
                else {
                    if (tx3 !== k) {
                        d++;
                    }
                }
    
                p[y] = e = d;
                d = c;
                c = b;
                b = a;
                a = py;
            }
        }
    
        for (; x < m;) {
            tx0 = t.charCodeAt(x);
            a = x;
            b = ++x;
            for (y = 0; y < n; y++) {
                py = p[y];
                if (py < a || b < a) {
                    b = (py > b ? b + 1 : py + 1);
                }
                else {
                    if (tx0 !== s.charCodeAt(y)) {
                        b = a + 1;
                    }
                    else {
                        b = a;
                    }
                }
                p[y] = b;
                a = py;
            }
            f = b;
        }
    
        return f;
    }
    

    I would also not use map in getLeastEditDistance, it is very slow. Just use a normal loop. Also Math.min with many arguments is not very performant.