Search code examples
javascriptregexlodash

Javascript / lodash filtering an array very slow


I've got a large array of objects. I'm creating a second array based on filtering the first array by one of the properties of that object. My code is...

let re = RegExp("^" + term, "i");
this.filteredList = _.filter(this.list, item => item.value.search(re) > -1);

Oh... I'm using lodash at the moment but the native Javascript filter was being just as slow. Is there a more efficient way to filter the array?


Solution

  • I'm going to be the contrarian here and say that you're not going to do better than a regular expression here, and I have the benchmarks to back it up.

    I'm not going to talk about lodash, lodash isn't doing anything that will change the performance characteristics much.

    TL;DR

    You can eke out a little more performance by swapping out item.value.search(re) > -1 for re.test(item.value). You get the same result, but with a little less work. With that (and without lodash), your code looks like this:

    let re = RegExp('^' + term, 'i');
    this.filteredList = this.list.filter(item => re.test(item.value));
    

    ...and that's about the best you'll do.

    The benchmarks

    I tried four different methods:

    1. toLowerCase().startsWith, i.e. acontell's (original) solution:

      var termLower = term.toLowerCase();
      var filteredList = list.filter(
        item => item.toLowerCase().startsWith(termLower));
      
    2. slice().toLowerCase equality, i.e. trincot's solution:

      var termLower = term.toLowerCase();
      var filteredList = list.filter(
        item => item.substr(0, termLower.length).toLowerCase() === termLower);
      
    3. RegExp#test (the winner):

      var expr = new RegExp('^' + term, 'i');
      var filteredList = list.filter(item => expr.test(item));
      
    4. String#search (your original solution):

      var expr = new RegExp('^' + term, 'i');
      var filteredList = list.filter(item => item.search(expr));
      

    You can see my benchmarks here: https://jsperf.com/array-of-strings-prefix-search#10. Note the #10 at the end of the URL. Change the number to change the length of the strings (in words) in list. The setup code builds an array (searchSpace) of 100 sentences of the given length and an array of 100 search prefixes.

    I ran the benchmark for sentence lengths from 1 to 500 in Chrome 64 (macOS 10.12). Here are the results:

    prefix search benchmark results

    Unsurprisingly, toLowerCase().startsWith gets asymptotically worse the longer the sentences get, since it has to lowercase the whole sentence each time.

    Likewise slice().toLowerCase equality stays the same after its drop between 1 and 2, since it's lowercasing just one word no matter what.

    And then there's RegExp#test and String#search, which do basically the same thing except for the small amount of extra work for the latter. Modern JavaScript engines are very good at compiling regular expressions, especially very simple ones like /^keyword/. In the end, searching with a regular expression is much faster, since we don't have to do any lowercasing or slicing.

    Postscript

    Every benchmark is limited, and it's also entirely possible mine is flawed in some significant way. I'll be happy to hear any thoughts on how to improve it.