Search code examples
javascriptstringalgorithmstring-comparison

Improve string - prefix matching performance


I'm looking for a way to speed up my naive string matching process:

// Treat this as pseudo code
function find(input: string, prefixes: string[]) {
  for (let i = 0; i < prefixes.length; i++) {
    const prefix = prefixes[i];
    
    if (input.startsWith(prefix)) {
      return prefix;
    }
  }

  return null;
}

const prefixes = [ "Hey", "Hi", "Hola", ... ];
const prefix = find("Hey, I'm Michael", prefixes);

I've looked into some probabilistic data structures like the bloom filter but I couldn't find one that'd fit my needs. This being said, I don't actually care to get the prefix that would have matched neither do I need a 100% guarantee that a match exists. I only need to know if the input does definitely NOT contain any prefix or that it might.

I've also come across an article about a Burst Tries algorithm which as far as I could understand will serve a similar purpose. Frankly, though I'm not deep enough into algorithms to grasp the full implementation details and make sure this is what I'm looking for.

Side note: I assume that 99.95% of the input this function will be getting is not going to match any prefix. Therefore I would like for this to be an optimization step to only process strings that will likely have a prefix I'm looking for.

Any help or suggestions would be very much appreciated :3


Solution

  • If the prefixes are known in advance and can be preprocessed, you might try a trie. Especially if they are going to be as short as 10 characters. That would mean each check is on the order of 10 comparisons. Not sure how much better one could do.

    function buildTrie(trie, words){
      for (let word of words){
        let _trie = trie;
    
        for (let i=0; i<word.length; i++){
          const letter = word[i];
          _trie[letter] = _trie[letter] || {};
    
          if (i == word.length - 1)
            _trie[letter]['leaf'] = true;
    
          _trie = _trie[letter];
        }
      }
    
      return trie;
    }
    
    
    function find(trie, str, i=0){
      const letter = str[i];
      
      if (!trie[letter])
        return false;
        
      if (trie[letter]['leaf'])
        return true;
        
      return find(trie[letter], str, i + 1);
    }
    
    
    const prefixes = [ "Hey", "Heya", "Hi", "Hola"];
    const trie = buildTrie({}, prefixes)
    
    console.log(trie)
    
    console.log(find(trie, "Hey, I'm Michael"));
    console.log(find(trie, "Heuy, I'm Michael"));