c++ regex tr1

Using tr1::regex_search to match a big list of strings


I need to match any of a list of strings, and I'm wondering if I can just use a regular expression that is something like "item1|item2|item3|..." instead of just doing a separate strstr() for each string. But the list can be fairly large - up to 10000 items. Would a regex work well with that? Would it be faster than searching for each string separately?
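
To make the two options concrete, here is a rough sketch of what I mean. It uses `std::regex` from C++11 rather than `tr1::regex` (an assumption, the interface is nearly the same), and the helper names `escape_literal`, `build_alternation`, `contains_any_regex` and `contains_any_find` are made up for illustration. Items are escaped so that characters like `.` or `+` match literally.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Escape ECMAScript regex metacharacters so each item is matched literally.
std::string escape_literal(const std::string& s) {
    static const std::string meta = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char c : s) {
        if (meta.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// Join the items into one alternation: item1|item2|item3|...
std::string build_alternation(const std::vector<std::string>& items) {
    assert(!items.empty());  // an empty pattern would match every input
    std::string pattern;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i != 0) pattern += '|';
        pattern += escape_literal(items[i]);
    }
    return pattern;
}

// Option 1: a single regex_search over the combined pattern.
bool contains_any_regex(const std::string& text, const std::regex& re) {
    return std::regex_search(text, re);
}

// Option 2: one substring search per item (the strstr() approach).
bool contains_any_find(const std::string& text,
                       const std::vector<std::string>& items) {
    for (const std::string& item : items)
        if (text.find(item) != std::string::npos) return true;
    return false;
}
```

The `std::regex` object would be built once from `build_alternation(items)` and reused for every input, since constructing it is the expensive part.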


Solution

  • The regex will work and should be faster than searching for each string separately, although how much faster depends on the regex implementation. I'm not sure how much memory or setup time it will need with 10000 input patterns.

    However, this is a well-known problem (multi-pattern string matching), and there are many algorithms designed specifically for it, for example Aho-Corasick, and several others. They all have different trade-offs, so pick your poison.

    In our project we needed multi-pattern replacement, so we chose the Aho-Corasick algorithm and built our replace function on top of it.
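
For illustration, a minimal Aho-Corasick matcher might look like the sketch below. This is not the code from our project: the class name `AhoCorasick` and the method `find_all` are hypothetical, it only finds matches (no replacing), and it assumes byte-wise matching with a dense 256-entry transition table per node, which is simple but memory-hungry.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

class AhoCorasick {
public:
    explicit AhoCorasick(const std::vector<std::string>& patterns)
        : patterns_(patterns), nodes_(1) {  // node 0 is the root
        for (std::size_t i = 0; i < patterns_.size(); ++i) insert(i);
        build_links();
    }

    // Returns (start offset in text, pattern index) for every occurrence,
    // scanning the text once regardless of the number of patterns.
    std::vector<std::pair<std::size_t, std::size_t>>
    find_all(const std::string& text) const {
        std::vector<std::pair<std::size_t, std::size_t>> hits;
        int state = 0;
        for (std::size_t pos = 0; pos < text.size(); ++pos) {
            state = nodes_[state].next[static_cast<unsigned char>(text[pos])];
            for (std::size_t id : nodes_[state].out)
                hits.emplace_back(pos + 1 - patterns_[id].size(), id);
        }
        return hits;
    }

private:
    struct Node {
        std::array<int, 256> next;     // transition per byte, -1 = missing
        int fail = 0;                  // longest proper suffix that is a trie prefix
        std::vector<std::size_t> out;  // patterns ending at this state
        Node() { next.fill(-1); }
    };

    // Add one pattern to the trie.
    void insert(std::size_t id) {
        assert(!patterns_[id].empty());  // empty patterns are not supported
        int state = 0;
        for (unsigned char c : patterns_[id]) {
            if (nodes_[state].next[c] == -1) {
                nodes_[state].next[c] = static_cast<int>(nodes_.size());
                nodes_.emplace_back();
            }
            state = nodes_[state].next[c];
        }
        nodes_[state].out.push_back(id);
    }

    // Breadth-first pass: compute failure links, fill in the missing
    // transitions, and let each state inherit the outputs of its fail state.
    void build_links() {
        std::queue<int> q;
        for (int c = 0; c < 256; ++c) {
            int child = nodes_[0].next[c];
            if (child == -1) nodes_[0].next[c] = 0;
            else q.push(child);  // depth-1 nodes keep fail = 0
        }
        while (!q.empty()) {
            int s = q.front();
            q.pop();
            int f = nodes_[s].fail;  // shallower, so already fully processed
            nodes_[s].out.insert(nodes_[s].out.end(),
                                 nodes_[f].out.begin(), nodes_[f].out.end());
            for (int c = 0; c < 256; ++c) {
                int child = nodes_[s].next[c];
                if (child == -1) {
                    nodes_[s].next[c] = nodes_[f].next[c];
                } else {
                    nodes_[child].fail = nodes_[f].next[c];
                    q.push(child);
                }
            }
        }
    }

    std::vector<std::string> patterns_;
    std::vector<Node> nodes_;
};
```

You build the automaton once from the full list and then call `find_all` on each input. With the dense table every trie node costs about 1 KB, so for 10000 patterns a sparse transition map may be the better trade-off.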