Search code examples
c#performancesearchiofull-text-search

Searching large text file


I am trying to optimize the search for a string in a large text file (300-600mb). Using my current method, it is taking too long.

Currently I have been using IndexOf to search for the string, but the time it takes is way too long (20s) to build an index for each line with the string.

How can I optimize searching speed? I've tried Contains() but that is slow as well. Any suggestions? I was thinking regex match but I don't see that having a significant speed boost. Maybe my search logic is flawed

example

while ((line = myStream.ReadLine()) != null)
{
    if (line.IndexOf(CompareString, StringComparison.OrdinalIgnoreCase) >= 0)
    {
        LineIndex.Add(CurrentPosition);
        LinesCounted += 1;
    }
}

Solution

  • The brute force algorithm you're using performs in O(nm) time, where n is the length of the string being searched and m the length of the substring/pattern you're trying to find. You need to use a string search algorithm:

    However, using a regular expression crafted with care might be sufficient, depending on what you are trying to find. See Jeffrey's Friedl's tome, Mastering Regular Expressions for help on building efficient regular expressions (e.g., no backtracking).

    You might also want to consult a good algorithms text. I'm partial to Robert Sedgewick's Algorithms in its various incarnations (Algorithms in [C|C++|Java])