Search code examples
c#stringsearchstringbuilderculture

Using StringComparer with StringBuilder to search for a string


I need to use globalization rules to search for all occurrences of a string within a document. The pseudocode is:

var searchText = "Hello, World";
var compareInfo = new CultureInfo("en-US").CompareInfo;

DocumentIterator start = null; // the start position if a match occurs
var sb = new StringBuilder();

// the document is not a string, but exposes an iterator to its content
for (var iter = doc.Start(); iter.IsValid(); ++iter)
{
    start = start ?? iter; // the start of the potential match

    var ch = iter.GetChar(); 
    sb.Append(ch);

    if (compareInfo.Compare(searchText, sb.ToString()) == 0) // exact match
    {
        Console.WriteLine($"match at {start}-{iter}");
        // not shown: continue to search for more occurrences.
    }
    else if (!compareInfo.IsPrefix(criteria.Text, sb.ToString()))
    {
        // restart the search from the character immediately following start
        sb.Clear();
        iter = start; // this gets incremented immediately
        start = null;
    }
}

This delegates to CompareInfo the difficult job of culture-sensitive string matching.

However, the stream-like process implemented by the code has performance issues because it calls StringBuilder.ToString() in every iteration, thus defeating the performance benefit of StringBuilder.

Question: How can I do this search efficiently?


Solution

  • So why not copy the whole document to a stringbuilder first, the use 1 ToString(). Then just use a similar scheme to iterate over all the possible values. Use compareInfo.Compare(criteria.Text, 0, criteria.Text.Length, docString, startIndex, checkLength)