Search code examples
c#winformslinqcollectionsstring-search

Fastest way to search in a string collection


Problem:

I have a text file of around 120,000 users (strings) which I would like to store in a collection and later to perform a search on that collection.

The search method will occur every time the user change the text of a TextBox and the result should be the strings that contain the text in TextBox.

I don't have to change the list, just pull the results and put them in a ListBox.

What I've tried so far:

I tried with two different collections/containers, which I'm dumping the string entries from an external text file (once, of course):

  1. List<string> allUsers;
  2. HashSet<string> allUsers;

With the following LINQ query:

allUsers.Where(item => item.Contains(textBox_search.Text)).ToList();

My search event (fires when user change the search text):

private void textBox_search_TextChanged(object sender, EventArgs e)
{
    if (textBox_search.Text.Length > 2)
    {
        listBox_choices.DataSource = allUsers.Where(item => item.Contains(textBox_search.Text)).ToList();
    }
    else
    {
        listBox_choices.DataSource = null;
    }
}

Results:

Both gave me a poor response time (around 1-3 seconds between each key press).

Question:

Where do you think my bottleneck is? The collection I've used? The search method? Both?

How can I get better performance and more fluent functionality?


Solution

  • You could consider doing the filtering task on a background thread which would invoke a callback method when it's done, or simply restart filtering if input is changed.

    The general idea is to be able to use it like this:

    public partial class YourForm : Form
    {
        private readonly BackgroundWordFilter _filter;
    
        public YourForm()
        {
            InitializeComponent();
    
            // setup the background worker to return no more than 10 items,
            // and to set ListBox.DataSource when results are ready
    
            _filter = new BackgroundWordFilter
            (
                items: GetDictionaryItems(),
                maxItemsToMatch: 10,
                callback: results => 
                  this.Invoke(new Action(() => listBox_choices.DataSource = results))
            );
        }
    
        private void textBox_search_TextChanged(object sender, EventArgs e)
        {
            // this will update the background worker's "current entry"
            _filter.SetCurrentEntry(textBox_search.Text);
        }
    }
    

    A rough sketch would be something like:

    public class BackgroundWordFilter : IDisposable
    {
        private readonly List<string> _items;
        private readonly AutoResetEvent _signal = new AutoResetEvent(false);
        private readonly Thread _workerThread;
        private readonly int _maxItemsToMatch;
        private readonly Action<List<string>> _callback;
    
        private volatile bool _shouldRun = true;
        private volatile string _currentEntry = null;
    
        public BackgroundWordFilter(
            List<string> items,
            int maxItemsToMatch,
            Action<List<string>> callback)
        {
            _items = items;
            _callback = callback;
            _maxItemsToMatch = maxItemsToMatch;
    
            // start the long-lived backgroud thread
            _workerThread = new Thread(WorkerLoop)
            {
                IsBackground = true,
                Priority = ThreadPriority.BelowNormal
            };
    
            _workerThread.Start();
        }
    
        public void SetCurrentEntry(string currentEntry)
        {
            // set the current entry and signal the worker thread
            _currentEntry = currentEntry;
            _signal.Set();
        }
    
        void WorkerLoop()
        {
            while (_shouldRun)
            {
                // wait here until there is a new entry
                _signal.WaitOne();
                if (!_shouldRun)
                    return;
    
                var entry = _currentEntry;
                var results = new List<string>();
    
                // if there is nothing to process,
                // return an empty list
                if (string.IsNullOrEmpty(entry))
                {
                    _callback(results);
                    continue;
                }
    
                // do the search in a for-loop to 
                // allow early termination when current entry
                // is changed on a different thread
                foreach (var i in _items)
                {
                    // if matched, add to the list of results
                    if (i.Contains(entry))
                        results.Add(i);
    
                    // check if the current entry was updated in the meantime,
                    // or we found enough items
                    if (entry != _currentEntry || results.Count >= _maxItemsToMatch)
                        break;
                }
    
                if (entry == _currentEntry)
                    _callback(results);
            }
        }
    
        public void Dispose()
        {
            // we are using AutoResetEvent and a background thread
            // and therefore must dispose it explicitly
            Dispose(true);
        }
    
        private void Dispose(bool disposing)
        {
            if (!disposing)
                return;
    
            // shutdown the thread
            if (_workerThread.IsAlive)
            {
                _shouldRun = false;
                _currentEntry = null;
                _signal.Set();
                _workerThread.Join();
            }
    
            // if targetting .NET 3.5 or older, we have to
            // use the explicit IDisposable implementation
            (_signal as IDisposable).Dispose();
        }
    }
    

    Also, you should actually dispose the _filter instance when the parent Form is disposed. This means you should open and edit your Form's Dispose method (inside the YourForm.Designer.cs file) to look something like:

    // inside "xxxxxx.Designer.cs"
    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            if (_filter != null)
                _filter.Dispose();
    
            // this part is added by Visual Studio designer
            if (components != null)
                components.Dispose();
        }
    
        base.Dispose(disposing);
    }
    

    On my machine, it works pretty quickly, so you should test and profile this before going for a more complex solution.

    That being said, a "more complex solution" would possibly be to store the last couple of results in a dictionary, and then only filter them if it turns out that the new entry differs by only the first of last character.