Tags: c#, string, case-insensitive, string-comparison

C# matching two text files, case sensitive issue


I have two files, sourcecolumns.txt and destcolumns.txt. I need to compare source to dest and, if dest doesn't contain a source value, write that value out to a new file. The code below works, except that the comparison is case-sensitive, which causes problems like this:

source: CPI
dest: Cpi

These don't match because of the capital letters, so I get incorrect output. Any help is always welcome!

string[] sourcelinestotal =
    File.ReadAllLines("C:\\testdirectory\\" + "sourcecolumns.txt");
string[] destlinestotal =
    File.ReadAllLines("C:\\testdirectory\\" + "destcolumns.txt");

foreach (string sline in sourcelinestotal)
{
    if (destlinestotal.Contains(sline))
    {
    }
    else
    {
        File.AppendAllText("C:\\testdirectory\\" + "missingcolumns.txt", sline);
    }
}

Solution

  • You could do this using an extension method for IEnumerable<string> like:

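    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class EnumerableExtensions
    {
        // Returns true if any element of source equals value under the given comparison.
        public static bool Contains( this IEnumerable<string> source, string value, StringComparison comparison )
        {
             if (source == null)
             {
                 return false; // a null sequence is treated as empty: nothing is a member of the empty set
             }
             return source.Any( s => string.Equals( s, value, comparison ) );
        }
    }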
    

    then change

    if (destlinestotal.Contains( sline ))
    

    to

    if (destlinestotal.Contains( sline, StringComparison.OrdinalIgnoreCase ))
    

    However, if the sets are large and/or you are going to do this very often, this approach is inefficient. It is essentially an O(n²) operation: for each line in the source you compare it with, potentially, every line in the destination. It would be better to create a HashSet from the destination columns with a case-insensitive comparer, then iterate through the source columns and check whether each one exists in that HashSet. Because a HashSet lookup takes constant time on average, this is an O(n) algorithm. Note that Contains on the HashSet uses the comparer you provide in the constructor.

    string[] sourcelinestotal =
        File.ReadAllLines("C:\\testdirectory\\" + "sourcecolumns.txt");
    // The comparer passed to the constructor makes every lookup case-insensitive.
    HashSet<string> destlinestotal =
        new HashSet<string>(
            File.ReadAllLines("C:\\testdirectory\\" + "destcolumns.txt"),
            StringComparer.OrdinalIgnoreCase
        );

    foreach (string sline in sourcelinestotal)
    {
        if (!destlinestotal.Contains(sline))
        {
            // ReadAllLines strips line endings, so add one back to keep one name per line.
            File.AppendAllText("C:\\testdirectory\\" + "missingcolumns.txt", sline + Environment.NewLine);
        }
    }
    

    In retrospect, I actually prefer this solution over writing your own case-insensitive Contains for IEnumerable<string>, unless you need that method for something else. Using the HashSet implementation leaves you with less code of your own to maintain.
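
For reuse or testing, the HashSet approach could be wrapped in a small helper along the lines below. This is only a sketch: the class name ColumnDiff and the methods FindMissing and WriteMissing are hypothetical names introduced here, not part of the original answer. It also assumes it is acceptable to write the output file once with File.WriteAllLines, which overwrites any existing file, rather than appending inside the loop.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ColumnDiff
{
    // Hypothetical helper (not from the original answer): returns every source
    // entry that has no case-insensitive match in dest, in source order.
    public static List<string> FindMissing(IEnumerable<string> source, IEnumerable<string> dest)
    {
        // The comparer passed here is the one HashSet.Contains uses for lookups.
        var destSet = new HashSet<string>(dest, StringComparer.OrdinalIgnoreCase);
        var missing = new List<string>();
        foreach (string line in source)
        {
            if (!destSet.Contains(line))
            {
                missing.Add(line);
            }
        }
        return missing;
    }

    // Hypothetical wrapper: reads both files and writes the missing names,
    // one per line, in a single call. Note that this overwrites outputPath.
    public static void WriteMissing(string sourcePath, string destPath, string outputPath)
    {
        List<string> missing = FindMissing(File.ReadAllLines(sourcePath), File.ReadAllLines(destPath));
        File.WriteAllLines(outputPath, missing);
    }
}
```

With the paths from the question, the call would look something like ColumnDiff.WriteMissing(@"C:\testdirectory\sourcecolumns.txt", @"C:\testdirectory\destcolumns.txt", @"C:\testdirectory\missingcolumns.txt"). Keeping the comparison in FindMissing, separate from the file access, makes it possible to check the case-insensitive behaviour with in-memory arrays.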