c# string list c#-4.0 linq-group

Count occurrences of each string in a massive string list


I have over 600k lines of strings. I want to group identical strings and get the count for each.

For example:

i go to school
i like music
i like games
i like music
i like music
i like games
i like music

The result should be:

i go to school , 1
i like games , 2
i like music , 4

How can I do that in the fastest possible way?


Solution

  • The GroupBy method is what you want. Your strings need to be in a list or any other type that implements IEnumerable<string>. File.ReadLines, as suggested by spender, returns an IEnumerable<string> that reads the file lazily, line by line.

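    // Requires: using System; using System.IO; using System.Linq;
    // GroupBy(s => s) puts identical lines into the same group, keyed by the line itself.
    var stringGroups = File.ReadLines("filename.txt").GroupBy(s => s);
    foreach (var stringGroup in stringGroups)
        Console.WriteLine("{0} , {1}", stringGroup.Key, stringGroup.Count());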
    

    If you want them ordered from least to most frequent (as in your example), just add an OrderBy:

    ...
    foreach (var stringGroup in stringGroups.OrderBy(g => g.Count()))
        ...
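
Since the question asks for the fastest approach, one alternative worth measuring is counting with a Dictionary<string, int> in a single pass. GroupBy keeps every line inside its group, while a dictionary stores only one counter per distinct string, which may reduce memory use on 600k lines. The following is a minimal sketch, not a benchmarked result: the LineCounter class and CountLines method are hypothetical names chosen for illustration, and the sample data is taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineCounter
{
    // Hypothetical helper: counts how often each distinct line occurs in one pass.
    public static Dictionary<string, int> CountLines(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>(StringComparer.Ordinal);
        foreach (var line in lines)
        {
            int current;
            counts.TryGetValue(line, out current); // current stays 0 when the line is new
            counts[line] = current + 1;
        }
        return counts;
    }

    public static void Main()
    {
        // Sample data from the question; File.ReadLines("filename.txt") could be passed instead.
        var lines = new[]
        {
            "i go to school", "i like music", "i like games", "i like music",
            "i like music", "i like games", "i like music"
        };

        // Order by count, least to most, matching the requested output.
        foreach (var pair in CountLines(lines).OrderBy(p => p.Value))
            Console.WriteLine("{0} , {1}", pair.Key, pair.Value);
    }
}
```

For the sample data this prints the three distinct lines with counts 1, 2 and 4. Whether it actually beats GroupBy on your file is something to time on your own data.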