Group Results by Value after Split


I have searched but have not found an answer. Disclaimer: I am brand new to C#, but I have a task at work to create the following program: read existing log files, parse each line by tab, limit the results to a specific status (Process E-mail), group by division (e.g. Investment Bank), calculate statistics on the number of e-mail conversions per division, and write them to a new log file.

I wanted to give a bit of background on the program before asking the question. I am now at the point where I need to group by division, and I can't figure out how to do it.

EDIT: original data:

Status          Division         Time       Run Time    Zip Files  Conversions  Returned Files  Total E-Mails
Process E-mail  Investment Bank  12:00 AM   42.8596599  1          0            1               1
End Processing                   12:05 AM   44.0945784  0          0            0               0
Process E-mail  Investment Bank  12:10 AM   42.7193253  2          1            0               1
Process E-mail  Treasury         12:15 AM   4.6563394   1          0            2               2
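Because every field is read by position, it helps to pin down the column indices from the header row first. A minimal sketch, assuming the log really is tab-separated as described above; `ParseRow` and the `...Col` constants are hypothetical names, and it uses C# 9+ top-level statements so it runs as a standalone file:

```csharp
using System;

// Column positions taken from the header row:
// 0 Status, 1 Division, 2 Time, 3 Run Time, 4 Zip Files,
// 5 Conversions, 6 Returned Files, 7 Total E-Mails
const int DivisionCol = 1, ZipFilesCol = 4, ConversionsCol = 5,
          ReturnedFilesCol = 6, TotalEmailsCol = 7;

// Hypothetical helper: parses a "Process E-mail" row into a tuple.
// Returns null for other statuses (e.g. "End Processing") or rows that are too short.
(string Division, int ZipFiles, int Conversions, int ReturnedFiles, int TotalEmails)? ParseRow(string line)
{
    string[] fields = line.Split('\t');
    if (fields.Length <= TotalEmailsCol || !fields[0].StartsWith("Process"))
        return null;

    // int.TryParse leaves the value at 0 when a field is blank or not a number.
    int.TryParse(fields[ZipFilesCol], out int zipFiles);
    int.TryParse(fields[ConversionsCol], out int conversions);
    int.TryParse(fields[ReturnedFilesCol], out int returnedFiles);
    int.TryParse(fields[TotalEmailsCol], out int totalEmails);

    return (fields[DivisionCol].Trim(), zipFiles, conversions, returnedFiles, totalEmails);
}

// Sample row built from the first data line above (assumed tab-separated).
var row = ParseRow("Process E-mail\tInvestment Bank\t12:00 AM\t42.8596599\t1\t0\t1\t1");
Console.WriteLine(row);
```

Rows with any other status come back as `null`, which is exactly what the "limit to Process E-mail" step needs.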

Here is the code that I have up to this point:

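using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    static void Main()
    {
        List<string> list = new List<string>();
        using (StreamReader reader = new StreamReader(Settings.LogPath + "2012-3-10.log"))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                list.Add(line);

                // Columns: 0 Status, 1 Division, 2 Time, 3 Run Time,
                // 4 Zip Files, 5 Conversions, 6 Returned Files, 7 Total E-Mails
                string[] split = line.Split('\t');
                string processing = split[0];

                if (processing.StartsWith("Process"))
                {
                    string division = split[1];
                    int zipFiles;
                    int.TryParse(split[4], out zipFiles);
                    int conversions;
                    int.TryParse(split[5], out conversions);
                    int returnedFiles;
                    int.TryParse(split[6], out returnedFiles);
                    int totalEmails;
                    int.TryParse(split[7], out totalEmails);

                    // One block per row, as in the console output below
                    Console.WriteLine(division);
                    Console.WriteLine(zipFiles);
                    Console.WriteLine(conversions);
                    Console.WriteLine(returnedFiles);
                    Console.WriteLine(totalEmails);
                    Console.WriteLine();
                }
            }
        }
    }
}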

So I have the program to the point where it will spit out something to the console like this:

Investment Bank
1
0
1
1

Treasury
1
0
2
2

Investment Bank
2
1
0
1

What I want to do now is group by "Investment Bank", "Treasury", etc., and then calculate the totals.
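Since the existing loop already reads one line at a time, one way to get the totals without LINQ is to keep a running total per division in a `Dictionary<string, int[]>` inside the loop. A minimal sketch in C# 9+ top-level statements; the in-memory `lines` array is hypothetical sample data standing in for `reader.ReadLine()`:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample lines standing in for what reader.ReadLine() returns.
string[] lines =
{
    "Status\tDivision\tTime\tRun Time\tZip Files\tConversions\tReturned Files\tTotal E-Mails",
    "Process E-mail\tInvestment Bank\t12:00 AM\t42.8596599\t1\t0\t1\t1",
    "End Processing\t\t12:05 AM\t44.0945784\t0\t0\t0\t0",
    "Process E-mail\tInvestment Bank\t12:10 AM\t42.7193253\t2\t1\t0\t1",
    "Process E-mail\tTreasury\t12:15 AM\t4.6563394\t1\t0\t2\t2",
};

// Running totals per division: [zip files, conversions, returned files, total e-mails]
var totals = new Dictionary<string, int[]>();

foreach (string line in lines)
{
    string[] split = line.Split('\t');
    // Skip the header, "End Processing" rows, and anything too short.
    if (split.Length < 8 || !split[0].StartsWith("Process")) continue;

    string division = split[1].Trim();
    if (!totals.TryGetValue(division, out int[] sums))
    {
        sums = new int[4];
        totals[division] = sums;
    }

    // Columns 4..7 hold the four counters; add each to its running total.
    for (int c = 0; c < 4; c++)
    {
        int.TryParse(split[4 + c], out int value);
        sums[c] += value;
    }
}

foreach (var kv in totals)
    Console.WriteLine($"{kv.Key}: {string.Join(", ", kv.Value)}");
```

For the sample rows this adds up to the figures in the target table (Investment Bank 3, 1, 1, 2; Treasury 1, 0, 2, 2). `Dictionary` does not guarantee enumeration order, so sort by key before printing if the order matters.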

The final log file will look like this:

Division         Zip Files Conversions Returned Files Total E-mails
Investment Bank   3            1             1              2
Treasury          1            0             2              2
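To approximate a fixed-width layout like this (rather than tab-separated output), .NET composite format strings accept an alignment component: `{0,-17}` pads to 17 characters left-aligned, `{1,10}` right-aligns in 10. A sketch, assuming the totals are already computed (the `rows` list is hypothetical) and writing to a hypothetical file in the temp folder with `File.WriteAllLines`:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical totals; in the real program these come from the grouping step.
var rows = new List<(string Division, int ZipFiles, int Conversions, int ReturnedFiles, int TotalEmails)>
{
    ("Investment Bank", 3, 1, 1, 2),
    ("Treasury", 1, 0, 2, 2),
};

// Alignment: negative width = left-aligned, positive width = right-aligned.
const string RowFormat = "{0,-17}{1,10}{2,12}{3,15}{4,14}";

var output = new List<string>
{
    string.Format(RowFormat, "Division", "Zip Files", "Conversions", "Returned Files", "Total E-mails"),
};
foreach (var r in rows)
    output.Add(string.Format(RowFormat, r.Division, r.ZipFiles, r.Conversions, r.ReturnedFiles, r.TotalEmails));

// Hypothetical output location; substitute your own path (e.g. one based on Settings.LogPath).
string outPath = Path.Combine(Path.GetTempPath(), "division-stats.log");
File.WriteAllLines(outPath, output);

foreach (string s in output)
    Console.WriteLine(s);
```

Every value here fits inside its column width, so each line comes out exactly 68 characters wide; a value longer than its width would push the following columns to the right rather than being truncated.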

Solution

The following code does what you need:

    // Requires: using System; using System.IO; using System.Linq;
    string filename = @"D:\myfile.log";
    var statistics = File.ReadLines(filename)
        .Where(line => line.StartsWith("Process"))
        .Select(line => line.Split('\t'))
        .GroupBy(items => items[1])
        .Select(g =>
                new
                    {
                        Division = g.Key,
                        // Columns 4-7: Zip Files, Conversions, Returned Files, Total E-Mails
                        ZipFiles = g.Sum(i => Int32.Parse(i[4])),
                        Conversions = g.Sum(i => Int32.Parse(i[5])),
                        ReturnedFiles = g.Sum(i => Int32.Parse(i[6])),
                        TotalEmails = g.Sum(i => Int32.Parse(i[7]))
                    });

    Console.WriteLine("Division\tZip Files\tConversions\tReturned Files\tTotal E-mails");
    foreach (var d in statistics)
    {
        Console.WriteLine("{0}\t{1}\t{2}\t{3}\t{4}",
                d.Division,
                d.ZipFiles,
                d.Conversions,
                d.ReturnedFiles,
                d.TotalEmails);
    }

It could be even shorter (though less clear) by working with arrays instead of anonymous types. Let me know if you are interested in such code.
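One possible reading of that array-based variant (a sketch, not the answerer's actual code, which isn't shown): treat columns 4-7 of each row as an `int[]`, sum them position by position, and produce one `string[]` per division. C# 9+ top-level statements; the `lines` array is hypothetical sample data in place of `File.ReadLines(filename)`:

```csharp
using System;
using System.Linq;

// Hypothetical sample lines in place of File.ReadLines(filename).
string[] lines =
{
    "Process E-mail\tInvestment Bank\t12:00 AM\t42.8596599\t1\t0\t1\t1",
    "End Processing\t\t12:05 AM\t44.0945784\t0\t0\t0\t0",
    "Process E-mail\tInvestment Bank\t12:10 AM\t42.7193253\t2\t1\t0\t1",
    "Process E-mail\tTreasury\t12:15 AM\t4.6563394\t1\t0\t2\t2",
};

var stats = lines
    .Select(line => line.Split('\t'))
    .Where(f => f.Length >= 8 && f[0].StartsWith("Process"))
    // Key: division; element: the four counters (columns 4..7) as an int[]
    .GroupBy(f => f[1].Trim(), f => f.Skip(4).Take(4).Select(s => int.Parse(s)).ToArray())
    // One string[] per division: the name followed by the column-wise sums
    .Select(g => new[] { g.Key }
        .Concat(Enumerable.Range(0, 4).Select(c => g.Sum(a => a[c]).ToString()))
        .ToArray())
    .ToList();

foreach (string[] row in stats)
    Console.WriteLine(string.Join("\t", row));
```

`GroupBy` yields groups in the order their keys first appear, so Investment Bank prints before Treasury. The trade-off is the one mentioned above: the column meanings now live only in the index positions.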