Search code examples
c#exceltext-filestext-parsingstring-parsing

Optimized parse text file, to then upload to Excel


My project is to take large amounts of logs, output inside text files and parse some of the data to be made into Excel format.

There is a lot of garbage data in between not needed

This is how one portion of it is like:

2018-05-17 07:16:57.105>>>>>>
{"status":"success", "statusCode":"0", "statusDesc":"Message Processed Sucessfully", "messageNumber":"451", "payload":{"messageCode":"SORTRESPONSE","Id":"5L","Id":"28032","messageTimestamp":"2018-05-16 23:16:55"}}

I will first need to take the time stamp befor the "{}" as it differs from the messageTimestamp

When generating the excel workbook This is how it will look like in Excel:

------A-----------------------------------B--------------C

1. Overall time stamp ---------- status------- statusCode 

2. 2018-05-17 07:16:57.105   - success --- -0

And so on...

payload has its own section of logs within its "{}"

so its section in excel will look like this:

  F

1. payload
2. {"messageCode":"SORTRESPONSE","Id":"5L","Id":"28032","messageTimestamp":"2018-05-16 23:16:55"}

its content can be in one section that's not an issue.

A friend of mine have done something similar but it can take a few minutes to even generate even one relatively small excel document

My Question:

What is the most optimal way can I parse the data needed to then store it in an array or multidimensional array

to then push it into an excel document.


Solution

  • I would try to split the input text on newline characters, then parse the JSON part with Newtonsoft.Json.

    I would highly advise to not try to parse the JSON yourself. The bottleneck here will be disk IO not in-memory processing, so make it easy to write correct code and use 3rd party libraries.

    Once you have structured data representing the input, you can write each entry to an output file with only the fields you need.

    For an Excel file, is CSV okay or do you need XLSX files? For CSV you can just write to a file directly, for XLSX I would recommend the EPPlus library.

    https://www.newtonsoft.com/json

    https://archive.codeplex.com/?p=epplus