I have log files from our friends at Microsoft in a very challenging format. I have an export of several of these files, and I want to quickly parse them in Terminal using grep to find key events.
Sanitized Example:
CreationDate,UserIds,Operations,AuditData
2022-01-01T15:00:00.0000000Z,username@domain.com,FileViewed,"{""AppAccessContext"":{""CorrelationId"":""f6298547-d934-4c79-8bab-c5c394f31f65""},""CreationTime"":""2022-01-01T15:00:00"",""Id"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""Operation"":""FileViewed"",""OrganizationId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""RecordType"":0,""UserType"":0,""Version"":0,""Workload"":""OneDrive"",""ClientIP"":""172.0.0.1"",""ObjectId"":""https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt"",""UserId"":""username@domain.com"",""CorrelationId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""EventSource"":""SharePoint"",""ItemType"":""File"",""ListId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""ListItemUniqueId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""Site"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""WebId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""SourceFileName"":""TextFile.txt"",""SourceRelativeUrl"":""Documents""}"
2022-01-01T15:01:15.0000000Z,username@domain.com,FileViewed,"{""AppAccessContext"":{""CorrelationId"":""f6298547-d934-4c79-8bab-c5c394f31f65""},""CreationTime"":""2022-01-01T15:01:15"",""Id"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""Operation"":""FileViewed"",""OrganizationId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""RecordType"":0,""UserType"":0,""Version"":0,""Workload"":""OneDrive"",""ClientIP"":""172.0.0.1"",""ObjectId"":""https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt"",""UserId"":""username@domain.com"",""CorrelationId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""EventSource"":""SharePoint"",""ItemType"":""File"",""ListId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""ListItemUniqueId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""Site"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""WebId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""SourceFileName"":""TextFile.txt"",""SourceRelativeUrl"":""Documents""}"
2022-01-01T15:02:02.0000000Z,username@domain.com,FileViewed,"{""AppAccessContext"":{""CorrelationId"":""f6298547-d934-4c79-8bab-c5c394f31f65""},""CreationTime"":""2022-01-01T15:02:02"",""Id"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""Operation"":""FileViewed"",""OrganizationId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""RecordType"":0,""UserType"":0,""Version"":0,""Workload"":""OneDrive"",""ClientIP"":""172.0.0.1"",""ObjectId"":""https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt"",""UserId"":""username@domain.com"",""CorrelationId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""EventSource"":""SharePoint"",""ItemType"":""File"",""ListId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""ListItemUniqueId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""Site"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""WebId"":""f6298547-d934-4c79-8bab-c5c394f31f65"",""SourceFileName"":""TextFile.txt"",""SourceRelativeUrl"":""Documents""}"
I am attempting to parse the files with a combination of cut and jq in the terminal, but I am struggling because cut does not work well with a comma delimiter when the JSON field is riddled with commas. I could convert the files to tab-delimited format, but ideally I want to avoid that: I'd like to quickly inspect the logs for key events on the fly, not open each one up and convert it first.
Where I am at:
grep FileViewed AnnoyingLogFile.csv | cut -d, -f 4 | jq .
Output:
"{"
"AppAccessContext"
":{"
"CorrelationId"
":"
"f6298547-d934-4c79-8bab-c5c394f31f65"
Output I want:
{
  "AppAccessContext": {
    "CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65"
  },
  "CreationTime": "2022-01-01T15:00:00",
  "Id": "f6298547-d934-4c79-8bab-c5c394f31f65",
  "Operation": "FileViewed",
  "OrganizationId": "f6298547-d934-4c79-8bab-c5c394f31f65",
  "RecordType": 0,
  "UserType": 0,
  "Version": 0,
  "Workload": "OneDrive",
  "ClientIP": "172.0.0.1",
  "ObjectId": "https://websitebame-my.sharepoint.com/personal/user_directory/Documents/TextFile.txt",
  "UserId": "username@domain.com",
  "CorrelationId": "f6298547-d934-4c79-8bab-c5c394f31f65",
  "EventSource": "SharePoint",
  "ItemType": "File",
  "ListId": "f6298547-d934-4c79-8bab-c5c394f31f65",
  "ListItemUniqueId": "f6298547-d934-4c79-8bab-c5c394f31f65",
  "Site": "f6298547-d934-4c79-8bab-c5c394f31f65",
  "WebId": "f6298547-d934-4c79-8bab-c5c394f31f65",
  "SourceFileName": "TextFile.txt",
  "SourceRelativeUrl": "Documents"
}
...
I have already used an alternative tool to analyze these logs, but I wanted to throw this question out here to see whether it's achievable in the terminal using cut, jq, or any other commands.
Probably not the best solution, but it works:
grep FileViewed AnnoyingLogFile.csv | cut -d, -f 4- | sed -e 's/""/"/g' -e 's/^"//' -e 's/"$//' | jq .
The first sed expression replaces each "" with ", the second removes the " at the beginning of the line, and the third removes the one at the end.
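Once jq is fed valid JSON, you can do more than pretty-print. As a sketch (assuming the field names from the sample above), this pulls just the timestamp, user, and file name out of each FileViewed event:

grep FileViewed AnnoyingLogFile.csv | cut -d, -f 4- | sed -e 's/""/"/g' -e 's/^"//' -e 's/"$//' | jq -r '[.CreationTime, .UserId, .SourceFileName] | @tsv'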
If the JSON is not the last column, you can use rev to reverse each line, cut the unwanted trailing columns from the (now leading) end, and rev it back.
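For example, with a hypothetical layout that has one extra column after AuditData, you could strip it from the end before cutting from the front (this counts commas from the end, so it only works if the trailing column itself contains none):

grep FileViewed AnnoyingLogFile.csv | rev | cut -d, -f 2- | rev | cut -d, -f 4- | sed -e 's/""/"/g' -e 's/^"//' -e 's/"$//' | jq .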