I want to use grep/awk/sed to extract matched strings from each line of a log file and then write them to a CSV file. The strings I need are the load time, the object count, and the URL (for example 1434, 53, http://www.espn.com/).
If the input is:
2018-10-31 18:48:01.717,INFO,15592.15627,PfbProxy::handlePfbFetchDone(0x1d69850, pfbId=561, pid=15912, state=4, fd=78, timer=61), FETCH DONE: len=45, PFBId=561, pid=0, loadTime=1434 ms, objects=53, fetchReqEpoch=0.0, fetchDoneEpoch:0.0, fetchId=26, URL=http://www.espn.com/
2018-10-31 18:48:01.806,DEBUG,15592.15621,FETCH DONE: len=45, PFBId=82, pid=0, loadTime=1301 ms, objects=54, fetchReqEpoch=0.0, fetchDoneEpoch:0.0, fetchId=28, URL=http://www.diply.com/
Expected output for the above log lines:
URL,LoadTime,Objects
http://www.espn.com/,1434,53
http://www.diply.com/,1301,54
This is only an example; the actual log file will have much more data.
--My-Solution-So-far-
For now I used grep to get all lines containing keyword 'FETCH DONE' (these lines contain strings I am looking for).
I did come up with a regular expression that matches the data I need, but when I grep with it and write the result to a file, each matched string is printed on its own line, which is not quite what I am looking for. The grep command and regular expression I use (online regex tool: https://regexr.com/42cah):
echo -en 'url,loadtime,object\n'>test1.csv #add header
grep -Po '(?<=loadTime=).{1,5}(?= )|((?<=URL=).*|\/(?=.))|((?<=objects=).{1,5}(?=\,))'>>test1.csv #get matching strings
Actual output:
URL,LoadTime,Objects
http://www.espn.com
1434
53
http://www.diply.com
1301
54
Expected output:
URL,LoadTime,Objects
http://www.espn.com/,1434,53
http://www.diply.com/,1301,54
I tried using awk to match multiple regexes and print a comma in between. I couldn't get it to work at all, even though my regex matches the correct strings.
Another idea I have is to use sed to replace some of the '\n' characters with ',':
for (i = 1; i <= n; i++)
    if (i % 3 != 0) {
        sed REPLACE "\n" with "," on i-th line
    }
I'm pretty sure there is a more efficient way of doing it.
Using sed:
sed -n 's/.*loadTime=\([0-9]*\)[^,]*, objects=\([0-9]*\).* URL=\(.*\)/\3,\1,\2/p' input | \
sed 1i'URL,LoadTime,Objects'
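Since the question also mentions awk, here is a rough sketch of the same extraction in awk, for comparison only. It assumes every line of interest contains 'FETCH DONE' and that the URL is the last field on the line, as in the sample input. The function name fetch_done_to_csv is made up for illustration, and 'input' is the same placeholder file name used in the sed command above. The header is printed in the BEGIN block, so no second command is needed.

```shell
# Hypothetical helper: prints a CSV row for every "FETCH DONE" line
# found in the given file(s), or on stdin if no file is given.
fetch_done_to_csv() {
    awk '
    BEGIN { print "URL,LoadTime,Objects" }    # CSV header
    /FETCH DONE/ {
        load = objs = url = ""
        # match() sets RSTART/RLENGTH; skip past the "key=" prefix with substr()
        if (match($0, /loadTime=[0-9]+/)) load = substr($0, RSTART + 9, RLENGTH - 9)
        if (match($0, /objects=[0-9]+/))  objs = substr($0, RSTART + 8, RLENGTH - 8)
        # Assumption: the URL runs to the end of the line
        if (match($0, /URL=.*/))          url  = substr($0, RSTART + 4)
        print url "," load "," objs
    }' "$@"
}
```

It could be called as: fetch_done_to_csv input > test1.csv. A field that is missing from a 'FETCH DONE' line is left empty in that row instead of the line being dropped, which differs from the sed version, where a line that does not match the whole pattern prints nothing.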