regex awk sed grep regular-language

How to output multiple regex matches on the same line, separated by commas


I want to use grep/awk/sed to extract the matched strings from each line of a log file and write them to a CSV file. For the first line below, the strings I need are 1434, 53 and http://www.espn.com/.

If the input is:

2018-10-31 18:48:01.717,INFO,15592.15627,PfbProxy::handlePfbFetchDone(0x1d69850, pfbId=561, pid=15912, state=4, fd=78, timer=61), FETCH DONE: len=45, PFBId=561, pid=0, loadTime=1434 ms, objects=53, fetchReqEpoch=0.0, fetchDoneEpoch:0.0, fetchId=26, URL=http://www.espn.com/

2018-10-31 18:48:01.806,DEBUG,15592.15621,FETCH DONE: len=45, PFBId=82, pid=0, loadTime=1301 ms, objects=54, fetchReqEpoch=0.0, fetchDoneEpoch:0.0, fetchId=28, URL=http://www.diply.com/

Expected output for the above log lines:

URL,LoadTime,Objects
http://www.espn.com/,1434,53
http://www.diply.com/,1301,54

This is just an example; the actual log file will contain much more data.

My solution so far

For now I use grep to get all lines containing the keyword 'FETCH DONE' (these lines contain the strings I am looking for).

I did come up with a regular expression that matches the data I need, but when I grep for it and redirect the result to a file, each match is printed on a new line, which is not what I am looking for. The grep command and regular expression I use (online regex tool: https://regexr.com/42cah):

echo -en 'URL,LoadTime,Objects\n'>test1.csv #add header
grep -Po '(?<=loadTime=).{1,5}(?= )|((?<=URL=).*|\/(?=.))|((?<=objects=).{1,5}(?=\,))'>>test1.csv #get matching strings

Actual output:

URL,LoadTime,Objects
http://www.espn.com
1434
53 
http://www.diply.com
1301
54

Expected output:

URL,LoadTime,Objects
http://www.espn.com/,1434,53
http://www.diply.com/,1301,54

I tried using awk to match multiple regexes and print a comma between the matches, but I couldn't get it to work at all, even though my regex matches the correct strings.

Another idea I have is to use sed to replace some of the '\n' characters with ',' (pseudocode):

for(i=1;i<=n;i++)
    if(i % 3 != 0){
        sed REPLACE "\n" with "," on i-th line 
    }

I'm pretty sure there is a more efficient way of doing this.
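
The line-joining idea does not need a loop: paste can merge every three consecutive lines into one. A minimal sketch, assuming each log line produced exactly three matches (a line with fewer would shift every later row); join_by_three is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: joins every three consecutive stdin lines
# into one comma-separated line.
join_by_three() {
    # Each "-" reads the next line from stdin, so three of them
    # consume three lines per output row; -d, sets the delimiter.
    paste -d, - - -
}
```

Piping the grep -Po output through join_by_three gives one comma-separated row per log line, with the columns in whatever order the matches were printed.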


Solution

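  • Using sed:

    sed -n 's/.*loadTime=\([0-9]*\)[^,]*, objects=\([0-9]*\).* URL=\(.*\)/\3,\1,\2/p' input | \
      sed 1i'URL,LoadTime,Objects'

    The first sed captures the loadTime digits, the objects digits and everything after URL= as groups 1 to 3, then prints them reordered as URL,LoadTime,Objects. Because of -n together with the p flag, lines that do not match are skipped. The second sed inserts the header before line 1; the one-line form of the i command is a GNU sed extension.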
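
  • Using awk: the question also mentions awk, so here is a minimal sketch of the same extraction in a single pass. It assumes every line of interest contains FETCH DONE and that URL= is the last field on the line; the function name fetch_done_to_csv is made up for illustration.

```shell
# Hypothetical helper: reads log lines on stdin, writes CSV on stdout.
fetch_done_to_csv() {
    awk '
        # Print the header once, before any input is read.
        BEGIN { print "URL,LoadTime,Objects" }

        # Only lines containing the keyword are considered.
        /FETCH DONE/ {
            loadtime = ""; objects = ""; url = ""
            # match() sets RSTART/RLENGTH; skip the "key=" prefix with substr().
            if (match($0, /loadTime=[0-9]+/))
                loadtime = substr($0, RSTART + 9, RLENGTH - 9)
            if (match($0, /objects=[0-9]+/))
                objects = substr($0, RSTART + 8, RLENGTH - 8)
            if (match($0, /URL=.*/))
                url = substr($0, RSTART + 4)
            print url "," loadtime "," objects
        }
    '
}
```

    It can be called as fetch_done_to_csv < input > test1.csv, where input stands for the log file as in the sed command. The header comes from the BEGIN block, so it is written even when no line matches.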