
read line by line with awk and parse variables


I have a script that reads log files and parses the data to insert it into a MySQL table.

My script looks like

while read x;do
var=$(echo ${x}|cut -d+ -f1) 
var2=$(echo ${x}|cut -d_ -f3)
...
echo "$var,$var2,.." >> mysql.infile 
done<logfile
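As an aside, most of the cost in a loop like this is the command substitutions: each `$(echo ... | cut ...)` forks two processes per variable, per line. The same fields can be pulled out with bash built-ins alone. A minimal sketch, using a made-up sample line:

```shell
# Sketch: replace echo|cut with bash parameter expansion and read built-ins.
x='field1+rest_a_b_c'                # hypothetical sample line

var=${x%%+*}                         # like: cut -d+ -f1 (strip from first '+')
IFS=_ read -r _ _ var2 _ <<< "$x"    # like: cut -d_ -f3 (third '_'-separated field)
echo "$var,$var2"
```

This stays in a single shell process per line, but a one-pass awk (below) avoids the per-line shell overhead entirely.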

The problem is that the log files are thousands of lines long, and it takes hours...

I read that awk is better. I tried it, but I don't know the syntax to parse out the variables...

EDIT: the inputs are structured firewall logs, so they are pretty large files like

@timestamp $HOST reason="idle Timeout" source-address="x.x.x.x" source-port="19219" destination-address="x.x.x.x" destination-port="53" service-name="dns-udp" application="DNS"....

So I'm using a lot of grep calls for ~60 variables, e.g.

sourceaddress=$(echo ${x}|grep -P -o '.{0,0} source-address=\".{0,50}'|cut -d\" -f2)

If you think Perl would be better, I'm open to suggestions, and maybe a hint on how to script it...


Solution

  • To answer your question, I assume the following rules of the game:

    • each line contains various variables
    • each variable can be found by a different delimiter.

    This gives you the following awk script:

    awk 'BEGIN{OFS=","}
         { FS="+"; $0=$0; var1=$1;
           FS="_"; $0=$0; var2=$3;
                   ...
           print var1,var2,... >> "mysql.infile"
         }' logfile
    

    It basically does the following:

    • set the output field separator (OFS) to ,
    • read a line
    • set the field separator to +, re-parse the line ($0=$0) and extract the first variable
    • set the field separator to _, re-parse the line ($0=$0) and extract the second variable
    • ... continue for all variables
    • append the variables to the output file.
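Given the key="value" structure shown in the edit, a single-pass variant can collect every pair into an array and print only the columns you need. A POSIX-awk sketch (the sample line and chosen keys are illustrative):

```shell
# Sketch for key="value" firewall logs: one awk pass, no per-field subshells.
printf '%s\n' '@ts HOST reason="idle Timeout" source-address="1.2.3.4" source-port="19219" destination-port="53"' > sample.log

result=$(awk '{
  delete kv
  line = $0
  # Walk the line, collecting every key="value" pair into the kv array.
  while (match(line, /[A-Za-z-]+="[^"]*"/)) {
      pair = substr(line, RSTART, RLENGTH)
      eq   = index(pair, "=")
      key  = substr(pair, 1, eq - 1)
      val  = substr(pair, eq + 2, length(pair) - eq - 2)  # strip the quotes
      kv[key] = val
      line = substr(line, RSTART + RLENGTH)
  }
  print kv["source-address"], kv["source-port"], kv["destination-port"]
}' OFS=',' sample.log)
echo "$result"
```

Because the whole file is processed by one awk process, this scales to thousands of lines far better than a shell loop that forks per field.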