
Parsing a CSV file using shell scripting


I have been trying to write a script that parses a CSV file and prints the output in a specified format.

The input file has the following format:

collectionBeginTime,ID,MU,hostname,Granularity,SampleInterval,suspectFlag,memCpuUsage,memUsedMemory,memMemoryCapacity,memRequestNum,memOnlineUserNum,memUsedLogDisk,memLogDiskCapacity,freeCPUUsage,freeMemory,freeLogDisk
2015-11-27 17:30:00-0500,NE=2106384,hwMEMPerformanceCollect,PG_172.16.169.70,900,900,0,24,7130,36153,0,1554,23026,157239,76,29023,134213
2015-11-27 17:30:00-0500,NE=2106386,hwMEMPerformanceCollect,PG_172.16.169.68,900,900,0,4,7481,36153,0,1594,22778,157239,96,28672,134461

The output is expected in the following format (only a few of the output lines for the first input line are shown):

collectionBeginTime   ,     hostname     ,     Parameters
2015-11-27 17:30:00-0500, PG_172.16.169.70, SampleInterval:900
2015-11-27 17:30:00-0500, PG_172.16.169.70, suspectFlag:0 

I need to print columns 1 and 4 of every line after the first, followed by the column name (taken from line 1 of the file), a colon, and the column value, for each of columns 6..NF (columns 2, 3 and 5 are ignored altogether). A single input line therefore generates many output lines.
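For comparison, the reshaping just described can also be sketched in plain bash, without awk. This is a minimal sketch, not the asker's or answerer's code: the function name `reshape_csv` is hypothetical, and it assumes simple CSV (no quoted fields containing commas) whose last line ends with a newline.

```shell
#!/bin/bash
# Hypothetical pure-bash sketch of the reshaping; assumes simple CSV
# (no quoted fields with embedded commas) and a trailing newline.
reshape_csv() {
    local -a names fields
    local i
    {
        # Line 1 holds the column names; keep them to label the values.
        IFS=, read -r -a names
        printf '%s, %s, %s\n' "${names[0]}" "${names[3]}" "Parameters"
        # Every remaining line: one output line per column 6..NF.
        # Bash arrays are 0-based, so awk column 6 is index 5.
        while IFS=, read -r -a fields; do
            for (( i = 5; i < ${#fields[@]}; i++ )); do
                printf '%s, %s, %s:%s\n' "${fields[0]}" "${fields[3]}" "${names[i]}" "${fields[i]}"
            done
        done
    } < "$1"
}
```

Usage: `reshape_csv input.csv`, where `input.csv` stands for your file. Bash `read` loops are slow on large files, so the awk approach is usually the better choice there.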

The script I have written:

#!/bin/bash

FILENAME=$1

awk -F',' 'BEGIN{OFS=",";}  { if ( NR!=1 )print $1,$4,$6,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17}' < "$FILENAME" >> tmp.txt

echo "completed"

The script runs, but it prints all the parameters on the same line, without their names. How do I fix it?
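The reason for the single-line output: one awk `print` statement writes all of its arguments, separated by `OFS`, as a single record terminated by one `ORS` (a newline by default). Getting one line per parameter needs one `print` per field, i.e. a loop. The two hypothetical helper functions here contrast the two forms on a shortened sample line.

```shell
#!/bin/bash
# Hypothetical helpers contrasting the two awk forms; both read CSV on stdin.

# One print with several arguments: fields joined by OFS, one newline at the end.
one_print() {
    awk -F',' 'BEGIN { OFS = "," } { print $1, $4, $6, $7 }'
}

# One print per field inside a loop: one output line per field 6..NF.
loop_print() {
    awk -F',' '{ for (i = 6; i <= NF; i++) print $1 ", " $4 ", " $i }'
}

line='2015-11-27 17:30:00-0500,NE=2106384,hwMEMPerformanceCollect,PG_172.16.169.70,900,900,0'
echo "$line" | one_print    # a single line
echo "$line" | loop_print   # two lines, one per field 6..7
```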


Solution

  • Capture the column names from line 1 for reuse. For every other line, iterate over fields 6..NF, printing the relevant data:

    awk -F',' 'NR == 1 { for (i = 6; i <= NF; i++) name[i] = $i    # save column names
                         printf "%s, %s, %s\n", $1, $4, "Parameters"; next }
               { for (i = 6; i <= NF; i++) printf "%s, %s, %s:%s\n", $1, $4, name[i], $i }' "$FILENAME"
    

    Untested code.
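One way to wire that awk program back into the question's script, as a sketch: the wrapper name `reshape` is hypothetical, and the extra `sub(/\r$/, "")` rule is an assumption added in case the CSV was exported with Windows (CRLF) line endings; it does nothing on files with plain LF endings. Quoted fields with embedded commas are still not handled.

```shell
#!/bin/bash
# Hypothetical wrapper around the answer's awk program.
# Assumes simple CSV: no quoted fields containing commas.
reshape() {
    awk -F',' '
        # Assumption: strip a trailing CR in case the file has CRLF endings;
        # changing $0 makes awk re-split the fields.
        { sub(/\r$/, "") }
        # Header line: remember the column names for fields 6..NF.
        NR == 1 { for (i = 6; i <= NF; i++) name[i] = $i
                  printf "%s, %s, %s\n", $1, $4, "Parameters"; next }
        # Data lines: one output line per field 6..NF.
        { for (i = 6; i <= NF; i++) printf "%s, %s, %s:%s\n", $1, $4, name[i], $i }
    ' "$1"
}
```

In the question's script this would replace the awk line, e.g. `reshape "$FILENAME" >> tmp.txt`.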