Search code examples
linuxbashshellawkautosys

Unwanted blank row in CSV file


I have a command here that i was able to put together with the help of Stackoverflow community. Now I have a small concern with the script, this is a very small issue but its bugging me.

Below is the script I use:

#!/bin/bash

  read -p "Enter /DIR/PATH/FILENAME where you wish to copy the data: " FILENAME
  echo "Enter the JOB_NAME or %SEARCHSTRING%"


 while read -r i;
   do

  awk '
    BEGIN {
    print "\"insert_job\",\"job_type\",\"box_name\",\"command\",\"machine\",\"owner\",\"date_conditions\",\"condition\",\"run_calendar\",\"exclude_calendar\",\"days_of_week\",\"run_window\",\"start_times\",\"start_mins\",\"resources\",\"profile\",\"term_run_time\",\"watch_file\",\"watch_interval\"" }

/job_type/ {
    if (NR>1){printf "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n", jn, jt, box, cmd, mcn, own, dc, c, rc, ec, dow, ruw, st, sm, res, prof, trt, wf, wi} jn="\""$2"\""; jt="\""$4"\""; box="\" \""; cmd="\" \""; mcn="\" \""; own="\" \""; dc="\" \""; c="\" \""; rc="\" \""; ec="\" \""; dow="\" \""; ruw="\" \""; st="\" \""; sm="\" \""; res="\" \""; prof="\" \""; trt="\" \""; wf="\" \""; wi="\" \""}
    /box_name/ {box="\""$2"\""}
    /command/ {$0=substr($0,index($0,$2)); cmd="\""$0"\""}
    /machine/ {mcn="\""$2"\""}
    /owner/   {own="\""$2"\""}
    /date_conditions/ {dc="\""$2"\""}
    /condition/ {$0=substr($0,index($0,$2)); c="\""$0"\""}
    /run_calendar/ {rc="\""$2"\""}
    /exclude_calendar/ {ec="\""$2"\""}
    /days_of_week/ {dow="\""$2"\""}
    /run_window/ {ruw="\""$2"\""}
    /start_times/ {gsub("\"",""); st="\""$2"\""}
    /^start_mins/ {sm="\""$2"\""}
    /profile/ {prof="\""$2"\""}
    /term_run_time/ {trt="\""$2"\""}
    /watch_file/ {wf="\""$2"\""}
    /watch_interval/ {wi="\""$2"\""}
    /resources/ {res="\""$2"\""}
    END{printf "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n", jn, jt, box, cmd, mcn, own, dc, c, rc, ec, dow, ruw, st, sm, res, prof, trt, wf, wi}
' < <(autorep -j $i -q) > $FILENAME.csv

break
done

The $i takes a wildcard entry and gives output asper that.

for eg: the below 4 jobs have Test in their name, so i will give Test% as the wildcard value and the script will give the output for all the 4 jobs.

These are the test jobs i am using:

/* ----------------- Test_A ----------------- */

insert_job: Test_A  job_type: CMD
command: sleep 3000
machine: machine1
owner: user1
permission:
date_conditions: 1
days_of_week: mo,tu,we,th,FR
start_times: “06:00,08:00,10:00,12:00,14:00,16:00”
term_run_time: 1
alarm_if_fail: 1
alarm_if_terminated: 1


/* ----------------- Test_B ----------------- */

insert_job: Test_B    job_type: CMD
command: echo
machine: machine2
owner: user2
permission:
date_conditions: 0
description: "Test"
std_out_file: "/tmp/$AUTO_JOB_NAME.$AUTORUN.out"
std_err_file: "/tmp/$AUTO_JOB_NAME.$AUTORUN.err"
max_run_alarm: 1
alarm_if_fail: 0
alarm_if_terminated: 0
send_notification: 1



/* ----------------- Test_c ----------------- */

insert_job: Test_c   job_type: CMD
command: sleep 10
machine: machine3
owner: user3
permission:
date_conditions: 0
alarm_if_fail: 0
alarm_if_terminated: 0


/* ----------------- Test_d ----------------- */

insert_job: Test_d   job_type: CMD
command: ls
machine: machine4
owner: user4
permission:
date_conditions: 0
alarm_if_fail: 1
alarm_if_terminated: 1

But here is the issue, the csv file output has a blank row between the column names and the data like below:

"insert_job","job_type","box_name","command","machine","owner","date_conditions","condition","run_calendar","exclude_calendar","days_of_week","run_window","start_times","start_mins","resources","profile","term_run_time","watch_file","watch_interval"
,,,,,,,,,,,,,,,,,,
"Test_A","CMD"," ","sleep 3000","machine1","user1","0","0"," "," "," "," "," "," "," "," ","1"," "," "
"Test_B","CMD"," ","echo","machine2","user2","0","0"," "," "," "," "," "," "," "," "," "," "," "
"Test_c","CMD"," ","sleep 10","machine3","user3","0","0"," "," "," "," "," "," "," "," "," "," "," "
"Test_d","CMD"," ","ls","machine4","user4","0","0"," "," "," "," "," "," "," "," "," "," "," "

Required output is:

"insert_job","job_type","box_name","command","machine","owner","date_conditions","condition","run_calendar","exclude_calendar","days_of_week","run_window","start_times","start_mins","resources","profile","term_run_time","watch_file","watch_interval"
"Test_A","CMD"," ","sleep 3000","machine1","user1","0","0"," "," "," "," "," "," "," "," ","1"," "," "
"Test_B","CMD"," ","echo","machine2","user2","0","0"," "," "," "," "," "," "," "," "," "," "," "
"Test_c","CMD"," ","sleep 10","machine3","user3","0","0"," "," "," "," "," "," "," "," "," "," "," "
"Test_d","CMD"," ","ls","machine4","user4","0","0"," "," "," "," "," "," "," "," "," "," "," "

I have tried using (NR>=1) but it doesn't work. I know this is very trivial but I cant get my head around it, can someone help me?


Solution

  • Here's how to robustly (your existing script will fail for various possible input values) and maintainably (if you want to add/remove fields to print or reorder them just update the first split() arg) do what you're trying to do, using cat file inplace of autorep -j $i -q which I don't have on my system:

    $ cat tst.sh
    #!/usr/bin/env bash
    
    cat file |
    awk '
        BEGIN {
            OFS = ","
            numTags = split("insert_job job_type box_name command machine owner date_conditions condition run_calendar exclude_calendar days_of_week run_window start_times start_mins resources profile term_run_time watch_file watch_interval",tags)
            for ( tagNr=1; tagNr<=numTags; tagNr++ ) {
                tag = tags[tagNr]
                printf "\"%s\"%s", tag, (tagNr<numTags ? OFS : ORS)
            }
        }
    
        !NF || /^\/\*/ { next }
        { gsub(/^[[:space:]]+|[[:space:]]+$/,"") }
    
        match($0,/[[:space:]]job_type:/) {
            if ( jobNr++ ) {
                prt()
                delete tag2val
            }
    
            # save "insert_job" value
            tag = substr($1,1,length($1)-1)
            val = substr($0,length($1)+1,RSTART-(length($1)+2))
            gsub(/^[[:space:]]+|[[:space:]]+$/,"",val)
            tag2val[tag] = val
    
            # update $0 to start with "job_type" to look like all other input
            $0 = substr($0,RSTART+1)
        }
    
        {
            tag = val = $0
            sub(/:.*/,"",tag)
            sub(/[^:]+:[[:space:]]*/,"",val)
            tag2val[tag] = val
        }
    
        END { prt() }
    
        function prt(    tagNr,tag,val) {
            for ( tagNr=1; tagNr<=numTags; tagNr++ ) {
                tag = tags[tagNr]
                val = tag2val[tag]
                printf "\"%s\"%s", val, (tagNr<numTags ? OFS : ORS)
            }
        }
    '
    

    $ ./tst.sh
    "insert_job","job_type","box_name","command","machine","owner","date_conditions","condition","run_calendar","exclude_calendar","days_of_week","run_window","start_times","start_mins","resources","profile","term_run_time","watch_file","watch_interval"
    "Test_A","CMD","","sleep 3000","machine1","user1","0","","","","","","","","","","1","",""
    "Test_B","CMD","","echo","machine2","user2","0","","","","","","","","","","","",""
    "Test_c","CMD","","sleep 10","machine3","user3","0","","","","","","","","","","","",""
    "Test_d","CMD","","ls","machine4","user4","0","","","","","","","","","","","",""
    

    The above is a simplified version of my answer to your previous question where I provided a script that'd output a CSV of all fields instead of the above which only outputs a specific list of fields.

    It's not clear why you have a loop calling autorep in your existing script but if you do need one for some reason then chances are you should be calling autorep in a loop and piping the loop output to awk

    while IFS= read -r i; do
        autorep -j "$i" -q
    done |
    awk '...'
    

    instead of calling autorep and piping it's output to awk inside the loop:

    while IFS= read -r i; do
        autorep -j "$i" -q |
        awk '...'
    done
    

    The above was run on this input copy/pasted from your question:

    $ cat file
    /* ----------------- Test_A ----------------- */
    
    insert_job: Test_A  job_type: CMD
    command: sleep 3000
    machine: machine1
    owner: user1
    permission:
    date_conditions: 0
    term_run_time: 1
    alarm_if_fail: 1
    alarm_if_terminated: 1
    
    
    /* ----------------- Test_B ----------------- */
    
    insert_job: Test_B    job_type: CMD
    command: echo
    machine: machine2
    owner: user2
    permission:
    date_conditions: 0
    description: "Test"
    std_out_file: "/tmp/$AUTO_JOB_NAME.$AUTORUN.out"
    std_err_file: "/tmp/$AUTO_JOB_NAME.$AUTORUN.err"
    max_run_alarm: 1
    alarm_if_fail: 0
    alarm_if_terminated: 0
    send_notification: 1
    
    
    
    /* ----------------- Test_c ----------------- */
    
    insert_job: Test_c   job_type: CMD
    command: sleep 10
    machine: machine3
    owner: user3
    permission:
    date_conditions: 0
    alarm_if_fail: 0
    alarm_if_terminated: 0
    
    
    /* ----------------- Test_d ----------------- */
    
    insert_job: Test_d   job_type: CMD
    command: ls
    machine: machine4
    owner: user4
    permission:
    date_conditions: 0
    alarm_if_fail: 1
    alarm_if_terminated: 1