Tags: bash, unix, sed, grep, cut

How to use unix grep and the output together?


I am new to Unix commands. I have a file named server.txt with 100 fields per line; the first line of the file is a header.

I only want to look at fields 99 and 100.

Field 99 is just a number; field 100 is a string.

The fields are delimited by a space.

My goal is to extract every token in the string (field 100) using grep and a regex, then output field 99 alongside each extracted token, skipping the first 1000 lines of my records.

----server.txt--
... ...   ,field99,field100
... ...    5,"hi are"
... ...    3,"how is"

-----output.txt
header1,header2
5,hi
5,are
3,how
3,is

So I have some ideas, but I don't know how to combine them into one script.

Here are some of my thoughts:

sed 1000d server.txt cut -f99,100  -d' ' >output.txt
grep | /[A-Za-z]+/| 

Solution

  • Sounds more like a job for awk.

    awk -F, -v OFS=, 'NR <= 1000 { next }
      { gsub(/^"|"$/, "", $100); n = split($100, a, / /);
        for (v = 1; v <= n; ++v) print $99, a[v] }' server.txt >output.txt
    

    The general form of an awk program is a sequence of condition { action } pairs. The first pair has the condition NR <= 1000, where NR is the current line number; when it is true, the next action skips to the next input line. Otherwise we fall through to the second pair, which has no condition, so it runs unconditionally for every input line that reaches it. It first strips the double quotes around the 100th field value, then splits that value on spaces into the array a. The for loop then walks this array, printing the 99th field value together with the vth element, from v=1 through the end of the array.
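
    To try the condition { action } flow without a real 100-field file, here is a minimal sketch on made-up data. The three-field sample, the skip variable and the temporary file are assumptions for illustration only; $(NF-1) and $NF stand in for $99 and $100, and skip=1 stands in for 1000.

```shell
# Tiny stand-in for server.txt: one header line plus two records.
# Only three comma-separated fields instead of 100 (an assumption for brevity).
tmp=$(mktemp)
printf '%s\n' 'id,field99,field100' 'a,5,"hi are"' 'b,3,"how is"' > "$tmp"

# skip=1 drops only the header here; the original command skips 1000 lines.
result=$(awk -F, -v OFS=, -v skip=1 '
  NR <= skip { next }        # first pair: condition is true, so skip the line
  {                          # second pair: no condition, runs for the rest
    s = $NF                  # last field (field 100 in the real file)
    gsub(/^"|"$/, "", s)     # strip the surrounding double quotes
    n = split(s, a, / /)     # split returns the number of tokens
    for (v = 1; v <= n; ++v) print $(NF-1), a[v]
  }' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

    Taking the token count from the return value of split, rather than calling length on the array, should keep the loop portable to awk implementations that do not accept an array argument to length.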

    The input file format is sort of cumbersome. The gsub and split stuff could be avoided with a slightly more sane input format. If you are new to awk, you should probably go look for a tutorial.
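
    As a rough illustration of what a simpler format could buy you, suppose (purely hypothetically, this is not the asker's actual file) each record were whitespace-separated with no quotes, the number first and its tokens after it. awk's default field splitting would then do all the work:

```shell
# Hypothetical simpler layout: "<number> <token> <token> ...", no quotes.
simple=$(printf '%s\n' '5 hi are' '3 how is' |
  awk -v OFS=, '{ for (i = 2; i <= NF; ++i) print $1, $i }')

printf '%s\n' "$simple"
```

    Here every token is already its own field, so neither gsub nor split is needed.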

    If you only want to learn one scripting language, I would suggest Perl or Python over awk, but it depends on your plans and orientation.