Search code examples
shellawksed

How to extract data from a table using bash


Hi I wanted to extract the elements in the first column of the output from a cli which is like so:

             Title
----------------------------------
| Column A | Column B | Column C |
| -------- | -------- | -------- |
  > Cell 1   Cell2      Cell5    
  Cell 3     Cell4      Cell6    

I wish to extract all elements from the the first column. Peculiarities with this table:

  • Only one elements in the first column can have the '>' character followed by a space and name.
  • Names in the first column can contain spaces
  • Elements in column 2 or 3 dont have spaces

For example

             Title
----------------------------------
| Names    | ID       | Sales    |
| -------- | -------- | -------- |
  > Fin Bo   192019     10       
  QuarGem    12982      1        

I would want > Fin Bo and QuarGem

I have access to sed, awk, grep and other linux tools and have currently tried the following:

cat ./table | tail -n +5 | awk '{print $1}' but it fails to give the proper outcome and instead gives the following:

>
W-44XA45
Flint

for the input:

                                    Title
--------------------------------------------------------------------------------
      name                               id                  sales
--------------------------------------------------------------------------------
  >   quantum                            pki                 10
      W-44XA45                           aks                 1
      Flint GD                           ixs                 2

Solution

  • Using any POSIX awk and preserving white space inside the first field:

    $ awk 'NR>4{ gsub(/^[[:space:]]+|([[:space:]]+[^[:space:]]+){2}$/,""); print }' file
    >   quantum
    W-44XA45
    Flint GD
    

    or if you don't care about preserving white space and are using an awk where changing NF changes the number of fields, e.g. GNU awk:

    $ awk 'NR>4{ NF-=2; print }' file
    > quantum
    W-44XA45
    Flint GD