unix grep

Extract column using grep


I have a data frame with more than 100 columns, each labeled with a unique string. Column 1 is the index variable. I would like to use a basic UNIX command to extract the index column (column 1) plus one specific column, selected by matching its header string with grep.

For example, if my data frame looks like the following:

Index  A  B  C...D  E  F
p1     1  7  4   2  5  6
p2     2  2  1   2  .  3
p3     3  3  1   5  6  1

I would like to use some command to extract only column "X" which I will specify with grep, and display both column 1 & the column I grep'd. I know that I can use cut -f1 myfile for the first bit, but need help with the grep per column. As a more concrete example, if my grep phrase were "B", I would like the output to be:

Index  B
p1     7
p2     2
p3     3

I am new to UNIX, and have not found much in similar examples. Any help would be much appreciated!!


Solution

  • First figure out the command to find the column number.

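    columnname=C
    # Prints the 1-based column number: the tabs left of the header name,
    # plus the trailing newline that wc -c also counts (\t in brackets needs GNU sed).
    sed -n "1 s/${columnname}.*//p" datafile | sed 's/[^\t]//g' | wc -c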
    

    Once you know the number (4 for column C in the example data), use cut:

    cut -f1,4 < datafile
    

    Combine into one command

    cut -f1,$(sed -n "1 s/${columnname}.*//p" datafile | 
       sed 's/[^\t]//g' | wc -c) < datafile
    

    Finished? Not quite. When one header can be a substring of another (for example, A and AB), the first sed command can match the wrong column. To fix that, include the surrounding tabs in the match and put the leading tab back in the replacement string so the tab count stays correct.
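
One possible way to make the header match exact is sketched below. It is a variation on the idea above, not the answerer's own code: instead of putting a tab back, it wraps the header line in tabs so that every header, including the first and last, has a tab on both sides. The function name colnum and the sample file contents are made up for illustration, the file is assumed to be tab-separated, and the column name is assumed to contain no "/" or regex metacharacters.

```shell
#!/bin/sh
# Hypothetical tab-separated sample, modeled on the question's table.
# "AB" is included on purpose: a plain substring match on "B" would hit it first.
datafile=$(mktemp)
trap 'rm -f "$datafile"' EXIT
printf 'Index\tAB\tB\tC\n' >  "$datafile"
printf 'p1\t1\t7\t4\n'     >> "$datafile"
printf 'p2\t2\t2\t1\n'     >> "$datafile"
printf 'p3\t3\t3\t1\n'     >> "$datafile"

# colnum NAME FILE
# Prints the 1-based position of the header equal to NAME, or 0 if there is none.
colnum() {
    tab=$(printf '\t')
    # 1st sed: wrap the header line in tabs.
    # 2nd sed: delete from "<tab>NAME<tab>" to the end of the line; print only on a match.
    # tr keeps just tabs and the newline, so the byte count is the column number.
    sed -n "1 s/.*/${tab}&${tab}/p" "$2" |
        sed -n "s/${tab}$1${tab}.*//p" |
        tr -cd '\t\n' | wc -c | tr -d ' '
}

n=$(colnum B "$datafile")
# Only call cut when the header was found.
[ "$n" -gt 0 ] && cut -f1,"$n" "$datafile"
```

With this sample file the script should print the Index and B columns, which matches the output requested in the question. The final tr -d ' ' is there because some wc implementations pad the count with spaces, which would otherwise split the cut argument.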