I have a data frame with more than 100 columns, each labeled with a unique string. Column 1 is the index variable. I would like to use a basic UNIX command to extract the index column (column 1) plus one specific column, chosen by its header string, using grep.
For example, if my data frame looks like the following:
Index A B C...D E F
p1 1 7 4 2 5 6
p2 2 2 1 2 . 3
p3 3 3 1 5 6 1
I would like to use some command to extract only column "X", which I will specify with grep, and display both column 1 and the column I grep'd. I know that I can use cut -f1 myfile for the first bit, but I need help with the grep per column. As a more concrete example, if my grep phrase were "B", I would like the output to be:
Index B
p1 7
p2 2
p3 3
I am new to UNIX and have not found many similar examples. Any help would be much appreciated!
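First, figure out the number of the column you want. This assumes the file is tab-separated (cut splits on tabs by default) and that you have GNU sed, which understands \t: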
columnname=C
sed -n "1 s/${columnname}.*//p" datafile | sed 's/[^\t]//g' | wc -c
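To see why counting characters gives the column number, here is a small sketch that runs the pipeline on a tab-separated sample built from the first few columns of the question. The sample content is illustrative, and the sketch assumes GNU sed and GNU wc (which prints a bare number when reading stdin).

```shell
#!/bin/sh
# Illustrative tab-separated sample using the question's first columns.
printf 'Index\tA\tB\tC\np1\t1\t7\t4\np2\t2\t2\t1\np3\t3\t3\t1\n' > datafile

columnname=C

# Step 1: on line 1 only, delete the column name and everything after it.
# For C this prints "Index<TAB>A<TAB>B<TAB>".
sed -n "1 s/${columnname}.*//p" datafile

# Steps 2 and 3: keep only the tabs (3 of them here); wc -c counts those
# tabs plus the trailing newline, which is exactly the column number.
sed -n "1 s/${columnname}.*//p" datafile | sed 's/[^\t]//g' | wc -c   # prints 4
```

Each tab in front of the name stands for one earlier column, and the newline that wc -c also counts supplies the "+1".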
Once you know the number, use cut. For C the command above prints 4 (Index is field 1, A is 2, B is 3, C is 4):
cut -f1,4 < datafile
Combine them into one command:
cut -f1,$(sed -n "1 s/${columnname}.*//p" datafile |
sed 's/[^\t]//g' | wc -c) < datafile
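If you need this for several columns, the combined command can be wrapped in a small shell function. The name extract_column is hypothetical, and the sketch assumes a tab-separated file, GNU sed and GNU coreutils. It also checks for a header that is not found: in that case the pipeline yields 0, and cut would reject a field number of 0.

```shell
#!/bin/sh
# Hypothetical helper: extract_column NAME FILE
# Prints column 1 plus the column headed NAME from a tab-separated FILE.
# NAME is placed inside a sed regex, so it should not contain regex metacharacters.
extract_column() {
    colname=$1
    file=$2
    # Tabs before NAME on the header line, plus the newline counted by wc -c,
    # give the column number (0 if NAME does not occur in the header).
    n=$(sed -n "1 s/${colname}.*//p" "$file" | sed 's/[^\t]//g' | wc -c)
    if [ "$n" -gt 0 ]; then
        cut -f1,"$n" < "$file"
    else
        echo "column '$colname' not found in $file" >&2
        return 1
    fi
}
```

For the question's example, extract_column B datafile prints the Index and B columns. It still uses the simple header match, so it inherits the substring problem described next.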
Finished? No. You should improve the first sed command for the case where one header is a substring of another (for example A and AB): include the tabs in your match and put the tabs back in the replacement string.
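One way to follow that advice is sketched below, assuming GNU sed (for -E and the \t escape) and a tab-separated file; the function name colnum and the demo file header_demo are hypothetical. The name must now be preceded by a tab and followed by either a tab or the end of the line, and the leading tab is written back so the tab count still equals the column number minus one. Because it requires a leading tab, it never matches column 1, which is printed anyway.

```shell
#!/bin/sh
# Hypothetical helper: colnum NAME FILE prints the column number of header NAME.
# Match "<TAB>NAME" followed by "<TAB>..." or end of line, and replace it
# with a single tab so the tabs before NAME are preserved.
colnum() {
    colname=$1
    file=$2
    sed -En "1 s/\t${colname}(\t.*)?$/\t/p" "$file" | sed 's/[^\t]//g' | wc -c
}

# Header where one name ("A") is a prefix of another ("AB").
printf 'Index\tAB\tA\n' > header_demo
colnum A header_demo    # prints 3 (the simple s/A.*// version would print 2)
colnum AB header_demo   # prints 2
```

Its output can be used in place of the first sed pipeline, e.g. cut -f1,$(colnum B datafile) < datafile.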