Apologies for the basic question, but I am pretty stuck. I have a basic tab-delimited text file called map.txt
. The first line of the file is column names, preceded with the #
character. Note, this is a tab-delimited file. Here is the output of head map.txt
:
#SampleID BarcodeSequence LinkerPrimerSequence sample_type Description geneticSampleID
OSBS.087.39.M.32.18.20140227 TCCCTTGTCTCC CGGCTGCGTTCTTCATCGATGC soil Plate 1A1 OSBS_087-M-32-18-20140227-gen
OSBS.048.41.M.37.33.20140227 ACGAGACTGATT CGGCTGCGTTCTTCATCGATGC soil Plate 1A2 OSBS_048-M-37-33-20140227-gen
OSBS.048.23.M.15.31.20140227 GCTGTACGGATT CGGCTGCGTTCTTCATCGATGC soil Plate 1A3 OSBS_048-M-15-31-20140227-gen
OSBS.047.21.M.20.3.20140227 ATCACCAGGTGT CGGCTGCGTTCTTCATCGATGC soil Plate 1A4 OSBS_047-M-20-3-20140227-gen
OSBS.119.23.M.18.38.20140227 TGGTCAACGATA CGGCTGCGTTCTTCATCGATGC soil Plate 1A5 OSBS_119-M-18-38-20140227-gen
OSBS.047.41.M.22.36.20140227 ATCGCACAGTAA CGGCTGCGTTCTTCATCGATGC soil Plate 1A6 OSBS_047-M-22-36-20140227-gen
OSBS.087.41.M.40.21.20140227 GTCGTGTAGCCT CGGCTGCGTTCTTCATCGATGC soil Plate 1A7 OSBS_087-M-40-21-20140227-gen
OSBS.048.21.M.5.11.20140227 AGCGGAGGTTAG CGGCTGCGTTCTTCATCGATGC soil Plate 1A8 OSBS_048-M-5-11-20140227-gen
OSBS.119.39.M.27.5.20140227 ATCCTTTGGTTC CGGCTGCGTTCTTCATCGATGC soil Plate 1A9 OSBS_119-M-27-5-20140227-gen
I would like to change the names geneticSampleID
to Name
and BarcodeSequence
to Index1
. I would then like to delete all columns except for the newly named Name
and Index1
columns. I'd like the first line of the file to remain the colum names Name
and Index1
, preceded by the #
character. Changing the names of the columns is straightforward enough in vim, deleting columns is not.
Desired output:
#Index1 Name
TCCCTTGTCTCC OSBS_087-M-32-18-20140227-gen
ACGAGACTGATT OSBS_048-M-37-33-20140227-gen
GCTGTACGGATT OSBS_048-M-15-31-20140227-gen
ATCACCAGGTGT OSBS_047-M-20-3-20140227-gen
TGGTCAACGATA OSBS_119-M-18-38-20140227-gen
ATCGCACAGTAA OSBS_047-M-22-36-20140227-gen
GTCGTGTAGCCT OSBS_087-M-40-21-20140227-gen
AGCGGAGGTTAG OSBS_048-M-5-11-20140227-gen
ATCCTTTGGTTC OSBS_119-M-27-5-20140227-gen
You can do this with shell commands like awk
and sed
, for example:
cat map.txt | awk '{ print $2, $7}' | sed 's/BarcodeSequence/#Index1 Name/'
Output:
#Index1 Name
TCCCTTGTCTCC OSBS_087-M-32-18-20140227-gen
ACGAGACTGATT OSBS_048-M-37-33-20140227-gen
GCTGTACGGATT OSBS_048-M-15-31-20140227-gen
ATCACCAGGTGT OSBS_047-M-20-3-20140227-gen
TGGTCAACGATA OSBS_119-M-18-38-20140227-gen
ATCGCACAGTAA OSBS_047-M-22-36-20140227-gen
GTCGTGTAGCCT OSBS_087-M-40-21-20140227-gen
AGCGGAGGTTAG OSBS_048-M-5-11-20140227-gen
ATCCTTTGGTTC OSBS_119-M-27-5-20140227-gen
Although now we are using sed
to replace 1 column name with 2 new names and that feels hacky to me. So in this case I would prefer the awk-only way:
awk 'BEGIN{ print "#Name", "Index1"} NR>1 { print $2, $7}' map.txt