I have a tab-delimited text file whose second line is a row of column headers. I want to reproduce the file with the last two characters removed from every column header except the first. The first character to remove is always a period and the second is always a digit, though the digit varies (see the example below). I need to do this on the Linux command line to streamline my analysis, perhaps using some combination of sed, perl, grep, cut, or some other command?
For example,
I have:
Constructed data file
Data s123.4 s567.8 s901.2
abcd 123456 789012 345678
efgh 901234 567890 123456
ijkl 789012 345678 901234
And I want:
Constructed data file
Data s123 s567 s901
abcd 123456 789012 345678
efgh 901234 567890 123456
ijkl 789012 345678 901234
I know this can be done in MS Excel by:
1. Enter a new row between Row 2 & 3
2. Copy column name from A2 to A3
3. In B3 enter =LEFT(B2, LEN(B2)-2)
4. Apply formula across whole row
5. Copy row & paste as values
6. Delete original Row 2
But of course it would be a lot faster in the Linux command line!
Using a perl one-liner
perl -i -pe 's/\.\d\b//g if $. == 2' file.txt
Switches:
-i : Edit <> files in place (makes backup if extension supplied)
-p : Creates a while(<>){...; print} loop for each “line” in your input file.
-e : Tells perl to execute the code on command line.
Code:
$. == 2 : Checks if the current line is line number 2.
s/\.\d\b//g : Remove all .NUM at the end of words
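To try the one-liner safely before running it on real data, here is a minimal sketch that rebuilds the first rows of the example from the question and supplies a backup extension to -i. The file name sample.txt and the .bak extension are made up for illustration; substitute your own.

```shell
# Rebuild part of the example from the question as a tab-delimited file
# (sample.txt is a hypothetical name used only for this demonstration)
printf 'Constructed data file\n' > sample.txt
printf 'Data\ts123.4\ts567.8\ts901.2\n' >> sample.txt
printf 'abcd\t123456\t789012\t345678\n' >> sample.txt

# Same substitution as the one-liner, but -i.bak keeps the untouched
# original as sample.txt.bak
perl -i.bak -pe 's/\.\d\b//g if $. == 2' sample.txt

# Print the rewritten header row (line 2)
sed -n '2p' sample.txt
```

Only line 2 is edited, so the data rows stay as they were. Note that the pattern does not treat the first column specially: the header "Data" is left alone simply because it contains no ".NUM" suffix, so a first header that did end that way would be trimmed as well.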