
Linux command line - Remove 2 characters before each tab for a certain line in a text file


I have a tab-delimited text file whose second line is a row of column headers. I want to reproduce the file with the last two characters removed from every column header except the first. The first of the two characters is always a period and the second is always a digit, though the digit varies (see the example below). I need to do this on the Linux command line to streamline my analysis, perhaps with some combination of sed, perl, grep, cut, or another command.

For example,

I have:

Constructed data file 
Data    s123.4  s567.8  s901.2 
abcd    123456  789012  345678 
efgh    901234  567890  123456 
ijkl    789012  345678  901234

And I want:

Constructed data file
Data    s123    s567    s901
abcd    123456  789012  345678
efgh    901234  567890  123456
ijkl    789012  345678  901234   

I know this can be done in MS Excel by:
1. Insert a new row between rows 2 and 3
2. Copy column name from A2 to A3
3. In B3 enter =LEFT(B2, LEN(B2)-2)
4. Apply formula across whole row
5. Copy row & paste as values
6. Delete original Row 2

But of course it would be a lot faster in the Linux command line!


Solution

  • Using a perl one-liner

    perl -i -pe 's/\.\d\b//g if $. == 2' file.txt
    

    Explanation:

    Switches:

    • -i: Edits the files read via <> in place (a backup is kept if an extension is supplied, e.g. -i.bak)
    • -p: Creates a while(<>){...; print} loop for each “line” in your input file.
    • -e: Tells perl to execute the code on command line.

    Code:

    • $. == 2: Checks if the current line is line number 2.
    • s/\.\d\b//g: Removes every period followed by a single digit that sits at the end of a word (here, the trailing .N of each header)
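
To try the substitution without modifying the original file, one option is to drop -i so the result goes to standard output. The sketch below does that on a few rows from the question, and also shows a field-aware awk variant as a possible alternative. The file name demo.tsv and the variable names are assumptions made for illustration, not part of the original answer.

```shell
# Build a small tab-delimited sample (demo.tsv is a hypothetical name).
tmpdir=$(mktemp -d)
sample="$tmpdir/demo.tsv"
printf 'Constructed data file\n' > "$sample"
printf 'Data\ts123.4\ts567.8\ts901.2\n' >> "$sample"
printf 'abcd\t123456\t789012\t345678\n' >> "$sample"

# Without -i the original file is left alone and the result is printed.
perl_out=$(perl -pe 's/\.\d\b//g if $. == 2' "$sample")

# awk alternative: on line 2 only, trim a trailing ".N" from fields 2..NF.
awk_out=$(awk 'BEGIN { FS = OFS = "\t" }
     NR == 2 { for (i = 2; i <= NF; i++) sub(/\.[0-9]$/, "", $i) }
     { print }' "$sample")

printf '%s\n' "$perl_out"
rm -rf "$tmpdir"
```

The perl regex does not distinguish columns, so it would also strip a trailing .N from the first header if one were present; the awk loop starts at field 2 and leaves the first header untouched. For the sample data both give the same result.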