
How to cut range of characters from multiple columns


I have a tab-delimited file that looks like this:

CHROM <TAB> POS <TAB> AD0062-C <TAB> AD0063-C <TAB> AD0065-C <TAB> AD0074-C 
2L <TAB> 440 <TAB>0/1:63:60,0,249 <TAB>0/1:89:86,0,166 <TAB>1/1:96:107,24,0<TAB>1/1:49:42,6,0  
2L <TAB> 260<TAB>0/1:66:63,0,207<TAB> 1/1:99:227,111,0<TAB>1/1:99:255,144,0<TAB> 1/1:49:42,6,0
2L <TAB> 595 <TAB> 0/1:11:85,0,8 <TAB>0/1:13:132,0,10 <TAB>0/1:73:70,0,131<TAB> 0/1:59:72,0,56

I want to keep only the first 3 characters of every field from column 3 onward, so that the output looks like this:

CHROM <TAB> POS <TAB> AD0062-C <TAB> AD0063-C <TAB> AD0065-C <TAB> AD0074-C 
2L <TAB> 440 <TAB> 0/1 <TAB> 0/1 <TAB> 1/1 <TAB> 1/1  
2L <TAB> 260 <TAB> 0/1 <TAB> 1/1 <TAB> 1/1 <TAB> 1/1
2L <TAB> 595 <TAB> 0/1 <TAB> 0/1 <TAB> 0/1 <TAB> 0/1

Thanks


Solution

  • Using awk. For every line except the first, if it has more than two fields, replace each field from the third onward with its first three characters. The print action has no condition, so it runs for every line, including the header, which is printed unchanged.

    awk '
        BEGIN { OFS = "\t" }
        NF > 2 && FNR > 1 { 
            for ( i=3; i<=NF; i++ ) { 
                $i = substr( $i, 1, 3 ) 
            } 
        } 
        { print }
    ' infile
    

    Output:

    CHROM   POS     AD0062-C        AD0063-C        AD0065-C        AD0074-C 
    2L      440     0/1     0/1     1/1     1/1
    2L      260     0/1     1/1     1/1     1/1
    2L      595     0/1     0/1     0/1     0/1
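
A possible variant, sketched under an assumption the question does not state: that the wanted value is always the part of each field before the first colon, rather than exactly three characters. In that case the substr() call can be swapped for sub(), which also copes with longer values such as 0/10. The function name strip_after_colon is hypothetical and only used for illustration.

```shell
# strip_after_colon is a hypothetical helper name, not part of the original answer.
# Assumption: each data field looks like VALUE:rest, and only VALUE should be kept.
strip_after_colon() {
    awk '
        BEGIN { OFS = "\t" }                 # rebuild modified lines with tabs
        FNR > 1 && NF > 2 {                  # skip the header and short lines
            for (i = 3; i <= NF; i++) {
                sub(/:.*/, "", $i)           # delete from the first colon to the end of the field
            }
        }
        { print }                            # print every line, modified or not
    ' "$1"
}
```

It would be called as `strip_after_colon infile`. Like the original command, it relies on awk's default whitespace field splitting, so stray spaces around the tabs do not end up inside the fields.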