Search code examples
perlawknawk

Position of a character each field


How can I represent the position of 1 (considering only the 1s after the colon) in the word from field5 and above; counting from right to left.

Input:

TT-124 06-03-14 08-02-10 FAS   CAT1:10
TT-125-1 05-03-14 10-06-08 CAS   CAT2:1010 FAT1:10000
TT-125-3 07-03-14 11-02-06 FAS   FAT1:1101
SS-120-1 05-03-14 09-04-07 FAS   CAT3:100000
AA-121-0 06-03-14 08-03-06 CAS   FAT2:11 CAT1:101100

Output:

TT-124 06-03-14 08-02-10 FAS   CAT1:1
TT-125-1 05-03-14 10-06-08 CAS   CAT2:3 CAT2:1 FAT1:4
TT-125-3 07-03-14 11-02-06 FAS   FAT1:0 FAT1:2 FAT1:3
SS-120-1 05-03-14 09-04-07 FAS   CAT3:5
AA-121-0 06-03-14 08-03-06 CAS   FAT2:0 FAT2:1 CAT1:2 CAT1:3 CAT1:5

I have tried below script (as suggested by someone) which works in cygwin, but does not work on Solaris 10 using nawk or /usr/xpg4/bin/awk How can I implement in solaris?

awk '{for(i = 5; i <= NF; i++) {split($i, a, ":"); $i = ""; split(a[2], b, "");
  for(j = 1; j <= length(b); j++) {if(b[j] == 1)
    {$i = ($i == "") ? (a[1] ":" length(b) - j) : ($i FS a[1] ":" length(b) - j)}}};
  print $0}' file

Solution

  • The drawback of using a one-liner that looks like that (quite illegible) written by someone else is that it is incredibly hard for you to fix when it breaks. I've written some Perl code that should be easier to understand and easier for you to maintain.

    It basically splits your string using a regex which is hardcoded to first ignore (match and print) 4 space separated fields of data, then extract the relevant data after it. This could be made a lot more specific, but you have failed to specify what format your data has, if any, so this is the most specific I would make it for now.

    Then it takes the first element of the @- array $-[0] to find the position of each 1 in the numeric string of each target string, and build new strings based on the name, and each numeric match.

    use strict;
    use warnings;
    
    while (<DATA>) {
        my ($pre, $data) = /^((?:\S+\s+){4})(.+)/;   # capture using regex
        print $pre;                                  # print original prefix string
        my @list;
        for (split ' ', $data) {                     # the list of CAT2:11 pairs
            my ($name, $num) = split /:/;            # separate name and number
            $num = reverse $num;                     # reverse order of numbers
            while ($num =~ /1/g) {                   # extract position of each "1" 
                push @list, "$name:$-[0]";           # make the new string with name
            }
        }
        print "@list\n";                             # print list separated by space
    }
    
    __DATA__
    TT-124 06-03-14 08-02-10 FAS   CAT1:10
    TT-125-1 05-03-14 10-06-08 CAS   CAT2:1010 FAT1:10000
    TT-125-3 07-03-14 11-02-06 FAS   FAT1:1101
    SS-120-1 05-03-14 09-04-07 FAS   CAT3:100000
    AA-121-0 06-03-14 08-03-06 CAS   FAT2:11 CAT1:101100
    

    Output:

    TT-124 06-03-14 08-02-10 FAS   CAT1:1
    TT-125-1 05-03-14 10-06-08 CAS   CAT2:1 CAT2:3 FAT1:4
    TT-125-3 07-03-14 11-02-06 FAS   FAT1:0 FAT1:2 FAT1:3
    SS-120-1 05-03-14 09-04-07 FAS   CAT3:5
    AA-121-0 06-03-14 08-03-06 CAS   FAT2:0 FAT2:1 CAT1:2 CAT1:3 CAT1:5
    

    Usage:

    To try the code out, change the file handle <DATA> to <> and use:

    perl script.pl input.txt > output.txt
    

    Redirecting to output file is optional, of course.