Counting by analyzing two columns with a difficult pattern in awk, probably using arrays


I have a problem I cannot solve. I am trying to write a script that counts a specific sum (the number of water bridges, but the details don't matter). This is a small part of my data file:

POP62 SOL11
KAR1  SOL24
KAR5  SOL31
POP17 SOL42
POP15 SOL2
POP17 SOL2
KAR7  SOL42
KAR1  SOL11
KAR6  SOL31

In the first column, I have POP or KAR followed by a number, like KAR1, POP17, etc. In the second column, I always have SOL followed by a number, and the same SOL appears at most twice (for example, there can be at most two lines with SOL42 or SOL11). The same KAR or POP can appear more than twice.

Here is what I want to do. If the same SOL is connected with both a KAR and a POP (whatever their numbers), I add 1 to the sum. For example:

KAR6  SOL5
POP8  SOL5

Here I add one to the sum.

For my data:

POP62 SOL11
KAR1  SOL24
KAR5  SOL31
POP17 SOL42
POP15 SOL2
POP17 SOL2
KAR7  SOL42
KAR1  SOL11
KAR6  SOL31

I should get sum = 2, because of

POP17 SOL42
KAR7  SOL42

and

POP62 SOL11
KAR1  SOL11

Do you have any idea how to do that? I was thinking about using NR==FNR to go through the file twice and check for repetitions in $2, maybe by using an array, but what next?

#!/bin/bash 
awk 'NR==FNR         ?? 
       some condition {sum++}  
       END             {print sum}' test1.txt{,} >> water_bridges_x2.txt

Edit: solution. I also add 0 to sum, because I want to print 0 instead of an empty value when nothing matches.

awk '
{
   s = $1
   sub(/[0-9]+$/, "", s)           # strip digits from end in var s
   if ($2 in map && map[$2] != s)  # if existing entry is not same 
      ++sum                        # increment sum
   map[$2] = s
}
END {print sum+0}' file

2

Solution

  • You may try this awk:

    awk '
    {
       s = $1
       sub(/[0-9]+$/, "", s)           # strip digits from end in var s
       if ($2 in map && map[$2] != s)  # if existing entry is not same 
          ++sum                        # increment sum
       map[$2] = s
    }
    END {print sum+0}' file
    
    2
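
The one-pass script above compares each line only with the previous entry stored for the same SOL, which fits the stated limit of at most two lines per SOL. If that limit did not hold, one possible variant is to record every (SOL, kind) pair and count in the END block. This is only a sketch under assumptions: the temporary file and the names `kind`, `seen`, `sols` and `result` are made up for illustration, and it assumes the first column only ever starts with KAR or POP.

```shell
#!/bin/bash
# Hypothetical sample file holding the data from the question.
data=$(mktemp)
cat > "$data" <<'EOF'
POP62 SOL11
KAR1  SOL24
KAR5  SOL31
POP17 SOL42
POP15 SOL2
POP17 SOL2
KAR7  SOL42
KAR1  SOL11
KAR6  SOL31
EOF

result=$(awk '
{
   kind = $1
   sub(/[0-9]+$/, "", kind)   # strip trailing digits: KAR1 -> KAR, POP17 -> POP
   seen[$2, kind] = 1         # remember that this kind occurred with this SOL
   sols[$2] = 1               # remember every distinct SOL
}
END {
   for (s in sols)            # count each SOL once if both kinds occurred
      if ((s, "KAR") in seen && (s, "POP") in seen)
         ++sum
   print sum + 0              # print 0 instead of an empty line
}' "$data")

echo "$result"
rm -f "$data"
```

For the sample data this prints 2, the same as the one-pass version. The difference is that a SOL is counted at most once no matter how many KAR and POP lines refer to it.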