I have a huge problem. I am trying to write a script that computes a specific sum (the number of water bridges, but the details don't matter here). This is a small part of my data file:
POP62 SOL11
KAR1 SOL24
KAR5 SOL31
POP17 SOL42
POP15 SOL2
POP17 SOL2
KAR7 SOL42
KAR1 SOL11
KAR6 SOL31
The first column contains POP or KAR followed by a number, such as KAR1 or POP17. The second column always contains SOL followed by a number. Each SOL value appears at most twice (for example, there can be at most two SOL42 lines or two SOL11 lines), while a given KAR or POP can appear more than twice.
Here is what I want to do: if the same SOL is connected to both a KAR and a POP (with any numbers), I add 1 to the sum. For example:
KAR6 SOL5
POP8 SOL5
adds one to the sum.
For my data:
POP62 SOL11
KAR1 SOL24
KAR5 SOL31
POP17 SOL42
POP15 SOL2
POP17 SOL2
KAR7 SOL42
KAR1 SOL11
KAR6 SOL31
I should get sum = 2, because of
POP17 SOL42
KAR7 SOL42
and
POP62 SOL11
KAR1 SOL11
Do you have any idea how to do that? I am thinking about using NR==FNR and going through the file twice, checking for repeated values in $2, maybe with an array, but what next?
#!/bin/bash
awk 'NR==FNR ??
some condition {sum++}
END {print sum}' test1.txt{,} >> water_bridges_x2.txt
Edit: solution. I also add 0 to sum, because I want to print 0 instead of an empty value when nothing matches:
awk '
{
   s = $1
   sub(/[0-9]+$/, "", s)            # strip digits from end in var s
   if ($2 in map && map[$2] != s)   # if existing entry is not same
      ++sum                         # increment sum
   map[$2] = s
}
END {print sum+0}' file
2
You may try this awk:
awk '
{
   s = $1
   sub(/[0-9]+$/, "", s)            # strip digits from end in var s
   if ($2 in map && map[$2] != s)   # if existing entry is not same
      ++sum                         # increment sum
   map[$2] = s
}
END {print sum+0}' file
2
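The one-pass version above relies on each SOL appearing at most twice, as stated in the question. If that guarantee might not hold, a possible variant is to record which residue types each SOL has been seen with and count only in the END block, so a SOL is counted once no matter how many lines mention it. This is only a sketch: the function name count_bridges and the array names seen and sols are made up for illustration, and it assumes the prefixes are exactly KAR and POP.

```shell
#!/bin/bash
# count_bridges FILE (hypothetical helper): print how many distinct SOL ids
# are paired with at least one KAR and at least one POP.
count_bridges() {
  awk '
    {
      kind = $1
      sub(/[0-9]+$/, "", kind)   # KAR7 -> KAR, POP17 -> POP
      seen[$2, kind] = 1         # this SOL was seen with this residue type
      sols[$2] = 1               # collect every distinct SOL id
    }
    END {
      for (sol in sols)
        if ((sol, "KAR") in seen && (sol, "POP") in seen)
          ++sum                  # counted once per SOL, not once per line
      print sum + 0              # print 0 instead of an empty line
    }' "$1"
}
```

Called as count_bridges test1.txt, it prints a single number that can be redirected to a file in the same way as in the question.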