Remove redundant lines for "almost similar" strings


I have the following file:

ab=5
ac=6
ad=5
ba=5
bc=7
bd=4
ca=5
cb=7
cd=3
...

"ab" and "ba", "ac" and "ca", and "bc" and "cb" are redundant (the same two letters in reversed order). How do I eliminate these redundant lines in bash?

Expected output:

ab=5
ac=6
ad=5
bc=7
bd=4
cd=3

Solution

  • $ awk '{x=substr($0,1,1); y=substr($0,2,1)} !seen[x>y?x y:y x]++' file
    ab=5
    ac=6
    ad=5
    bc=7
    bd=4
    cd=3
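
The one-liner above builds a canonical key from the first two characters of each line, so that "ab" and "ba" collapse to the same array index, and prints a line only the first time its key is seen. A more verbose sketch of the same idea follows. It assumes every key is exactly two characters long and sits before an "=" sign, as in the sample. The names `input`, `result`, `a`, `b` and `key` are illustrative and not part of the original answer.

```shell
# Sample input from the question, written to a temporary file.
input=$(mktemp)
printf '%s\n' ab=5 ac=6 ad=5 ba=5 bc=7 bd=4 ca=5 cb=7 cd=3 > "$input"

# Split each line on "=" so that $1 holds the two-letter key.
# Put the two letters in a fixed order so "ab" and "ba" produce the
# same canonical key, then print only the first line seen per key.
result=$(awk -F= '{
    a = substr($1, 1, 1)
    b = substr($1, 2, 1)
    key = (a < b) ? a b : b a
}
!seen[key]++' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Only the first line for each pair is kept, so if "ab" and "ba" carried different values, the value from whichever line appears first in the file would win.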