Tags: bash, sorting, awk, gnu

How can I sort lines uniquely when they have the same number of fields but differ in the value of one field, using sort or awk?


I want to treat lines that have the same number of fields, and differ only in the value of one field at the same position, as the same, and keep only one of them.

Example input

The field delimiter is "/".

1. abc/def/gh/ij/kl
2. abc/def/gh/ij/yi
3. abc/def/gh/ij/ti
4. abc/def/gh/hk/kl/oi/uh
5. abc/def/gh/ol/kl/oi/uh
6. abc/def/gh/er/kl/oi/uh
7. abc/def/gh/er/kl

Treat lines 1, 2 and 3 as the same and select only one of them: even though their 5th fields differ, their other fields are identical and they have the same number of fields.

Treat lines 4, 5 and 6 as the same and select only one of them: even though their 4th fields differ, their other fields are identical and they have the same number of fields.

Lines 6 and 7 are not the same, since they don't have the same number of fields.
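The grouping described above comes down to how many "/"-separated fields each line has. As a quick check, here is a minimal sketch that prints each line's field count next to the line. The helper name count_fields is hypothetical, and the sample lines are fed in without their leading numbers.

```shell
#!/bin/sh
# count_fields (hypothetical helper): print "NF<TAB>line" for each input line,
# splitting fields on "/" as in the question. Reads files given as arguments,
# or standard input if there are none.
count_fields() {
  awk -F'/' '{ printf "%d\t%s\n", NF, $0 }' "$@"
}

# Sample input from the question, without the leading line numbers.
printf '%s\n' \
  abc/def/gh/ij/kl \
  abc/def/gh/ij/yi \
  abc/def/gh/ij/ti \
  abc/def/gh/hk/kl/oi/uh \
  abc/def/gh/ol/kl/oi/uh \
  abc/def/gh/er/kl/oi/uh \
  abc/def/gh/er/kl | count_fields
```

On the sample this reports 5 fields for lines 1 to 3 and line 7, and 7 fields for lines 4 to 6.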

Desired Output

abc/def/gh/ij/kl
abc/def/gh/hk/kl/oi/uh

NOTE: The list contains lines with different numbers of fields.

I tried sort -u, but it didn't work: sort compares whole lines (or the key fields selected with -t and -k), and it has no way to group lines by how many fields they have. Can awk achieve this?
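sort -u on its own cannot group by field count, but a decorate-sort-undecorate pipeline can: awk prefixes each line with its field count, sort keeps one line per count, and cut strips the prefix again. This is a sketch assuming GNU sort, where -u outputs only the first of a run of lines whose keys compare equal and -s keeps such lines in input order. The function name uniq_by_field_count_sorted is hypothetical.

```shell
#!/bin/sh
# uniq_by_field_count_sorted (hypothetical helper), assumes GNU coreutils sort:
#   1. awk prefixes each line with its "/"-field count and a tab;
#   2. sort keys on that count only (-k1,1n), keeps input order among equal
#      keys (-s) and outputs only the first line of each count (-u);
#   3. cut drops the count prefix (cut splits on tab by default).
uniq_by_field_count_sorted() {
  awk -F'/' '{ print NF "\t" $0 }' "$@" |
    sort -s -t "$(printf '\t')" -k1,1n -u |
    cut -f2-
}

# Sample input from the question, without the leading line numbers.
printf '%s\n' \
  abc/def/gh/ij/kl \
  abc/def/gh/ij/yi \
  abc/def/gh/ij/ti \
  abc/def/gh/hk/kl/oi/uh \
  abc/def/gh/ol/kl/oi/uh \
  abc/def/gh/er/kl/oi/uh \
  abc/def/gh/er/kl | uniq_by_field_count_sorted
```

Unlike an awk-only filter, the output is ordered by field count rather than by first appearance. On the sample both orders coincide, because the 5-field lines come before the 7-field lines.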


Solution

    $ awk -F'/' '!seen[NF]++' file
    abc/def/gh/ij/kl
    abc/def/gh/hk/kl/oi/uh

If that's not all you need, edit your question to clarify your requirements and update your example to include lines for which this doesn't work.
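How the one-liner works: a pattern with no action makes awk print the current line whenever the pattern is true. seen[NF]++ yields how many lines with this field count have already been seen (0, i.e. false, the first time) and then increments it, so !seen[NF]++ is true only for the first line of each field count. Below is a more verbose sketch of the same logic. The function name first_per_field_count is hypothetical, and it uses an explicit "in" membership test instead of the post-increment.

```shell
#!/bin/sh
# first_per_field_count (hypothetical helper): keep only the first line for
# each distinct number of "/"-separated fields, preserving input order.
# Equivalent to: awk -F'/' '!seen[NF]++'
first_per_field_count() {
  awk -F'/' '
    {
      if (!(NF in seen)) {   # first line with this many fields?
        seen[NF] = 1         # remember the field count
        print                # print the whole line ($0)
      }
    }
  ' "$@"
}

# Sample input from the question, without the leading line numbers.
printf '%s\n' \
  abc/def/gh/ij/kl \
  abc/def/gh/ij/yi \
  abc/def/gh/ij/ti \
  abc/def/gh/hk/kl/oi/uh \
  abc/def/gh/ol/kl/oi/uh \
  abc/def/gh/er/kl/oi/uh \
  abc/def/gh/er/kl | first_per_field_count
```

Like the one-liner, this keys only on the field count; it does not check whether the remaining fields of two lines actually match.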