I want to treat lines that have the same number of fields and differ in the value of only one field (at the same field position) as the same, and select only one line from each such group.
Example input
The field delimiter is "/".
1. abc/def/gh/ij/kl
2. abc/def/gh/ij/yi
3. abc/def/gh/ij/ti
4. abc/def/gh/hk/kl/oi/uh
5. abc/def/gh/ol/kl/oi/uh
6. abc/def/gh/er/kl/oi/uh
7. abc/def/gh/er/kl
Lines 1, 2 and 3 should be treated as the same, with only one of them selected: their 5th fields differ, but all other fields match and they have the same number of fields.
Lines 4, 5 and 6 should likewise be treated as the same, with only one of them selected: their 4th fields differ, but all other fields match and they have the same number of fields.
Lines 6 and 7 are not the same, since they do not have the same number of fields.
Desired Output
abc/def/gh/ij/kl
abc/def/gh/hk/kl/oi/uh
NOTE: The list contains lines with different numbers of fields.
I tried sort -u, but it didn't work, since by default it compares whole lines rather than individual fields. Can awk achieve this?
$ awk -F'/' '!seen[NF]++' file
abc/def/gh/ij/kl
abc/def/gh/hk/kl/oi/uh
If that's not all you need then edit your question to clarify your requirements and update your example to include lines for which this doesn't work.
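Note that the one-liner above keeps the first line seen for each field count (NF), so two unrelated lines that merely have the same number of fields would also collapse into one. In case that turns out to be too coarse, here is a rough sketch of a stricter variant. It assumes that "the same" means: same number of fields, and at most one field position differing from a line that was already printed. The function name dedupe_one_field is made up for illustration, and comparing only against printed lines (rather than chaining through skipped ones) is my assumption about the requirement, not something the question states.

```shell
# Hypothetical helper: print a line only if no previously printed line has
# the same number of fields and differs from it in at most one field.
dedupe_one_field() {
    awk -F'/' '
    {
        dup = 0
        # Build one key per field position i: the position number followed
        # by the whole line with field i blanked out. The number of
        # separators in the key also encodes the field count.
        for (i = 1; i <= NF; i++) {
            key[i] = i
            for (j = 1; j <= NF; j++)
                key[i] = key[i] FS (j == i ? "" : $j)
            if (key[i] in seen)
                dup = 1
        }
        # Only printed lines register their keys, so every later line is
        # compared against the chosen representatives.
        if (!dup) {
            print
            for (i = 1; i <= NF; i++)
                seen[key[i]] = 1
        }
    }' "$@"
}

dedupe_one_field <<'EOF'
abc/def/gh/ij/kl
abc/def/gh/ij/yi
abc/def/gh/ij/ti
abc/def/gh/hk/kl/oi/uh
abc/def/gh/ol/kl/oi/uh
abc/def/gh/er/kl/oi/uh
abc/def/gh/er/kl
EOF
```

With the sample input from the question this prints the same two lines as the one-liner. Line 7 is dropped here because it differs from line 1 only in its 4th field; unlike the NF-only version, a line with 5 or 7 fields that differs from the printed ones in two or more positions would be kept.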