
bash to find non-redundant data across multiple fields


I have 10 fields of data which contain redundant and non-redundant data. I want to use grep/sed/awk/uniq/whatever to make a non-redundant list.

Specifically, I want to eliminate entries which have identical values in fields 4, 6, 7 and 8. However, I need to keep one (the first) of these entries.

Here's an example input

1, 3972361, 4u5p_1, blb, A, 47, 50, PKET, 1.78, 3

1, 3972365, 4u5p_1, al3, A, 91, 94, APFI, 1.78, 6

1, 3972372, 4u5p_1, blb, B, 47, 50, PKET, 1.78, 3

1, 3972376, 4u5p_1, al3, B, 91, 94, APFI, 1.78, 6

1, 3972387, 4u5p_1, al3, C, 91, 94, APFI, 1.78, 6

2, 3972361, 4u5p_1, blb, A, 47, 50, PKET, 1.78, 4

2, 3972365, 4u5p_1, al3, A, 91, 94, APFI, 1.78, 6

2, 3972372, 4u5p_1, blb, B, 47, 50, PKET, 1.78, 4

2, 3972376, 4u5p_1, al3, B, 91, 94, APFI, 1.78, 6

2, 3972387, 4u5p_1, al3, C, 91, 94, APFI, 1.78, 6

Here's an example output

1, 3972361, 4u5p_1, blb, A, 47, 50, PKET, 1.78, 3

1, 3972365, 4u5p_1, al3, A, 91, 94, APFI, 1.78, 6

This is just an example; there will be instances where just one of these fields differs, and those entries must be kept in the final output.

Thanks very much!


Solution

  • You can also use awk with a seen array, which prints a line only the first time its key (fields 4, 6, 7 and 8) appears:

    awk '!seen[$4$6$7$8]++' yourFile
    

    or, with the comma set explicitly as the field separator:

    awk -F , '!seen[$4$6$7$8]++' file1
    

    E.g.:

    user@host $ awk '!seen[$4$6$7$8]++' file1
    1, 3972361, 4u5p_1, blb, A, 47, 50, PKET, 1.78, 3
    1, 3972365, 4u5p_1, al3, A, 91, 94, APFI, 1.78, 6
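
One caveat with the commands above: $4$6$7$8 joins the four fields with nothing between them, so two different combinations (say 4 and 75 versus 47 and 5 in fields 6 and 7) can produce the same key, and the second line would be dropped by mistake. A minimal sketch of a safer variant, assuming the fields are separated by a comma and optional spaces as in the example input. The function name dedupe_by_fields is hypothetical, chosen only for illustration:

```shell
# Hypothetical helper: keep the first line for each distinct combination
# of fields 4, 6, 7 and 8.
# -F ', *' splits on a comma followed by any number of spaces, so the
# fields carry no leading blanks.
# seen[$4, $6, $7, $8] joins the fields with awk's SUBSEP character,
# so "4","75" and "47","5" produce different keys.
dedupe_by_fields() {
    awk -F ', *' '!seen[$4, $6, $7, $8]++' "$@"
}
```

It can be called as dedupe_by_fields file1, or fed from a pipe with no file argument. Lines are printed unchanged and in their original order, so the first entry of each group is the one that is kept.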