bash to find non-redundant data across multiple fields

I have 10 fields of data which contain redundant and non-redundant data. I want to use grep/sed/awk/uniq/whatever to make a non-redundant list.

Specifically, I want to eliminate entries which have identical values in fields 4, 6, 7 and 8. However, I need to preserve one (the first) of these entries.

Here's an example input:

1, 3972361, 4u5p_1, blb, A, 47, 50, PKET, 1.78, 3
1, 3972365, 4u5p_1, al3, A, 91, 94, APFI, 1.78, 6
1, 3972372, 4u5p_1, blb, B, 47, 50, PKET, 1.78, 3
1, 3972376, 4u5p_1, al3, B, 91, 94, APFI, 1.78, 6
1, 3972387, 4u5p_1, al3, C, 91, 94, APFI, 1.78, 6
2, 3972361, 4u5p_1, blb, A, 47, 50, PKET, 1.78, 4
2, 3972365, 4u5p_1, al3, A, 91, 94, APFI, 1.78, 6
2, 3972372, 4u5p_1, blb, B, 47, 50, PKET, 1.78, 4
2, 3972376, 4u5p_1, al3, B, 91, 94, APFI, 1.78, 6
2, 3972387, 4u5p_1, al3, C, 91, 94, APFI, 1.78, 6

Here's an example output:

1, 3972361, 4u5p_1, blb, A, 47, 50, PKET, 1.78, 3
1, 3972365, 4u5p_1, al3, A, 91, 94, APFI, 1.78, 6

This is just an example. In the real data there will be cases where only one of these fields differs between entries, and those entries must be kept in the final output.

Thanks very much!

Solution

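You can also use awk with a seen array, as below. A line is printed only the first time its combination of fields 4, 6, 7 and 8 is encountered. The first form splits fields on whitespace; the second splits on commas: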

awk '!seen[$4$6$7$8]++' yourFile

awk -F , '!seen[$4$6$7$8]++' file1

For example:

user@host $ awk '!seen[$4$6$7$8]++' file1
1, 3972361, 4u5p_1, blb, A, 47, 50, PKET, 1.78, 3
1, 3972365, 4u5p_1, al3, A, 91, 94, APFI, 1.78, 6
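
If the spacing after the commas is not perfectly regular, or if two different sets of values could run together when concatenated (for example 4 and 750 versus 47 and 50), a slightly more defensive variant may help. The following is only a sketch under those assumptions, not the command from the answer above: it splits on a comma followed by optional spaces and builds the key with awk's comma subscript, which joins the parts with SUBSEP. The function name dedup is made up for illustration.

```shell
#!/bin/sh
# dedup is a hypothetical helper name; it reads the files given as
# arguments, or standard input when called without arguments.
dedup() {
    # -F ', *'          : field separator is a comma plus any following spaces,
    #                     so fields carry no stray leading whitespace.
    # seen[$4,$6,$7,$8] : the comma joins the key parts with SUBSEP, keeping
    #                     "4","750" distinct from "47","50".
    # !seen[...]++      : true only the first time a key appears, so awk's
    #                     default action prints just that first line.
    awk -F ', *' '!seen[$4,$6,$7,$8]++' "$@"
}

# Demo on the first five lines of the input from the question.
dedup <<'EOF'
1, 3972361, 4u5p_1, blb, A, 47, 50, PKET, 1.78, 3
1, 3972365, 4u5p_1, al3, A, 91, 94, APFI, 1.78, 6
1, 3972372, 4u5p_1, blb, B, 47, 50, PKET, 1.78, 3
1, 3972376, 4u5p_1, al3, B, 91, 94, APFI, 1.78, 6
1, 3972387, 4u5p_1, al3, C, 91, 94, APFI, 1.78, 6
EOF
```

On this sample it prints the same two lines as the example above (the blb and al3 entries for chain A), since every later line repeats one of those two keys.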