I have 10 fields of data which contain both redundant and non-redundant entries. I want to use grep/sed/awk/uniq/whatever to make a non-redundant list.
Specifically, I want to eliminate entries which have identical values in fields 4, 6, 7 and 8. However, I need to keep one (the first) of each set of duplicates.
Here's an example input:
1, 3972361, 4u5p_1, blb, A, 47, 50, PKET, 1.78, 3
1, 3972365, 4u5p_1, al3, A, 91, 94, APFI, 1.78, 6
1, 3972372, 4u5p_1, blb, B, 47, 50, PKET, 1.78, 3
1, 3972376, 4u5p_1, al3, B, 91, 94, APFI, 1.78, 6
1, 3972387, 4u5p_1, al3, C, 91, 94, APFI, 1.78, 6
2, 3972361, 4u5p_1, blb, A, 47, 50, PKET, 1.78, 4
2, 3972365, 4u5p_1, al3, A, 91, 94, APFI, 1.78, 6
2, 3972372, 4u5p_1, blb, B, 47, 50, PKET, 1.78, 4
2, 3972376, 4u5p_1, al3, B, 91, 94, APFI, 1.78, 6
2, 3972387, 4u5p_1, al3, C, 91, 94, APFI, 1.78, 6
Here's an example output:
1, 3972361, 4u5p_1, blb, A, 47, 50, PKET, 1.78, 3
1, 3972365, 4u5p_1, al3, A, 91, 94, APFI, 1.78, 6
This is just an example; there will be instances where just one of these fields differs, and those entries must be kept in the final output.
Thanks very much!
You can also use awk with a seen array, which prints only the first line for each combination of fields 4, 6, 7 and 8:
awk '!seen[$4$6$7$8]++' yourFile
or, with the comma set explicitly as the field separator:
awk -F , '!seen[$4$6$7$8]++' file1
E.g.:
user@host $ awk '!seen[$4$6$7$8]++' file1
1, 3972361, 4u5p_1, blb, A, 47, 50, PKET, 1.78, 3
1, 3972365, 4u5p_1, al3, A, 91, 94, APFI, 1.78, 6
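A slightly more defensive variant of the same idea, as a sketch only. It assumes the separator is always a comma followed by optional spaces, and it builds the key as seen[$4, $6, $7, $8], which joins the fields with awk's SUBSEP. Plain concatenation ($4$6$7$8) works on the sample data, but in general it could let two different combinations collide (for example "ab" "c" and "a" "bc"). The here-document just holds the sample rows from the question so the snippet runs on its own.

```shell
# Sample rows from the question, fed to awk through a here-document.
# -F ', *' is an assumption about the input: comma plus optional spaces.
# seen[$4, $6, $7, $8] joins the key fields with SUBSEP, so different
# combinations cannot run together into the same key.
deduped=$(awk -F ', *' '!seen[$4, $6, $7, $8]++' <<'EOF'
1, 3972361, 4u5p_1, blb, A, 47, 50, PKET, 1.78, 3
1, 3972365, 4u5p_1, al3, A, 91, 94, APFI, 1.78, 6
1, 3972372, 4u5p_1, blb, B, 47, 50, PKET, 1.78, 3
1, 3972376, 4u5p_1, al3, B, 91, 94, APFI, 1.78, 6
1, 3972387, 4u5p_1, al3, C, 91, 94, APFI, 1.78, 6
2, 3972361, 4u5p_1, blb, A, 47, 50, PKET, 1.78, 4
2, 3972365, 4u5p_1, al3, A, 91, 94, APFI, 1.78, 6
2, 3972372, 4u5p_1, blb, B, 47, 50, PKET, 1.78, 4
2, 3972376, 4u5p_1, al3, B, 91, 94, APFI, 1.78, 6
2, 3972387, 4u5p_1, al3, C, 91, 94, APFI, 1.78, 6
EOF
)

# Print the first occurrence of each (field 4, 6, 7, 8) combination.
printf '%s\n' "$deduped"
```

To run it on a real file, drop the here-document and pass the file name instead, e.g. awk -F ', *' '!seen[$4, $6, $7, $8]++' file1. Any line where even one of the four key fields differs gets its own key and is therefore kept.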