bash unique column sorting

How to find unique elements in a column without sorting in bash?


I am trying to find the unique elements in one column (the 2nd, to be specific) of a data file using bash. I don't want the output to be sorted or randomized. After searching a lot, I found a solution based on awk that partially worked:

awk '{arr[$2] = 1} END {for (key in arr) {print key}}' input_file > output_file

but the output order seems to be random. I would like to do this in such a way that, for each element, its last occurrence is the one that is kept. In other words, uniqueness is checked starting from the end of the file. For example, if the elements are in the following order:

5, 6, 7, 5, 6, 8, 5, 6, 9, 6, 9, 10, 10, 11, 10, 11, 12

then the output should be:

7, 8, 5, 6, 9, 10, 11, 12


Solution

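  • One approach is to read the file twice. The first pass counts how often each value occurs in column 2; the second pass prints a line only when the running count for its value reaches that total, i.e. at the value's last occurrence: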

    awk 'NR==FNR{++A[$2];next}A[$2]==++T[$2]' input_file input_file
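
Note that the command above prints the whole matching line, not just the 2nd column. The following is a minimal sketch, not a definitive implementation, that checks the idea against the example from the question. The sample file, the `rowN` labels in column 1, and the variable names `two_pass` and `one_pass` are assumptions made for illustration. It shows the two-pass logic printing only column 2, plus a single-pass alternative that keeps every column-2 value in memory and so may not suit very large files.

```bash
#!/usr/bin/env bash
# Build a hypothetical two-column sample; column 2 holds the question's values.
input_file=$(mktemp)
n=0
for v in 5 6 7 5 6 8 5 6 9 6 9 10 10 11 10 11 12; do
  n=$((n + 1))
  printf 'row%d %s\n' "$n" "$v"
done > "$input_file"

# Two-pass variant: count occurrences on the first read, then on the second
# read print column 2 only when the running count reaches the total.
two_pass=$(awk 'NR == FNR { total[$2]++; next }
                total[$2] == ++seen[$2] { print $2 }' "$input_file" "$input_file")

# Single-pass variant: remember each value and the line number of its last
# occurrence, then replay the values in input order in the END block.
one_pass=$(awk '{ last[$2] = NR; val[NR] = $2 }
                END { for (i = 1; i <= NR; i++) if (last[val[i]] == i) print val[i] }' "$input_file")

# Join each result onto one line for easier comparison.
echo "$two_pass" | paste -sd' ' -
echo "$one_pass" | paste -sd' ' -

rm -f "$input_file"
```

Both commands should print `7 8 5 6 9 10 11 12` for this sample, matching the output requested in the question. The two-pass form uses memory proportional to the number of distinct values, while the single-pass form can read from a pipe because it does not need to open the file a second time.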