shell posix hp-ux

How to extract lines containing unique text in a column


I have a text file similar to

"3"|"0001"
"1"|"0003"
"1"|"0001"
"2"|"0001"
"1"|"0002"

i.e. a pipe-delimited text file containing quoted strings.

What I need to do is:

First, extract the first line which contains each value in the first column, producing

"3"|"0001"
"1"|"0003"
"2"|"0001"

Then, sort by the values in the first column, producing

"1"|"0003"
"2"|"0001"
"3"|"0001"

Performing the sort is easy - sort -k 1,1 -t \| - but I'm stuck on extracting the first line in the file which contains each value in the first column. I thought of using uniq, but it doesn't do what I want, and its "column-handling" abilities are limited to ignoring the first 'x' columns of space-or-tab delimited text.

I'm using the POSIX shell (/usr/bin/sh) under HP-UX.

I'm kind of drawing a blank here. Any suggestions welcomed.


Solution

  • You can do:

    awk -F'|' '!a[$1]++' file|sort...
    

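    The awk part keeps only the first line seen for each value of the first column and drops any later line with the same value. It works because a[$1]++ evaluates to 0 the first time a value appears, so !a[$1]++ is true and awk runs its default action, which is to print the line. On every later occurrence the counter is non-zero, so the line is skipped.

    I don't have an HP-UX box, so I can't test this there, but I think it should work...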
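For reference, here is a minimal sketch of the whole pipeline, combining the awk filter above with the sort command from the question and running it on the question's sample data. The function name first_per_key and the array name seen are made up for illustration, and this has not been run on HP-UX; it only assumes a POSIX awk and sort.

```shell
#!/bin/sh
# first_per_key is a hypothetical helper name, not part of the original answer.
# It reads pipe-delimited lines on stdin, keeps the first line seen for each
# first-column value, then sorts the surviving lines on that column.
first_per_key() {
    awk -F'|' '!seen[$1]++' | sort -t '|' -k 1,1
}

# Sample data from the question, fed in through a here-document.
first_per_key <<'EOF'
"3"|"0001"
"1"|"0003"
"1"|"0001"
"2"|"0001"
"1"|"0002"
EOF
```

On the sample data this should print the three lines shown in the question's final expected output. Note that the sort is lexical, since the keys include the quote characters, so a key such as "10" would sort before "2".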