Search code examples
linuxsed

How do I remove duplicate lines using Sed without sorting?


I've been trying to figure out how to delete duplicate lines using only Sed and I'm having trouble figuring out how to do it.

So far I've tried this and it hasn't worked.

sed '$!N; /^\(.*\)\n\1$/!P; D'

file:

APPLE

ORANGES

BANANA

BANANA

COOKIES

FRUITS

What I got:

APPLE

ORANGES

BANANA

BANANA

COOKIES

FRUITS

What I want:

APPLE

ORANGES

BANANA

COOKIES

FRUITS

I've been trying to figure out how to do it so I won't have to manually go through each line in a file and tell it to manually delete the duplicates.

My goal is for this to eventually delete the second instance of BANANA.

Can anyone point me in the right direction?

Thanks


Solution

  • Using sed

    $ sed -n '/^$/d;G;/^\(.*\n\).*\n\1$/d;H;P;a\ ' input_file
    APPLE
    
    ORANGES
    
    BANANA
    
    COOKIES
    
    FRUITS
    

    Remove blank lines. Append hold space. If the line is duplicated, delete it, else copy into hold space, print and insert blank lines.