How to rename duplicate lines with awk?


I have a file with 1 million lines, and some of the lines are duplicates. I would like to rename each duplicate by appending " variant " plus a running number, leaving the first occurrence unchanged. The file is formatted as follows:

    I am a test line
    She is beautiful
    need for speed
    Nice day today
    I am a test line
    stack overflow is fun
    I am a test line
    stack overflow is fun
    I have more sentences
    I am a test line
    She is beautiful
    Speed for need
    stack overflow is fun
    Let's stop here

Desired results:

    I am a test line
    She is beautiful
    need for speed
    Nice day today
    I am a test line variant 1
    stack overflow is fun
    I am a test line variant 2
    stack overflow is fun variant 1
    I have more sentences
    I am a test line variant 3
    She is beautiful variant 1
    Speed for need
    stack overflow is fun variant 2
    Let's stop here

Solution

    $ awk 'cnt[$0]++{$0=$0 " variant " (cnt[$0]-1)} 1' file
    I am a test line
    She is beautiful
    need for speed
    Nice day today
    I am a test line variant 1
    stack overflow is fun
    I am a test line variant 2
    stack overflow is fun variant 1
    I have more sentences
    I am a test line variant 3
    She is beautiful variant 1
    Speed for need
    stack overflow is fun variant 2
    Let's stop here
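
For readers who want the one-liner unpacked, here is a minimal sketch of the same counting logic written out step by step. The function name `rename_dups`, the array name `seen`, and the sample input are illustrative assumptions, not part of the original answer.

```shell
# Hypothetical helper: reads lines on stdin and appends " variant N"
# to every repeat of a line that has already been seen.
rename_dups() {
    awk '
    {
        n = seen[$0]++              # n = number of earlier occurrences of this exact line
        if (n > 0)
            print $0 " variant " n  # repeat: tag it with its running repeat number
        else
            print                   # first occurrence passes through unchanged
    }'
}

# Small sample input (assumed for illustration)
printf '%s\n' 'a b' 'c' 'a b' 'A b' 'a b' | rename_dups
# prints:
#   a b
#   c
#   a b variant 1
#   A b
#   a b variant 2
```

Matching is on the exact text of each line, so lines that differ in case or word order, such as "need for speed" and "Speed for need", count as distinct. The array keeps one entry per distinct line, so memory use grows with the number of unique lines in the file.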