
How to display duplicates from a text file using awk


I'm trying to find out how to use the "awk" command to display the words that show up multiple times in a text file. In addition, how can I display the name of the file or files in which they occur?

ex: first sentence first file. Second sentence followed by the second word.

This should display: "first" and "second"


Solution

  • I assume that by -i you mean the comparison and counting should ignore case.

    If I understand your requirements correctly, a command like this should work:

    awk '{ for( i=1; i<=NF; i++ ){ cnt[ tolower( $i ) ]++; if (cnt[ tolower( $i ) ] > 1) { print tolower( $i ) } } }' yourfile | sort -u
    

    It prints these words for your example:

    • first
    • second
    • sentence

    If you need case-sensitive counting, just remove tolower.

    For each line in the file, the script iterates over each word (the for( i=1; i<=NF; i++ ) loop):

    • it increments a counter for each word ( cnt[ tolower( $i ) ]++ )
    • if the count is larger than 1, the word is printed
    • the pipe to sort -u sorts the output and removes duplicates.
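
The question also asks for the name of the file or files. A minimal sketch of one way to do that, assuming duplicates should be counted separately per file: awk's built-in FILENAME variable holds the name of the file currently being read, so it can be used as part of the counter key. The files a.txt and b.txt and the temporary directory are hypothetical and created only for this demonstration.

```shell
#!/bin/sh
# Hypothetical sample files, created in a temporary directory for this demo only.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
printf 'first sentence first file.\n' > a.txt
printf 'Second sentence followed by the second word.\n' > b.txt

# Count each lowercased word per file. The key combines FILENAME and the word,
# so counts from different files are kept apart (an assumption about the goal).
# A word is printed once, at the moment its count in that file reaches 2,
# which makes the sort -u step unnecessary here.
result=$(awk '{
    for (i = 1; i <= NF; i++) {
        w = tolower($i)
        if (++cnt[FILENAME, w] == 2)
            print FILENAME ": " w
    }
}' a.txt b.txt)

printf '%s\n' "$result"

# Clean up the demo files.
cd / && rm -r "$workdir"
```

Here "sentence" is not reported, because it occurs only once in each file. Note that awk splits on whitespace by default, so punctuation stays attached to a word: "file." and "file" would be counted as different words unless the punctuation is stripped first.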