
Unique words in a text file


I was wondering if there is a way to find (and display) all the unique words (words that appear exactly once) in a text file. Could this be done using just the command line, or would I have to write something like a Python script?


Solution

  • If you don't want to write an application, the easiest way I can think of to accomplish this is to use PowerShell. See the Get-Unique documentation:

    https://msdn.microsoft.com/en-us/powershell/reference/5.1/microsoft.powershell.utility/get-unique

    Microsoft's example populates a variable with the list of distinct words (each word in the file listed once, however many times it occurs):

    $A = $(foreach ($line in Get-Content C:\Test1\File1.txt) {$line.tolower().split(" ")}) | sort | Get-Unique
    

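    You may wish to use additional delimiters, though, to split on punctuation, such as this:

    $A = $(foreach ($line in Get-Content C:\test.txt) {$line.tolower().split([char[]]" .,?!;:", [System.StringSplitOptions]::RemoveEmptyEntries)}) | sort | Get-Unique

    Casting the delimiter string to [char[]] makes each character a separate delimiter (on newer PowerShell versions built on .NET Core, a plain string argument can be treated as one multi-character separator), and RemoveEmptyEntries drops the empty strings that would otherwise appear between adjacent delimiters, such as a comma followed by a space.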
    

    Place this in a file with the extension .ps1 and you can run it from the command line. To print the values stored in the variable, just add a second line containing the variable, which echoes the result to the screen:

    $A
    

    To get the count of items in the array, you could do this:

    $A.count
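Note that Get-Unique returns every distinct word once, including words that occur many times, so $A holds the file's vocabulary rather than only the words that appear exactly once, which is what the question asks for. A minimal sketch of the exactly-once case is to count occurrences with Group-Object and keep only the groups of size one. It assumes the same space-and-punctuation delimiters as the split example above, so words separated only by tabs or other characters are not split. The function name Get-WordsOccurringOnce is hypothetical, not a built-in cmdlet.

```powershell
# Hypothetical helper (not a built-in cmdlet): lists the words that occur
# exactly once in a text file, lowercased and sorted alphabetically.
function Get-WordsOccurringOnce {
    param(
        # Path of the text file to scan
        [Parameter(Mandatory = $true)]
        [string]$Path
    )

    # Assumed delimiters: space plus the punctuation used in the example above.
    # Tabs and other characters are NOT treated as word boundaries.
    $delimiters = [char[]]' .,?!;:'

    # 1. Read the file line by line.
    # 2. Lowercase each line and split it into words, dropping empty strings.
    # 3. Group identical words; each group's Count is the number of occurrences.
    # 4. Keep only the groups seen exactly once and emit the word itself, sorted.
    Get-Content -Path $Path |
        ForEach-Object { $_.ToLower().Split($delimiters, [System.StringSplitOptions]::RemoveEmptyEntries) } |
        Group-Object |
        Where-Object { $_.Count -eq 1 } |
        Sort-Object -Property Name |
        ForEach-Object { $_.Name }
}
```

For example, save it in a .ps1 file (say, a hypothetical unique-words.ps1), dot-source it with `. .\unique-words.ps1`, then run `Get-WordsOccurringOnce -Path C:\test.txt`. Wrapping the call as `@(Get-WordsOccurringOnce -Path C:\test.txt).Count` gives the number of such words even when only one is found.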