Tags: linux, unix, awk, command-line, grep

How to create a list of all words in a bunch of files?


I'm dealing with a colleague who has made an enormous number of copy-pasted spelling errors throughout an entire C# solution.

Instead of running a spell checker on every individual file, I would like to create a list of all words in the entire solution, run a spell checker on that list, and then do a solution-wide find-and-replace for the misspellings it reports.

In order to find all words in a file, I had thought of doing something like:

grep -wo ".*" blabla.txt

But that doesn't seem to work: instead of showing each individual word it finds, it still shows the entire lines containing the words, something like:

this is OK
this is NOK
OK it is
NOK it is
Everything is OK

While I was expecting something like:

this
is
OK
this
is
NOK
...
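For anyone hitting the same problem: -o prints only the part of each line that matches, but .* matches the entire line, so the "match" is the whole line. The -w flag does not change that; it only requires the match to start and end at word boundaries. A minimal sketch that prints one word per line instead, assuming a "word" means a run of letters, digits, and underscores (the characters grep treats as word constituents). The file name sample.txt is hypothetical and is filled with the sample lines above:

```shell
# Recreate the sample input from the question in a scratch file.
printf 'this is OK\nthis is NOK\nOK it is\nNOK it is\nEverything is OK\n' > sample.txt

# -o prints each match on its own line; [[:alnum:]_]+ matches a single word,
# so every word becomes a separate match instead of the whole line.
grep -oE '[[:alnum:]_]+' sample.txt
```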

Once I have the list for one file, I can extend it with find ./ -name "*.cs" -exec grep ... {} \; >>output_list and then run sort output_list | uniq to get each word only once.
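A sketch of that whole pipeline in one go, under a few assumptions: a word is a run of letters, the scratch directory demo_words and its two .cs files are hypothetical stand-ins for the solution, and counting occurrences is an optional extra. grep needs -h because it prefixes each match with the file name when it is given several files, and -exec ... {} + passes many files to one grep call instead of starting one per file. sort | uniq -c counts each distinct word, and the final sort -n puts the rarest words first, which is where one-off typos tend to appear (plain sort -u gives the same list as sort | uniq, without counts):

```shell
# Hypothetical mini-solution: one misspelling ("recieve") among correct words.
mkdir -p demo_words
printf 'int receive = 0;\nint recieve = 1;\n' > demo_words/A.cs
printf 'string receive = "x";\n' > demo_words/B.cs

# Extract words from all .cs files, count them, and list rare words first.
# Run with . instead of demo_words from the root of a real solution.
find demo_words -name '*.cs' -exec grep -ohE '[[:alpha:]]+' {} + | sort | uniq -c | sort -n
```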

But first things first: since grep -ow ".*" shows the entire line rather than the individual words, how can I list all words in a file from the UNIX/Linux command line? (I added awk as a tag because it might be part of a solution, but I'm certainly no awk wizard :-) )
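Since awk came up, here is a minimal awk sketch. By default awk splits each line into fields on runs of spaces and tabs, so looping over $1 through $NF prints one whitespace-separated token per line. Like splitting with tr on blanks, it leaves punctuation attached (Foo(); stays one token). The file name sample.txt is hypothetical:

```shell
# Recreate the sample input from the question in a scratch file.
printf 'this is OK\nthis is NOK\nOK it is\nNOK it is\nEverything is OK\n' > sample.txt

# NF is the number of fields on the current line; print each field separately.
awk '{ for (i = 1; i <= NF; i++) print $i }' sample.txt
```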

Edit after the first answers:
tr does indeed seem to be the way to go. I could simply use tr ' ' '\n', but there's a catch: I tried the following, and it didn't work:

find ./ -name "*.cs" -exec cat {} | tr ' ' '\n' >>/mnt/c/Temp_Folder\output.txt \;

The command just gives me a > prompt (as if I were inside some kind of editor). What am I still doing wrong?
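Whatever produces the > prompt, that command has two definite problems. First, the shell splits the line at the unquoted | before find ever runs, so find receives -exec cat {} with no terminating ;, while tr receives the \; and the redirection. Second, an unquoted \ is a shell escape, not a path separator, so /mnt/c/Temp_Folder\output.txt names a file called Temp_Folderoutput.txt in /mnt/c. A minimal sketch that keeps the pipe outside find, assuming the hypothetical scratch directory demo_exec and a local output file words.txt (on WSL, /mnt/c/Temp_Folder/output.txt with forward slashes would go there instead):

```shell
# Hypothetical scratch files standing in for the solution's .cs files.
mkdir -p demo_exec
printf 'int x = 1;\n' > demo_exec/A.cs
printf 'int y = 2;\n' > demo_exec/B.cs

# find only runs cat; the pipe and the redirection belong to the outer shell.
# {} + hands many file names to a single cat call. Use >> to append instead.
find demo_exec -name '*.cs' -exec cat {} + | tr ' ' '\n' > words.txt
```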


Solution

  • How about using tr to replace every space or tab with a line break:

    tr '[:blank:]' '\n' <file
    
    this
    is
    OK
    this
    is
    NOK
    OK
    it
    is
    NOK
    it
    is
    Everything
    is
    OK
    

    Based on your edited question, you can use this find + tr solution in the bash shell:

    while IFS= read -rd '' f; do
       tr ' ' '\n' < "$f"
    done < <(find . -name '*.cs' -print0) >/mnt/c/Temp_Folder/output.txt
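Splitting only on spaces and tabs, as both commands in this answer do, has two side effects on C# code: punctuation stays glued to words (Console.WriteLine(count); is one token), and each indentation space becomes an empty line. A sketch of a common tr variant, assuming a word is a run of letters, digits, and underscores; -c complements the first set so every other character is translated to a newline, and -s squeezes runs of newlines into one. It could replace tr ' ' '\n' inside the loop above. The file sample.cs is hypothetical:

```shell
# Hypothetical C#-like input with punctuation and indentation.
printf 'public void Recieve(int count)\n{\n    Console.WriteLine(count);\n}\n' > sample.cs

# Turn every non-word character into a newline and squeeze repeats,
# leaving exactly one identifier or word per line.
tr -cs '[:alnum:]_' '\n' < sample.cs
```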