I'm dealing with a colleague who has made an enormous number of copied/pasted spelling errors throughout an entire C# solution.
Instead of running a spelling checker on every individual file, I would like to create a list of all words in the entire solution, run a spelling checker on that list, and then do a complete "find-and-replace" for the entries it finds.
In order to find all words in a file, I had thought of doing something like:
grep -wo ".*" blabla.txt
But that doesn't seem to work: instead of showing every individual matched word, it still shows the entire lines where the words are found (presumably because .* simply matches the whole line, and -o prints the whole match), something like:
this is OK
this is NOK
OK it is
NOK it is
Everything is OK
While I was expecting something like:
this
is
OK
this
is
NOK
...
Once I have the list for one file, I can start working with find ./ -name "*.cs" -exec grep ... {} \; >>output_list and then run something like sort output_list | uniq in order to get the unique words.
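Roughly, the post-processing step I have in mind would then be the following (just a sketch; output_list and word_list are names I made up myself):

# reduce the raw word dump to a sorted list of unique words
sort output_list | uniq > word_list    # equivalent to: sort -u output_list > word_list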
But first things first: as grep -ow ".*" does not show me the words but the entire line, what can I do to show all words in a file using the UNIX/Linux command line? (I added awk as a tag, because this might be a solution, but I'm certainly no awk wizard :-) )
Edit after first answers:
tr indeed seems to be the way to go. I might simply use tr ' ' '\n', but there's a catch: I tried the following, but it didn't work:
find ./ -name "*.cs" -exec cat {} | tr ' ' '\n' >>/mnt/c/Temp_Folder\output.txt \;
The command just gives me a > prompt (as if I'm inside some code editor or so); what am I still doing wrong?
How about using tr to replace every space/tab with a line break:
tr '[:blank:]' '\n' <file
this
is
OK
this
is
NOK
OK
it
is
NOK
it
is
Everything
is
OK
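If you then want the unique word list you mentioned (the sort | uniq step), you can add -s to squeeze the runs of newlines that consecutive spaces/tabs would otherwise produce, and pipe the result through sort -u (file is just a placeholder name here):

tr -s '[:blank:]' '\n' <file | sort -u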
Based on your edited question, you may use this find + tr solution in bash shell:
while IFS= read -rd '' f; do
  tr ' ' '\n' < "$f"
done < <(find . -name '*.cs' -print0) >/mnt/c/Temp_Folder/output.txt
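About your own attempt: it is hard to say exactly why you get the > continuation prompt without seeing the exact line you typed (an unclosed quote or a trailing backslash would do it), but the underlying problem is that the | and >> are interpreted by your interactive shell rather than passed to find, so -exec never receives its terminating \;. In the loop above the redirection applies to the whole loop, tr reads each file directly, and -print0 with read -rd '' keeps it safe for file names containing spaces.

If you don't need per-file processing, a simpler equivalent is to let find concatenate all files and post-process the single stream (a sketch, using the forward-slash form of your output path):

# concatenate all .cs files, split on blanks, and write a sorted unique word list
find . -name '*.cs' -exec cat {} + | tr -s '[:blank:]' '\n' | sort -u >/mnt/c/Temp_Folder/output.txt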