unix command-line awk tokenize

How to split a file into words on the Unix command line?


I'm running some quick tests for a naive boolean information retrieval system, and I would like to use awk, grep, egrep, sed, or something similar, together with pipes, to split a text file into words and save them to another file with one word per line. For example, my file contains:

Hola mundo, hablo español y no sé si escribí bien la
pregunta, ojalá me puedan entender y ayudar
Adiós.

The output file should contain:

Hola
mundo
hablo
español
...

Thanks!


Solution

  • Using tr:

    tr -s '[:punct:][:space:]' '\n' < file
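
    A minimal sketch of the full round trip, since the question also asks to save the words to another file. The names input.txt and words.txt are placeholders chosen for illustration, not names from the question. Note that tr takes plain character sets rather than regular expressions, so the classes are written here without an enclosing bracket pair; -s squeezes each run of newlines into a single one, so consecutive separators such as ", " do not produce empty lines.

    ```shell
    # Recreate the sample text from the question (input.txt is a placeholder name).
    printf '%s\n' \
      'Hola mundo, hablo español y no sé si escribí bien la' \
      'pregunta, ojalá me puedan entender y ayudar' \
      'Adiós.' > input.txt

    # Turn every punctuation or whitespace character into a newline,
    # squeeze repeated newlines (-s), and write one word per line to words.txt.
    tr -s '[:punct:][:space:]' '\n' < input.txt > words.txt
    ```

    If the input starts with punctuation or whitespace, the output will begin with one empty line, which can be dropped afterwards with grep -v '^$'. For an index vocabulary, piping the result through sort -u removes duplicate words.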