I want to generate an output file that shows the frequency of each word in an input file. After some searching, I found that Perl is the ideal language for this problem, but I don't know the language.
After some more searching, I found the following code here on Stack Overflow; it supposedly provides the solution I want with great efficiency:
perl -lane '$h{$_}++ for @F; END{for $w (sort {$h{$b}<=>$h{$a} || $a cmp $b} keys %h) {print "$h{$w}\t$w"}}' file > freq
I tried running this command line using the form below:
perl -lane 'code' input.txt > output.txt
The execution halts due to an unexpected '>' (the one in '<=>'). I did some research but can't understand what is wrong. Could someone enlighten me? Thanks!
Here is the question where I got the code: Elegant ways to count the frequency of words in a file
If it's relevant, my words use letters and numbers and are separated by a single white space.
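You are probably using Windows. The Windows command prompt (cmd.exe) does not treat single quotes as quoting characters, so it interprets the < and > in <=> as redirection operators. You therefore need to use double quotes (") instead of single quotes (') around your code: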
perl -lane "$h{$_}++ for @F; END{for $w (sort {$h{$b}<=>$h{$a} || $a cmp $b} keys %h) {print qq($h{$w}\t$w)}}" file > freq
Also, note that I used qq() instead of "..." within the code, as suggested by @mob. Another option is to escape the inner quotes with \".
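If you would rather avoid shell quoting altogether, one option is to put the code in a script file, since the shell then never sees the Perl source. Below is a minimal sketch of that idea, not the original one-liner: the file name freq.pl and the helper names count_words and sorted_words are made up for illustration. It assumes, as in the question, that words are separated by whitespace.

```perl
#!/usr/bin/perl
# freq.pl (hypothetical file name): print each word's count, most frequent first.
use strict;
use warnings;

# Count whitespace-separated words across the given lines.
# split ' ' mimics the -a autosplit: it ignores leading whitespace and the newline.
sub count_words {
    my %count;
    for my $line (@_) {
        $count{$_}++ for split ' ', $line;
    }
    return \%count;
}

# Return the words ordered by descending count, ties broken alphabetically
# (the same ordering as the one-liner's sort block).
sub sorted_words {
    my ($count) = @_;
    return sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count;
}

# Read every line from the files named on the command line (or from STDIN).
my $count = count_words(<>);
print "$count->{$_}\t$_\n" for sorted_words($count);
```

You would then run it as perl freq.pl input.txt > output.txt, which should behave the same way in cmd.exe and in Unix shells because the command line contains nothing that needs quoting.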