Counting frequency of symbols


I have the following code, which counts the frequency of each letter in a string (in this case, read from a file):

def letter_frequency(file)
  letters = 'a' .. 'z'
  File.read(file) .
  split(//) .
  group_by {|letter| letter.downcase} .
  select   {|key, val| letters.include? key} .
  collect  {|key, val| [key, val.length]}
end

letter_frequency(ARGV[0]).sort_by {|key, val| -val}.each {|pair| p pair}

This works great, but I would like to know whether there is some way in Ruby to do something similar that catches all the other possible symbols, i.e. spaces, commas, periods, and everything in between. To put it more simply: is there something similar to 'a' .. 'z' that holds all the symbols? I hope that makes sense.


Solution

  • You won't need a range when you're trying to count every possible character, because the set of all possible characters is the entire domain. A range is only needed when you want to restrict the count to a specific subset of that domain, as 'a' .. 'z' does for letters.

    This is probably a faster implementation that counts all characters in the file:

    def char_frequency(file_name)
      ret_val = Hash.new(0)
      File.open(file_name) {|file| file.each_char {|char| ret_val[char] += 1 } }
      ret_val
    end
    
    p char_frequency("1003v-mm")  #=>  {"\r"=>56, "\n"=>56, " "=>2516, "\xC9"=>2, ...
    

    For reference I used this test file.
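
If only the symbols are wanted (everything except letters and digits), one option is to filter with a regular expression instead of a range. The following is a minimal sketch, not part of the original answer: the method name `symbol_frequency`, its `pattern` parameter, and the sample string are all made up for illustration, and it assumes the text is already in memory as a validly encoded string.

```ruby
# Hypothetical helper: counts only the characters that match `pattern`.
# The default pattern matches any single character that is NOT a letter
# or digit, so spaces, commas, periods, newlines, etc. are all counted.
def symbol_frequency(text, pattern = /[^[:alnum:]]/)
  counts = Hash.new(0)                              # missing keys start at 0
  text.scan(pattern) { |char| counts[char] += 1 }   # visit each match in order
  counts
end

# Illustrative input; in practice this could be File.read(some_path).
sample = "Hello, world. Hi, all!"

# Print the pairs sorted by descending count, as in the question.
symbol_frequency(sample).sort_by { |_char, count| -count }.each { |pair| p pair }
```

Passing a different pattern narrows or widens the subset, for example `/[[:space:]]/` to count only whitespace.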