
How do I count occurrences of several specific words in a Ruby string?


I am trying to write Ruby code that counts specific words and returns the total number of occurrences of each.

Suppose I want to find the number of occurrences of Sally, Marina and Tina in the following sentence: "Monday Tina will meet Sally and Harris. Then Tina will visit her mom Marina. Marina and Tina will meet David for dinner."

I tried the following, but it violates the DRY principle. Is there a better way?

string = "Monday Tina will meet Sally and Harris. Then Tina will visit her mom Marina. Marina and Tina will meet David for dinner. Sally will then take Tina out for a late night party." 

puts "Marina appears #{string.split.count("Marina")} times."
puts "Tina appears #{string.split.count("Tina")} times."
puts "Sally appears #{string.split.count("Sally")} times."

Expected result: the program looks through the text for the words I specify and returns their counts.

Actual: I had to hard-code each word on its own puts line and call string.split.count for that word.

Note: I tried the following but this gives me EVERY word. I need to refine it to give me just the ones I ask for. This is where I am struggling.

def cw(string)
  words = string.split(' ')
  freq = Hash.new(0)
  # Count every word, ignoring case.
  words.each { |word| freq[word.downcase] += 1 }
  freq
end
puts cw(string)

Solution

    def count_em(str, who)
      str.gsub(/\b(?:#{who.join('|')})\b/i).
          each_with_object(Hash.new(0)) { |person,h| h[person] += 1 }
    end
    
    str = "Monday Tina will meet Sally and Harris. Then Tina will visit her " +
          "mom Marina. Marina and Tina will meet David for dinner. Sally will " +
          "then take Tina out for a late night party." 
    
    who = %w| Sally Marina Tina |
    
    count_em(str, who)
      #=> {"Tina"=>4, "Sally"=>2, "Marina"=>2}
    

    The first steps are as follows.

    r = /\b(?:#{who.join('|')})\b/i
      #=> /\b(?:Sally|Marina|Tina)\b/i
    enum = str.gsub(r)
      #=> #<Enumerator: "Monday Tina will meet Sally and Harris. Then
      #   ...
      #   for a late night party.":gsub(/\b(?:Sally|Marina|Tina)\b/i)>
    

    We can convert this to an array to see the values that will be passed to each_with_object.

    enum.to_a
      #=> ["Tina", "Sally", "Tina", "Marina", "Marina", "Tina", "Sally", "Tina"]
    

    We then simply count the number of instances of the unique values generated by enum.

    enum.each_with_object(Hash.new(0)) { |person,h| h[person] += 1 }
      #=> {"Tina"=>4, "Sally"=>2, "Marina"=>2}
    

    See String#gsub, in particular the case when there is one argument and no block. This is admittedly an unusual use of gsub, as it is making no substitutions, but here I prefer it to String#scan because gsub returns an enumerator whereas scan produces a temporary array.
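
To make the gsub/scan distinction concrete, here is a minimal sketch on a shortened example string (the string and variable names are illustrative, not from the code above). It only shows that the two calls yield the same matches in different containers.

```ruby
str = "Tina met Sally. Then Tina left."
pattern = /\b(?:Sally|Tina)\b/

# With a pattern and neither a replacement nor a block, gsub returns an
# Enumerator that yields each match in turn; str itself is not modified.
matches_enum = str.gsub(pattern)
puts matches_enum.class   # Enumerator

# scan builds the complete array of matches up front.
matches_array = str.scan(pattern)
puts matches_array.class  # Array

# Both produce the same sequence of matches.
puts matches_enum.to_a == matches_array  # true
```

For a string of this size the difference is negligible, so choosing between the two is mostly a matter of taste.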

    See also Hash::new, the case where new takes an argument and no block. The argument is called the default value. If h is the hash so-defined, the default value is returned by h[k] if h does not have a key k. The hash is not altered.
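
The default-value behaviour can be checked directly. The following is a small sketch (variable names are illustrative) showing that reading a missing key returns the default without storing the key.

```ruby
counts = Hash.new(0)

# Reading a missing key returns the default value...
missing = counts["Tina"]

# ...but does not add the key to the hash.
still_empty = counts.empty?
has_key = counts.key?("Tina")

# Only an assignment (here via +=) stores the key.
counts["Tina"] += 1

puts missing      # 0
puts still_empty  # true
puts has_key      # false
puts counts.inspect
```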

    Here the default value is zero. When the expression h[person] += 1 is parsed, it is converted to:

    h[person] = h[person] + 1
    

    If person equals "Tina", and it is the first time "Tina" is generated by the enumerator and passed to the block, h will not have a key "Tina", so the expression becomes:

    h["Tina"] = 0 + 1
    

    as 0 is the default value. The next time "Tina" is passed to the block the hash has a key "Tina" (with value 1), so the following calculation is performed.

    h["Tina"] = h["Tina"] + 1 #=> 1 + 1 #=> 2
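
As a possible variant, and not part of the answer above, the same count can be sketched with Regexp.union and Enumerable#tally. This assumes Ruby 2.7 or later (for tally); the method name count_names is hypothetical. Regexp.union escapes any regex metacharacters in the names, and this version drops the /i flag, so matching is case-sensitive.

```ruby
# Hypothetical variant of count_em; the name count_names is illustrative.
def count_names(str, who)
  # Regexp.union builds an alternation and escapes special characters.
  pattern = /\b#{Regexp.union(who)}\b/
  # Enumerable#tally (Ruby 2.7+) counts how often each match is yielded.
  str.gsub(pattern).tally
end

str = "Monday Tina will meet Sally and Harris. Then Tina will visit her " \
      "mom Marina. Marina and Tina will meet David for dinner. Sally will " \
      "then take Tina out for a late night party."

result = count_names(str, %w[Sally Marina Tina])
puts result.inspect
```

With the sample sentence this should give the same counts as count_em: 4 for Tina, 2 for Sally and 2 for Marina.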