
Word Frequency count in a very inefficient way


This is my code for calculating word frequencies:

  word_arr= ["I", "received", "this", "in", "email", "and", "found", "it", "a", "good", "read", "to", "share......", "Yes,", "Dr", "M.", "Bakri", "Musa", "seems", "to", "know", "what", "is", "happening", "in", "Malaysia.", "Some", "of", "you", "may", "know.", "He", "is", "a", "Malay",  "extra horny", "horny nor", "nor their", "their babes", "babes are", "are extra", "extra SEXY..", "SEXY.. .", ". .", ". .It's", ".It's because", "because their", "their CONDOMS", "CONDOMS are", "are Made", "Made In", "In China........;)", "China........;) &&"]

  arr_stop_kwd = ["a", "and"]

  frequencies = Hash.new(0)
  word_arr.each do |word|
    if !arr_stop_kwd.include?(word.downcase) && !word.match('&&')
      frequencies[word.downcase] += 1
    end
  end

When I have 100k items of data, it takes 9.03 seconds. That's too much time. Is there another way I can calculate this?

Thanks in advance.


Solution

  • Take a look at the Facets gem.

    You can do something like this using its frequency method:

    require 'facets'
    frequencies = (word_arr-arr_stop_kwd).frequency
    

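    Note that the stop words can be subtracted from word_arr with Array#-. That comparison is case-sensitive, so unlike the loop in the question it does not downcase the words or skip entries containing '&&'. Refer to the Array documentation.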
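
If you would rather not add a gem, here is a minimal sketch using only the standard library. It keeps the question's rules (downcase each word, skip stop words, skip anything containing '&&') and puts the stop words in a Set, so each lookup does not scan an array. The helper name word_frequencies and the short sample array are made up for illustration, and whether this helps with the 9.03 seconds depends on where the time actually goes, which the question does not show.

```ruby
require 'set'

# Hypothetical helper (not from the original post): counts words
# case-insensitively, skipping stop words and entries containing '&&'.
def word_frequencies(words, stop_words)
  # A Set gives constant-time membership checks on average.
  stop_set = stop_words.map(&:downcase).to_set

  words.each_with_object(Hash.new(0)) do |word, counts|
    next if word.include?('&&')      # same filter as match('&&') in the question
    key = word.downcase              # downcase once and reuse it
    next if stop_set.include?(key)
    counts[key] += 1
  end
end

# Shortened sample input, for illustration only.
word_arr = ["I", "a", "good", "read", "to", "to", "And", "China........;) &&"]
arr_stop_kwd = ["a", "and"]

frequencies = word_frequencies(word_arr, arr_stop_kwd)
```

On Ruby 2.7 or later, Enumerable#tally can replace the explicit counting once the words have been filtered and downcased.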