How to calculate the highest word frequency in Ruby


I have been working on this assignment for a Coursera Intro to Rails course. We have been tasked to write a program that calculates maximum word frequency in a text file. We have been instructed to create a method which:

  1. Calculates the maximum number of times a single word appears in the given content and stores it in highest_wf_count.
  2. Identifies the words that were used the maximum number of times and stores them in highest_wf_words.

When I run the rspec tests that were given to us, one test is failing. I printed my output to see what the problem is but haven't been able to fix it.

Here is my code, the rspec test, and what I get:

class LineAnalyzer

  attr_accessor :highest_wf_count
  attr_accessor :highest_wf_words
  attr_accessor :content
  attr_accessor :line_number

  def initialize(content, line_number)
    @content = content
    @line_number = line_number
    @highest_wf_count = 0
    @highest_wf_words = highest_wf_words
    calculate_word_frequency
  end
  def calculate_word_frequency()
    @highest_wf_words = Hash.new(0)
    @content.split.each do |word|
      @highest_wf_words[word.downcase!] += 1
      if @highest_wf_words.has_key?(word)
        @highest_wf_words[word] += 1 
      else
        @highest_wf_words[word] = 1
      end
      @highest_wf_words.sort_by{|word, count| count}
      @highest_wf_count = @highest_wf_words.max_by {|word, count| count}
    end
  end
  def highest_wf_count()
    p @highest_wf_count
  end
end

This is the rspec code:

require 'rspec'

describe LineAnalyzer do
  subject(:lineAnalyzer) { LineAnalyzer.new("test", 1) }

  it "has accessor for highest_wf_count" do
    is_expected.to respond_to(:highest_wf_count) 
  end 
  it "has accessor for highest_wf_words" do
    is_expected.to respond_to(:highest_wf_words) 
  end
  it "has accessor for content" do
    is_expected.to respond_to(:content) 
  end
  it "has accessor for line_number" do
    is_expected.to respond_to(:line_number) 
  end
  it "has method calculate_word_frequency" do
    is_expected.to respond_to(:calculate_word_frequency) 
  end
  context "attributes and values" do
    it "has attributes content and line_number" do
      is_expected.to have_attributes(content: "test", line_number: 1)
    end
    it "content attribute should have value \"test\"" do
      expect(lineAnalyzer.content).to eq("test")
    end
    it "line_number attribute should have value 1" do
      expect(lineAnalyzer.line_number).to eq(1)
    end
  end

  it "calls calculate_word_frequency when created" do
    expect_any_instance_of(LineAnalyzer).to receive(:calculate_word_frequency)
    LineAnalyzer.new("", 1) 
  end

  context "#calculate_word_frequency" do
    subject(:lineAnalyzer) { LineAnalyzer.new("This is a really really really cool cool you you you", 2) }

    it "highest_wf_count value is 3" do
      expect(lineAnalyzer.highest_wf_count).to eq(3)
    end
    it "highest_wf_words will include \"really\" and \"you\"" do
      expect(lineAnalyzer.highest_wf_words).to include 'really', 'you'
    end
    it "content attribute will have value \"This is a really really really cool cool you you you\"" do
      expect(lineAnalyzer.content).to eq("This is a really really really cool cool you you you")
    end
    it "line_number attribute will have value 2" do
      expect(lineAnalyzer.line_number).to eq(2)
    end
  end
end

This is the rspec output:

13 examples, 1 failure

Failed examples:

rspec ./course01/module02/assignment-Calc-Max-Word-Freq/spec/line_analyzer_spec.rb:42 # LineAnalyzer#calculate_word_frequency highest_wf_count value is 3

My output:

#<LineAnalyzer:0x00007fc7f9018858 @content="This is a really really really cool cool you you you", @line_number=2, @highest_wf_count=[nil, 10], @highest_wf_words={"this"=>2, nil=>10, "is"=>1, "a"=>1, "really"=>3, "cool"=>2, "you"=>3}>

From this output I can see three problems:

  1. Based on the test string, the word counts aren't correct.
  2. "nil" is being included in the hash.
  3. The hash is not being sorted by the value (count) like it should.

I tried several things to fix these problems and nothing has worked. I went through the lecture material again, but can't find anything that would help and the discussion boards are not often monitored for questions from students.


Solution

  • According to the Ruby documentation:

    downcase!(*args) public

    Downcases the contents of str, returning nil if no changes were made.

    Because downcase! returns nil when the word is already all lowercase, this line increments the count stored under the nil key instead of the count for the word:

    @highest_wf_words[word.downcase!] += 1
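
A minimal sketch of the difference between the two methods, in case it helps to see it in isolation. The helper name downcase_results is hypothetical and only exists for this illustration:

```ruby
# Hypothetical helper: returns what each method hands back for the same word.
# dup keeps the caller's string untouched, since downcase! mutates its receiver.
def downcase_results(word)
  { bang: word.dup.downcase!, plain: word.downcase }
end

mixed = downcase_results("This")
lower = downcase_results("is")

p mixed[:bang]   # "this"
p mixed[:plain]  # "this"
p lower[:bang]   # nil  (nothing changed, so downcase! returns nil)
p lower[:plain]  # "is"
```

Every word in the test string except "This" is already lowercase, which is consistent with the nil=>10 entry in the printed hash.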
    

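    The test also fails because @highest_wf_words.max_by {|word, count| count} returns a two-element array of the form [word, count] (here [nil, 10]), whereas highest_wf_count should hold only the count. In addition, the if/else block counts a word a second time whenever downcase! did return a string, which is why "this" ends up with a count of 2.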
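
To make the return values concrete, here is a small sketch comparing max_by with taking the maximum of the values. The sample hash and the method names top_pair and top_count are assumptions made up for this example, not part of the assignment:

```ruby
# Hypothetical sample data, not taken from the question.
SAMPLE_COUNTS = { "really" => 3, "cool" => 2, "a" => 1 }

# max_by on a Hash yields [key, value] pairs and returns the winning pair.
def top_pair(counts)
  counts.max_by { |_word, count| count }
end

# values.max looks only at the counts and returns a single number.
def top_count(counts)
  counts.values.max
end

p top_pair(SAMPLE_COUNTS)   # ["really", 3]
p top_count(SAMPLE_COUNTS)  # 3
```

An expectation such as eq(3) can only match the second form, since an array never equals an integer.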

    A simplified calculate_word_frequency method passing the tests would look like this:

      def calculate_word_frequency()
        @highest_wf_words = Hash.new(0)

        @content.split.each do |word|
          # we don't have to check if the word existed before
          # because we set 0 as default value in @highest_wf_words hash

          # use .downcase instead of .downcase!
          @highest_wf_words[word.downcase] += 1
        end

        # extract only the counts and take the largest, once after the loop;
        # fall back to 0 when the content has no words
        @highest_wf_count = @highest_wf_words.values.max || 0
      end
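
Note that the method above leaves every word in highest_wf_words. The given spec still passes, because RSpec's include matcher checks the keys of a hash. If the assignment is read strictly (store only the words used the maximum number of times), one possible approach is to filter the hash afterwards. The following is a sketch under that assumption; word_frequency_summary is a hypothetical standalone helper, not part of the course's LineAnalyzer class:

```ruby
# Hypothetical helper, not part of the assignment's LineAnalyzer class.
def word_frequency_summary(content)
  counts = Hash.new(0)                      # missing words default to 0
  content.split.each { |word| counts[word.downcase] += 1 }

  highest = counts.values.max || 0          # 0 when the content has no words
  # keep only the words whose count equals the maximum
  top_words = counts.select { |_word, count| count == highest }.keys

  { highest_wf_count: highest, highest_wf_words: top_words }
end

summary = word_frequency_summary("This is a really really really cool cool you you you")
p summary[:highest_wf_count]  # 3
p summary[:highest_wf_words]  # ["really", "you"]
```

Inside the class, the same two steps could be assigned to @highest_wf_count and @highest_wf_words at the end of calculate_word_frequency.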