ruby-on-rails, ruby, nokogiri, whenever

Scraped Results Not Updating


I am scheduling a scrape with the Whenever gem, but my scraped results never seem to update.

I suspect the earlier results were saved and the page keeps displaying only those, but I'm not sure.

Controller:

class EntriesController < ApplicationController

  def index
    @entries = Entry.all
  end

  def scrape
    RedditScrapper.scrape

    respond_to do |format|
      format.html { redirect_to entries_url, notice: 'Entries were successfully scraped.' }
      format.json { entriesArray.to_json }
    end
  end

end

lib/reddit_scrapper.rb:

require 'open-uri'

module RedditScrapper
  def self.scrape
    doc = Nokogiri::HTML(open("https://www.reddit.com/"))

    entries = doc.css('.entry')
    entriesArray = []
    entries.each do |entry|
      title = entry.css('p.title > a').text
      link = entry.css('p.title > a')[0]['href']
      entriesArray << Entry.new({ title: title, link: link })
    end

    if entriesArray.map(&:valid?)
      entriesArray.map(&:save!)
    end
  end
end

config/schedule.rb:

RAILS_ROOT = File.expand_path(File.dirname(__FILE__) + '/')

every 2.minutes do 
  runner "RedditScrapper.scrape", :environment => "development"
end

model:

class Entry < ApplicationRecord

end

routes:

Rails.application.routes.draw do
    #root 'entry#scrape_reddit'
    root 'entries#index'
    resources :entries
    #get '/new_entries', to: 'entries#scrape', as: 'scrape'
end

View index.html.erb:

<h1>Reddit's Front Page</h1>
<% @entries.order("created_at DESC").limit(10).each do |entry| %>
  <h3><%= entry.title %></h3>
  <p><%= entry.link %></p>
<% end %>

Solution

  • Use Entry.create! to save each entry directly (keeping the RedditScrapper module name so the Whenever runner and the file name still match):

    require 'open-uri'

    module RedditScrapper
      def self.scrape
        doc = Nokogiri::HTML(open("https://www.reddit.com/"))

        entries = doc.css('.entry')
        entries.each do |entry|
          title = entry.css('p.title > a').text
          link = entry.css('p.title > a')[0]['href']
          Entry.create!(title: title, link: link)
        end
      end
    end
    

    To get the 10 latest entries:

    # controller
    def index
      @entries = Entry.order("created_at DESC").limit(10)
    end
    

    view (the ordering and limit now come from the controller, so the ERB just iterates):

    <% @entries.each do |entry| %>

    Also, think about the order in which the items are parsed: Reddit lists the latest entries at the top, but the scraper saves them in that same order, so the newest item ends up with the oldest created_at. You need to make a small change in the Reddit scraper.

    Reverse the entries: instead of

    entries.each do |entry|

    use

    entries.reverse.each do |entry|

    so parsing starts from the end of the list and the latest news is added last, giving it the most recent created_at.
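
    As a quick sanity check (an assumed workflow, not part of the original answer), you can run the scraper once from a Rails console and confirm that fresh rows show up with the newest created_at values first:

    # in a Rails console; assumes RedditScrapper is loadable in the app
    # environment, as the Whenever runner already requires
    RedditScrapper.scrape
    Entry.order("created_at DESC").limit(3).pluck(:title, :created_at)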