I am scheduling a scrape with the Whenever gem, but the scraped results shown on the page never seem to update. I suspect the earlier results were already saved and the page keeps displaying only those, but I'm not sure.
Controller:
class EntriesController < ApplicationController
  def index
    @entries = Entry.all
  end

  def scrape
    RedditScrapper.scrape
    respond_to do |format|
      format.html { redirect_to entries_url, notice: 'Entries were successfully scraped.' }
      format.json { entriesArray.to_json }
    end
  end
end
lib/reddit_scrapper.rb:
require 'open-uri'

module RedditScrapper
  def self.scrape
    doc = Nokogiri::HTML(open("https://www.reddit.com/"))
    entries = doc.css('.entry')
    entriesArray = []
    entries.each do |entry|
      title = entry.css('p.title > a').text
      link = entry.css('p.title > a')[0]['href']
      entriesArray << Entry.new({ title: title, link: link })
    end

    if entriesArray.map(&:valid?)
      entriesArray.map(&:save!)
    end
  end
end
config/schedule.rb:
RAILS_ROOT = File.expand_path(File.dirname(__FILE__) + '/')

every 2.minutes do
  runner "RedditScrapper.scrape", :environment => "development"
end
model:
class Entry < ApplicationRecord
end
routes:
Rails.application.routes.draw do
  # root 'entry#scrape_reddit'
  root 'entries#index'
  resources :entries
  # get '/new_entries', to: 'entries#scrape', as: 'scrape'
end
View index.html.erb:
<h1>Reddit's Front Page</h1>
<% @entries.order("created_at DESC").limit(10).each do |entry| %>
  <h3><%= entry.title %></h3>
  <p><%= entry.link %></p>
<% end %>
Use just Entry.create! to create each entry:
require 'open-uri'

module RedditScrapper
  def self.scrape
    doc = Nokogiri::HTML(open("https://www.reddit.com/"))
    entries = doc.css('.entry')
    entries.each do |entry|
      title = entry.css('p.title > a').text
      link = entry.css('p.title > a')[0]['href']
      Entry.create!(title: title, link: link)
    end
  end
end
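Note that your original guard, if entriesArray.map(&:valid?), is always truthy: map returns an array, and any array is truthy in Ruby, even one full of false values, so it never actually filtered anything. create! validates each record and raises ActiveRecord::RecordInvalid on a failed validation, so bad rows can't slip through unnoticed.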
To get the 10 latest entries:
# controller
def index
  @entries = Entry.order("created_at DESC").limit(10)
end
view:
<% @entries.each do |entry| %>
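For completeness, the full index.html.erb is then the same markup as before with the query removed:

<h1>Reddit's Front Page</h1>

<% @entries.each do |entry| %>
  <h3><%= entry.title %></h3>
  <p><%= entry.link %></p>
<% end %>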
Also, I think you need to change the order in which items are parsed from Reddit: the latest posts are at the top of the page, but your loop adds those to the database first, so they end up with the oldest timestamps. Make one more change in the scraper. Instead of

entries.each do |entry|

use

entries.reverse.each do |entry|

so parsing starts from the end of the list and the latest posts are added last, giving them the newest created_at values.
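Putting both changes together, here is a minimal sketch of the full scraper. It keeps the original RedditScrapper module name so the runner in schedule.rb still resolves; note that on Ruby 3+ you would call URI.open, since the bare Kernel#open patch from open-uri was removed there.

require 'open-uri'

module RedditScrapper
  def self.scrape
    doc = Nokogiri::HTML(open("https://www.reddit.com/"))

    # Walk the page bottom-up so the newest posts are saved last
    # and therefore carry the most recent created_at timestamps.
    doc.css('.entry').reverse.each do |entry|
      title = entry.css('p.title > a').text
      link  = entry.css('p.title > a')[0]['href']
      Entry.create!(title: title, link: link)
    end
  end
end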