javascript html ruby capybara poltergeist

How to get dynamic page content with Capybara


I'm trying to get the text from elements like this one:

<div class="points-text" data-reactid=".2765swfgy68.1.2.2.1:$sony-xperia-z1-compact.2.0.4.0.0.1">29,367 points</div>

I guess the website uses React.js, and Capybara can't get the content even with the Poltergeist driver. Is there a workaround?

Here is my code:

require 'rubygems'
require 'capybara'
require 'capybara/poltergeist'


Capybara.default_driver = :poltergeist
Capybara.register_driver :poltergeist do |app|
  Capybara::Poltergeist::Driver.new(app, {js_errors: false})
end


class WebScraper
  include Capybara::DSL

  def get_page_data(url)
    visit(url)
    doc = Nokogiri::HTML(page.html)
    p doc.css('.points-text')
  end
end


scraper = WebScraper.new
puts scraper.get_page_data('http://versus.com/en/sony-xperia-z1-compact')

Solution

  • There is no need to parse the HTML with Nokogiri if you're already visiting the page with Capybara.

    def get_page_data(url)
      visit(url)
      p find(:css, '.points-text').text
    end
    

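    This prints the visible text of the element with class points-text. Unlike a one-off snapshot of page.html, find keeps retrying until the element appears or Capybara's wait time runs out, so it also picks up content that JavaScript renders after the page has loaded.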
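
For completeness, here is a minimal sketch of the whole scraper built around find. It is not the original poster's code. The method names points_text and build_session, the explicit Capybara::Session (instead of include Capybara::DSL), and the 10-second wait are all assumptions made for illustration. The wait: option on find is standard Capybara and only raises the retry limit for that one lookup.

```ruby
# Sketch only: points_text and build_session are illustrative names,
# and the 10-second default wait is an assumed value.
class WebScraper
  # session is any object that responds to #visit and #find,
  # normally a Capybara::Session.
  def initialize(session, wait: 10)
    @session = session
    @wait = wait
  end

  # Visits the page and returns the visible text of the .points-text
  # element, retrying for up to @wait seconds until it appears.
  def points_text(url)
    @session.visit(url)
    @session.find(:css, '.points-text', wait: @wait).text
  end
end

# Builds a Poltergeist-backed session. The gems are required here so the
# class above can be loaded without the browser stack installed.
def build_session
  require 'capybara'
  require 'capybara/poltergeist'

  Capybara.register_driver :poltergeist do |app|
    Capybara::Poltergeist::Driver.new(app, js_errors: false)
  end
  Capybara::Session.new(:poltergeist)
end

# Usage (needs PhantomJS and network access):
#   puts WebScraper.new(build_session).points_text('http://versus.com/en/sony-xperia-z1-compact')
```

If the page contains several .points-text elements, find may raise Capybara::Ambiguous. In that case, passing match: :first to find, or collecting every match with all(:css, '.points-text').map(&:text), is probably closer to what you want.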