
How do I isolate only certain text to be scraped using Watir for Ruby?


I'm trying to learn Watir and Ruby better, and I'm writing a script that goes to eBay, searches for a hockey player's cards, and prints the listings. I only want to scrape the search results, not the rest of the page. Is there a way to "sandwich" the text so that I only scrape what appears between "Save this search" and "Results"? My code and a sample of its output are below.

code:

require 'watir'
require 'webdrivers'


puts 'Enter a Name: '
name = gets.chomp

puts 'PSA, BGS, or RAW?'
grade = gets.chomp.downcase
grade = '' if grade == 'raw'


browser = Watir::Browser.new
browser.goto('ebay.ca')

# Poll for up to 5 seconds until a text field exists on the page
browser.wait_until(timeout: 5) { browser.text_field.exists? }
browser.text_field.set "#{name} young guns #{grade}"
browser.send_keys :enter

puts browser.text 



sleep(3)

Save this search Shipping to: V1B2C7 2005-06 UPPER DECK #201 SIDNEY CROSBY YOUNG GUNS RC GRADED BGS 9.5 "GEM MINT" Brand New C $1,084.00 Time left 5d 5h left (Sun., 06:55 p.m.) 13 bids +C $12.99 shipping 2005-06 Upper Deck #201 Sidney Crosby Young Guns True Gem BGS 9.5 w/ 10 Sub Centering 9.5 Corners 9.5 Edges 9.5 Surface 10 C $2,199.95 Top Rated Seller Buy It Now +C $12.00 shipping 12 watchers 2005/06 Sidney Crosby Young Guns #201 New (Other) C $1,150.00 or Best Offer +C $20.99 shipping 22 watchers 2005-06 Upper Deck #201 Sidney Crosby YG RC Young Guns Please Read REPRINT C $9.99 Time left 16h 58m left (Wed., 06:18 a.m.) 1 bid Top Rated Seller +C $2.99 shipping Conner Mcdavid,Crosby,Matthews,Gretzky,Price,Young Guns Reprints Brand New C $12.50 Time left 3d 7h left (Fri., 09:01 p.m.) 5 bids +C $3.00 shipping 2005 Upper Deck Young Guns #201 Sidney Crosby RC Rookie Gem Mint PSA 10 Brand New C $1,761.76 Time left 2d 6h left (Thu., 07:41 p.m.) 17 bids Top Rated Seller +C $49.40 shipping From United States Customs services and international tracking provided 2005 Upper Deck Young Guns #201 Sidney Crosby RC Rookie Gem Mint PSA 10 Brand New C $1,829.52 Time left 2d 6h left (Thu., 07:40 p.m.) 11 bids Top Rated Seller +C $50.52 shipping From United States Customs services and international tracking provided Results Pagination - Page 1 12 Items Per Page50 Items Per Page


Solution

  • You can target the specific elements that you are interested in. In this case, the search results are all li elements with the class "sresult".

    Therefore, you can get the text of all the search results with:

    results = browser.lis(class: 'sresult')
    results.each { |r| puts r.text }
    

    This gives an ugly blob of text with no indication of what each piece is, e.g. title vs. price. It may be better to target specific elements within each result so that you can pull out and format exactly the information you want:

    results.each do |r|
      puts "Title: #{r.h3.text}"
      puts "Price: #{r.li(class: 'lvprice').text}"
      puts
    end
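
For completeness, the literal "sandwich" idea from the question can also be done with plain Ruby string methods on the value returned by browser.text. The following is only a minimal sketch: text_between is a hypothetical helper (it is not part of Watir), and page_text is a shortened, made-up stand-in for the real page text. It assumes both markers appear verbatim and that the end marker does not occur inside a listing title, which is why targeting elements as shown above is likely to be more robust.

```ruby
# Hypothetical helper (not part of Watir): returns the text found strictly
# between the first occurrence of start_marker and the next occurrence of
# end_marker after it, or nil if either marker is missing.
def text_between(text, start_marker, end_marker)
  start_idx = text.index(start_marker)
  return nil if start_idx.nil?

  content_start = start_idx + start_marker.length
  end_idx = text.index(end_marker, content_start)
  return nil if end_idx.nil?

  text[content_start...end_idx].strip
end

# Shortened, made-up stand-in for the string returned by browser.text
page_text = 'Skip to main content Save this search Shipping to: V1B2C7 ' \
            '2005/06 Sidney Crosby Young Guns #201 C $1,150.00 ' \
            'Results Pagination - Page 1'

puts text_between(page_text, 'Save this search', 'Results')
```

In the original script this would be called as text_between(browser.text, 'Save this search', 'Results'). Note that the match is case-sensitive, so the marker has to be written exactly as it appears on the page ("Save this search", not "Save This Search").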