Currently, I am able to parse a website with Nokogiri and grab specific elements from pages. However, I need to be able to grab a specific string such as "Out of stock" that is visible to the user:
page.text.match('Out of stock')
That works fine for detecting whether the string is present. However, some links, like the following, match even when the item is not out of stock, because that specific string is hidden in a script tag on the page:
https://www.walmart.com/ip/Funyuns-Onion-Flavored-Rings-6-oz/36915849?athcpid=36915849&athpgid=athenaItemPage&athcgid=null&athznid=PWSFM&athieid=v0&athstid=CS020&athguid=ba634528-888-172187cc96a580&athancid=null&athena=true
I am looking for a way to match the string if and only if it is visible to users, so the link above should return false for "Out of stock", while the link below should return true (at time of posting) because the item is actually out of stock.
https://www.walmart.com/ip/4-Pack-Chesters-Flamin-Hot-Popcorn-4-25-oz/737202470?selected=true
I am also aware that I could grab the specific tag that contains the string, but I need to monitor hundreds of websites so the solution has to be a broad search for a visible string.
Short answer: we can use XPath syntax for this, with a more specific query.
Long story: I strongly recommend being more specific and targeting CSS classes, because in some cases this text appears not only in a script tag but also in blocks shown only under a media query, in item-preview blocks, and so on. Handle the common cases as big chunks rather than forcing one specific solution onto every site, in case of unexpected behavior.
So we need to be more specific and use "target tags" to handle it, for example:
Nokogiri::HTML.parse(page.html).xpath("//*[contains(@class, 'prod-PriceSection')]//*[contains(@class, 'prod-ProductOffer-oosMsg')]").text
# => "Out of stock"
So, "to monitor hundreds of websites", we can go with this approach:
xpath("//*[contains(@class, 'PriceSection')]").text
Or, even better, use something like this (through Capybara) to make sure the element is actually visible:
page.all(:xpath, "//body//*[contains(text(), 'Out of stock')]", visible: true).count
# => 1
If the extra request made by Capybara in the previous solution becomes a problem, we can use this solution instead; it is much faster:
xpath("//body//*[not(self::script) and contains(text(), 'Out of stock')]").count
I hope it helps.