Tags: c#, web-scraping, xpath, web-crawler, html-agility-pack

C# Best Buy Web Scraping - Can't get add to cart element


I'm writing a simple web scraping application to retrieve information on certain PC components.

I'm using Best Buy as my test website and HtmlAgilityPack as my scraper.

I'm able to retrieve the title and the price; however, I can't seem to get the availability.

So I'm trying to read the text of the Add to Cart button element. If the item is available, the button reads "Add to Cart"; otherwise, it reads "Unavailable".

But when I take the button's XPath and pass it to SelectSingleNode, the result is null. Can someone please help me out?

Here's my code.

using System;
using HtmlAgilityPack;

var url = "https://www.bestbuy.com/site/pny-nvidia-geforce-gt-710-verto-2gb-ddr3-pci-express-2-0-graphics-card-black/5092306.p?skuId=5092306";
HtmlWeb web = new HtmlWeb();
HtmlDocument pageDocument = web.Load(url);

string titleXPath = "/html/body/div[3]/main/div[2]/div[3]/div[1]/div[1]/div/div/div[1]/h1";
string priceXPath = "/html/body/div[3]/main/div[2]/div[3]/div[2]/div/div/div[1]/div/div/div/div/div[2]/div/div/div/span[1]";
string availabilityXPath = "/html/body/div[3]/main/div[2]/div[3]/div[2]/div/div/div[7]/div[1]/div/div/div[1]/button";

var title = pageDocument.DocumentNode.SelectSingleNode(titleXPath);
var price = pageDocument.DocumentNode.SelectSingleNode(priceXPath);
// SelectSingleNode returns null when nothing matches, so this is false if the button is not found.
bool availability = pageDocument.DocumentNode.SelectSingleNode(availabilityXPath) != null;

Console.WriteLine(title.InnerText);
Console.WriteLine(price.InnerText);
Console.WriteLine(availability);

It correctly outputs the title and price, but availability is always False because the button lookup returns null.


Solution

  • Try string availabilityXPath = "//button[. = 'Add to Cart']";

    This relative expression matches any button whose text is exactly "Add to Cart", wherever it sits in the page.

    In web scraping, a long generated XPath will keep working on the same static page, but across multiple pages of the same store the position of an element can drift, and an absolute path that depends on those positions breaks. Yours stops matching at /html/body/div[3]/main/div[2]/div[3]/div[2]/div/div/div[7]/div[1]/div, and I suspect that's what's happening here.

    Learning to write XPaths from scratch will be invaluable (and hand-written ones are much easier to debug!).
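
To make the idea concrete, here is a minimal sketch of an availability check built on a relative XPath. It assumes the button is already present in the HTML that HtmlAgilityPack receives; a button that the page adds later with JavaScript would not be found this way. The AvailabilityCheck class, the IsAvailable helper, and the sample markup are made up for illustration and are not Best Buy's real HTML. The sketch uses normalize-space(.) instead of a bare . so that whitespace around the label does not defeat the match.

```csharp
using System;
using HtmlAgilityPack;

static class AvailabilityCheck
{
    // Relative XPath: any <button> whose trimmed text is exactly "Add to Cart",
    // regardless of how deeply it is nested.
    const string AddToCartXPath = "//button[normalize-space(.) = 'Add to Cart']";

    // Hypothetical helper (not part of HtmlAgilityPack):
    // returns true when the Add to Cart button is found in the given HTML.
    public static bool IsAvailable(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches.
        return doc.DocumentNode.SelectSingleNode(AddToCartXPath) != null;
    }

    static void Main()
    {
        // Made-up markup standing in for two product pages with different nesting.
        string inStock = "<html><body><div><div><button> Add to Cart </button></div></div></body></html>";
        string soldOut = "<html><body><div><button>Unavailable</button></div></body></html>";

        Console.WriteLine(IsAvailable(inStock));  // True
        Console.WriteLine(IsAvailable(soldOut));  // False
    }
}
```

Because the expression does not depend on how many div elements wrap the button, the same check works on both sample pages even though their layouts differ. To run it against a live page, you could pass the HTML of the document returned by HtmlWeb.Load instead of the sample strings.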