c# html selenium-webdriver web-scraping html-agility-pack

How to obtain hidden table data from a website using Selenium and C#?


I'm trying to scrape the following website and extract the product table using Selenium in C#, but when I parse the resulting HTML I can't find the table. It seems the table is loaded by JavaScript/AJAX after the page loads. How can I extract the table and its number of rows?

URL: www.ifm.com/de/en/category/200_010_010_010

var options = new ChromeOptions()
{
    BinaryLocation = "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe",

};
options.AddArguments(new List<string>() { "headless", "disable-gpu" });
string response = "";
options.AddArgument("no-sandbox");
using (var browser = new ChromeDriver(options))
{
    browser.Navigate().GoToUrl(url);
    WebDriverWait wait = new WebDriverWait(browser, TimeSpan.FromSeconds(20));
    // Both expressions below return null
    //IWebElement rows_count = browser.FindElement(By.XPath("ifm-selector__matching-products"));
    //IWebElement next_button = browser.FindElement(By.XPath("ifm-pagination__cta normalize hover-link-2"));
    response = browser.PageSource;
} 
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(response);
var rows_count = htmlDoc.DocumentNode.SelectSingleNode("//div[@class='ifm-selector__results']//div[@class='ifm-selector__matching-products']//span");

Solution

  • You can wait for the element to become available in the DOM; see, for example, this answer on how to do that:

    https://stackoverflow.com/a/74930503/4122889

    You can use the following extension methods as they are, or copy the code inside them.

    using System;
    using OpenQA.Selenium;
    using OpenQA.Selenium.Chrome;
    using OpenQA.Selenium.Support.UI;

    internal static class WebDriverExtensions
    {
        public static IWebElement FindElement(this ChromeDriver driver, By by, TimeSpan timeout)
            => FindElement((IWebDriver)driver, by, timeout);

        public static IWebElement FindElement(this IWebDriver driver, By by, TimeSpan timeout, TimeSpan pollingInterval = default)
        {
            // NOTE Also see: https://www.selenium.dev/documentation/webdriver/waits/

            var webDriverWait = new WebDriverWait(driver, timeout);

            // Only override the polling interval when one was passed in; otherwise keep
            // Selenium's default, which is half a second as of writing. Assigning the
            // default TimeSpan (zero) would make the wait poll without pausing.
            if (pollingInterval > TimeSpan.Zero)
            {
                webDriverWait.PollingInterval = pollingInterval;
            }

            // We're polling the DOM, so a missing element is normal procedure and not an error.
            webDriverWait.IgnoreExceptionTypes(typeof(NoSuchElementException));

            return webDriverWait
                .Until(drv => drv.FindElement(by));
        }
    }
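
As a rough illustration of the waiting approach, here is a minimal sketch that uses `WebDriverWait` directly on the URL from the question. It assumes the Selenium.WebDriver and Selenium.Support NuGet packages and a locally installed Chrome. The class name `ifm-selector__matching-products` is taken from the question and has not been checked against the live page. Note that a class name is not an XPath expression, so the sketch uses `By.ClassName` where the question used `By.XPath`.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

internal static class WaitForCountExample
{
    private static void Main()
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless");
        options.AddArgument("--disable-gpu");
        options.AddArgument("--no-sandbox");

        using (var browser = new ChromeDriver(options))
        {
            browser.Navigate().GoToUrl("https://www.ifm.com/de/en/category/200_010_010_010");

            var wait = new WebDriverWait(browser, TimeSpan.FromSeconds(20));
            try
            {
                // Polls the DOM until the element exists or the timeout expires.
                // Class name comes from the question (assumption: still present on the page).
                IWebElement matching = wait.Until(
                    drv => drv.FindElement(By.ClassName("ifm-selector__matching-products")));

                Console.WriteLine(matching.Text);
            }
            catch (WebDriverTimeoutException)
            {
                Console.WriteLine("The element did not appear within 20 seconds.");
            }
        }
    }
}
```

Reading `browser.PageSource` only after such a wait has succeeded should give HtmlAgilityPack the rendered markup rather than the empty shell.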
    

    Then I'd use ifm-result-item as the CSS class selector; that should give you a list of all the result elements with their values:

    <div class="ifm-result-item">
       <div class="ifm-result-item__product-info">
          <button class="ifm-result-item__toggle hide-md- normalize" aria-expanded="false" data-test="ifm-result-item-toggle">
             <svg viewBox="0 0 24 24" class="ifm-result-item__toggle-icon inline-icon" aria-hidden="true">
                <use href="#chevron-d" class="icon-svg--fat"></use>
             </svg>
          </button>
          <div class="ifm-result-item__product-info-inner">
             <a href="/de/en/product/IEW200" class="ifm-result-item__product-link-wrapper" data-test="ifm-result-item-link">
                <span class="ifm-result-item__image">
                   <div class="ifm-product-thumbnail"><img srcset="https://media.ifm.com/CIP/mediadelivery/rendition/8a35b3315fc8554e841a2d35b48bfdae/-B140-FJPG/IEW200 2x" src="https://media.ifm.com/CIP/mediadelivery/rendition/8a35b3315fc8554e841a2d35b48bfdae/-B70-FJPG/IEW200" class="ifm-product-thumbnail__img" loading="lazy" style=""></div>
                </span>
                <div>
                   <span class="ifm-result-item__product-link">IEW200</span>
                   <div class="ifm-result-item__product-description hide-lg+">Inductive sensor</div>
                </div>
             </a>
             <div class="ifm-labeled-value-section ifm-result-item__product-info-details">
                <div class="ifm-labeled-value-section__entry">
                   <div class="ifm-labeled-value-section__label hyphens">Dimensions</div>
                   <!---->
                   <div class="ifm-labeled-value-section__value hyphens">M8 x 1 / L = 40 mm</div>
                </div>
                <div class="ifm-labeled-value-section__entry">
                   <div class="ifm-labeled-value-section__label hyphens">Sensing range</div>
                   <!---->
                   <div class="ifm-labeled-value-section__value hyphens">3 mm flush mountable</div>
                </div>
                <div class="ifm-labeled-value-section__entry">
                   <div class="ifm-labeled-value-section__label hyphens">Output function</div>
                   <!---->
                   <div class="ifm-labeled-value-section__value hyphens">normally open</div>
                </div>
                <div class="ifm-labeled-value-section__entry">
                   <div class="ifm-labeled-value-section__label hyphens">Output</div>
                   <!---->
                   <div class="ifm-labeled-value-section__value hyphens">DC PNP</div>
                </div>
                <div class="ifm-labeled-value-section__entry">
                   <div class="ifm-labeled-value-section__label hyphens">Connection</div>
                   <!---->
                   <div class="ifm-labeled-value-section__value hyphens">M8 Connector</div>
                </div>
             </div>
          </div>
       </div>
       <div class="ifm-result-item__expandable-functions" style="display: none;">
          <hr class="ifm-result-item__separator hr">
          <div class="ifm-expandable-functions ifm-result-item__collapsed-details">
             <div class="ifm-expandable-functions__item">
                <div class="ifm-product-price">
                   <div class="ifm-product-price__list-price ifm-list-price"><span class="ifm-list-price__label">List price:</span><span class="ifm-list-price__value" data-test="ifm-list-price">55,40 €</span></div>
                   <div class="ifm-product-price__individual-price ifm-individual-price"><span class="ifm-individual-price__label">Your price:</span><button type="button" class="ifm-individual-price__show-price hover-link-2 normalize" data-test="ifm-show-price">Please log in</button></div>
                </div>
             </div>
             <div class="ifm-add-to-cart ifm-expandable-functions__item ifm-expandable-functions__cart-items">
                <label class="ifm-add-to-cart__input ifm-input-label">
                   <div class="ifm-quantity-input" data-test="ifm-add-to-cart-input">
                      <div class="ifm-quantity-input__minus"><input type="button" class="normalize" data-field="quantity" value="-"></div>
                      <input step="1" min="1" max="9999" type="number" maxlength="4" name="quantity" class="normalize ifm-quantity-input__input-field">
                      <div class="ifm-quantity-input__plus"><input type="button" class="normalize" data-field="quantity" value="+"></div>
                   </div>
                </label>
                <button class="ifm-add-to-cart__button ifm-button normalize" data-test="ifm-add-to-cart-button">Add to the shopping basket</button>
             </div>
             <div class="ifm-expandable-functions__shop-items">
                <button class="ifm-wishlist hover-link-2 normalize ifm-expandable-functions__shop-item" data-test="ifm-wishlist-button">
                   <svg viewBox="0 0 24 24" aria-hidden="true" class="inline-icon">
                      <use href="#heart" class="icon-svg--thin"></use>
                   </svg>
                   <span class="hide-lg-">Save for later</span>
                </button>
                <button class="normalize ifm-compare-products hover-link-2 ifm-expandable-functions__shop-item hide-md-" data-test="ifm-compare-products-button">
                   <svg viewBox="0 0 1792 1792" class="inline-icon">
                      <use href="#compress"></use>
                   </svg>
                   <span class="hide-lg-">Compare</span>
                </button>
             </div>
          </div>
       </div>
    </div>
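
Given markup shaped like the sample above, the label/value pairs can be pulled out with HtmlAgilityPack, which the question already uses. The following is a sketch only: `ResultItemParser`, `Parse`, and `HasClass` are hypothetical names, the inline HTML is a shortened copy of the sample, and in practice the input would be `browser.PageSource` captured after the wait.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

internal static class ResultItemParser
{
    // XPath predicate that matches a class as a whole token, not as a substring.
    private static string HasClass(string cls)
        => $"contains(concat(' ', normalize-space(@class), ' '), ' {cls} ')";

    // Returns one dictionary per ifm-result-item: "Product" plus each label and its value.
    public static List<Dictionary<string, string>> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var products = new List<Dictionary<string, string>>();
        var items = doc.DocumentNode.SelectNodes($"//div[{HasClass("ifm-result-item")}]");
        if (items == null) return products; // SelectNodes returns null when nothing matches

        foreach (var item in items)
        {
            var row = new Dictionary<string, string>();

            var name = item.SelectSingleNode($".//span[{HasClass("ifm-result-item__product-link")}]");
            if (name != null) row["Product"] = HtmlEntity.DeEntitize(name.InnerText).Trim();

            var entries = item.SelectNodes($".//div[{HasClass("ifm-labeled-value-section__entry")}]");
            if (entries != null)
            {
                foreach (var entry in entries)
                {
                    var label = entry.SelectSingleNode($"./div[{HasClass("ifm-labeled-value-section__label")}]");
                    var value = entry.SelectSingleNode($"./div[{HasClass("ifm-labeled-value-section__value")}]");
                    if (label == null || value == null) continue;
                    row[HtmlEntity.DeEntitize(label.InnerText).Trim()] =
                        HtmlEntity.DeEntitize(value.InnerText).Trim();
                }
            }
            products.Add(row);
        }
        return products;
    }

    private static void Main()
    {
        const string sample = @"
<div class=""ifm-result-item"">
  <span class=""ifm-result-item__product-link"">IEW200</span>
  <div class=""ifm-labeled-value-section__entry"">
    <div class=""ifm-labeled-value-section__label hyphens"">Dimensions</div>
    <!---->
    <div class=""ifm-labeled-value-section__value hyphens"">M8 x 1 / L = 40 mm</div>
  </div>
</div>";

        var rows = Parse(sample);
        Console.WriteLine($"Rows: {rows.Count}");
        foreach (var row in rows)
            foreach (var pair in row)
                Console.WriteLine($"{pair.Key} = {pair.Value}");
    }
}
```

The length of the returned list is the number of rows on the current page; further pages would still have to be loaded separately.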
    

    For this particular site/page you could also skip the browser entirely and request the data from the site's REST service: https://www.ifm.com/restservices/de/en/category/200_010_010_010/productsAndAttributes
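
A hedged sketch of that route: fetch the URL with `HttpClient` and inspect the response with `System.Text.Json`. The shape of the JSON is not documented here, so the code makes no assumptions about property names and only lists the top-level members; `RestProbe` and `DescribeTopLevel` are hypothetical names. Whether the service answers without cookies or extra headers is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

internal static class RestProbe
{
    // Lists the top-level members of a JSON document as "name: kind".
    public static List<string> DescribeTopLevel(string json)
    {
        var lines = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement root = doc.RootElement;
            if (root.ValueKind == JsonValueKind.Object)
            {
                foreach (JsonProperty property in root.EnumerateObject())
                    lines.Add($"{property.Name}: {property.Value.ValueKind}");
            }
            else if (root.ValueKind == JsonValueKind.Array)
            {
                lines.Add($"array with {root.GetArrayLength()} elements");
            }
            else
            {
                lines.Add(root.ValueKind.ToString());
            }
        }
        return lines;
    }

    private static async Task Main()
    {
        const string url =
            "https://www.ifm.com/restservices/de/en/category/200_010_010_010/productsAndAttributes";

        using (var client = new HttpClient())
        {
            // Some servers reject requests without a User-Agent; this value is arbitrary.
            client.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0");

            string json = await client.GetStringAsync(url);
            foreach (string line in DescribeTopLevel(json))
                Console.WriteLine(line);
        }
    }
}
```

Once the structure is known, the relevant array can be deserialized into typed classes, which avoids driving a browser at all.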