Tags: python, html, web-scraping, web-deployment

Class name changes in HTML


I was web scraping an e-commerce website and kept getting a NoSuchElementException (once every 2 or 3 successful attempts), even though I knew the element was always there. It was the element that holds the product name. After a long period of frustration and 100 failed solutions, I realized that the class name of that element sometimes changes, and I would like to know why. The HTML is identical; only the class name is different.

(Screenshot: the class that changes its name)


Solution

  • Elements with an apparently identical structure sometimes get different class names, either because the website was updated or because certain elements need specific styling depending on their position in the grid.

    For example, I assume the first card in the grid has a different h2 class because of the orange banner below it, which may require a different padding value than the titles of the other cards.

    If you're looking for the product titles, you can identify the element with a structural selector such as .card-section-mid h2 instead, which avoids depending on a specific class name.

    So you would do something like:

    from selenium.webdriver.common.by import By

    # `card` is the card element you've already selected (Selenium 4 locator API)
    title = card.find_element(By.CSS_SELECTOR, '.card-section-mid h2')
    

    The best solution in web scraping is to look for an API instead, if the site exposes one.
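A minimal sketch of why an API is more robust, assuming the site exposes a JSON endpoint for its product listing (such requests can often be spotted in the browser developer tools' Network tab while the page loads). The payload shape and the field names products and name are hypothetical; in real code the body would come from the endpoint you find, for example via urllib.request.urlopen(url).read(). The point is that data is selected by field name, which belongs to the data rather than to the page styling.

```python
import json

# Hypothetical JSON body, shaped the way a product-listing API *might* respond.
# The field names "products" and "name" are assumptions, not this site's real API.
RESPONSE_BODY = '{"products": [{"id": 1, "name": "Red mug"}, {"id": 2, "name": "Blue mug"}]}'


def product_names(body: str) -> list[str]:
    """Extract product names from a JSON payload by field name.

    A JSON field name is part of the data contract, so it is typically far more
    stable than a CSS class that exists only for layout and styling.
    """
    data = json.loads(body)
    return [item["name"] for item in data["products"]]


print(product_names(RESPONSE_BODY))  # ['Red mug', 'Blue mug']
```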

    The second-best solution is to find selectors robust enough to survive changes over time and variations in the layout. In this case, the first h2 inside div.card-section-mid seems to always contain the title, so it is a good target for web scraping.
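To show without a browser why a structural selector is indifferent to the h2's own class, here is a minimal sketch using only Python's standard html.parser. The sample markup is hypothetical: the card and banner classes, the made-up title-xL9f and title-a3Qz classes, and the product names are assumptions. The parser records the text of the first h2 inside each div.card-section-mid, whatever class that h2 carries, so a lookup tied to title-xL9f would find only the first card while this one finds both. With Selenium you would simply keep using the CSS selector .card-section-mid h2.

```python
from html.parser import HTMLParser


class CardTitleParser(HTMLParser):
    """Collect the text of the first <h2> inside each div.card-section-mid,
    ignoring whatever class the <h2> itself happens to have."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._div_depth = 0          # nesting depth of currently open <div>s
        self._section_depth = None   # depth at which the current card-section-mid opened
        self._found_h2 = False       # already captured the first h2 of this section?
        self._in_h2 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._div_depth += 1
            classes = (dict(attrs).get("class") or "").split()
            if self._section_depth is None and "card-section-mid" in classes:
                self._section_depth = self._div_depth
                self._found_h2 = False
        elif tag == "h2" and self._section_depth is not None and not self._found_h2:
            self._in_h2 = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self.titles.append("".join(self._buffer).strip())
            self._in_h2 = False
            self._found_h2 = True
        elif tag == "div":
            if self._section_depth == self._div_depth:
                self._section_depth = None  # leaving the card-section-mid div
            self._div_depth -= 1

    def handle_data(self, data):
        if self._in_h2:
            self._buffer.append(data)


# Hypothetical markup: the two h2 elements carry different (made-up) classes.
SAMPLE = """
<div class="card">
  <div class="card-section-mid"><h2 class="title-xL9f">Red mug</h2></div>
  <div class="banner">Best seller</div>
</div>
<div class="card">
  <div class="card-section-mid"><h2 class="title-a3Qz">Blue mug</h2></div>
</div>
"""

parser = CardTitleParser()
parser.feed(SAMPLE)
parser.close()
print(parser.titles)  # ['Red mug', 'Blue mug']
```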