python selenium xpath css-selectors webdriverwait

Python/Selenium web scraping: how to find the hidden href value of a links?


Scraping links should be a simple feat, usually just a matter of grabbing the href value of the a tag.

I recently came across this website (https://sunteccity.com.sg/promotions) where the href value of the a tag for each item cannot be found, yet the redirection still works. I'm trying to figure out a way to grab the items and their corresponding links. My typical Python Selenium code looks something like this:

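from selenium.webdriver.common.by import By

# find_elements_by_* was removed in Selenium 4; use find_elements(By...) instead
all_items = bot.find_elements(By.CLASS_NAME, 'thumb-img')
for promo in all_items:
    a = promo.find_elements(By.TAG_NAME, "a")
    print("a[0]: ", a[0].get_attribute("href"))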

However, I can't seem to retrieve any href or onclick attributes, and I'm wondering if this is even possible. I also noticed that I couldn't right-click an item and open its link in a new tab.

Are there any ways around getting the links of all these items?

Edit: Are there any ways to retrieve all the links of the items on the pages?

i.e.

https://sunteccity.com.sg/promotions/724
https://sunteccity.com.sg/promotions/731
https://sunteccity.com.sg/promotions/751
https://sunteccity.com.sg/promotions/752
https://sunteccity.com.sg/promotions/754
https://sunteccity.com.sg/promotions/280
...

Edit: Adding an image of one such anchor tag for better clarity: [image]


Solution

  • Reverse-engineering the JavaScript that takes you to the promotion pages (seen in https://sunteccity.com.sg/_nuxt/d4b648f.js) gives you a way to get all the links, which are based on the HappeningID. You can verify this by running the following in the JS console, which returns the HappeningID of the first promotion:

    window.__NUXT__.state.Promotion.promotions[0].HappeningID
    

    Based on that, you can create a Python loop to get all the promotions:

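    # Read the Promotion state that the page exposes on window.__NUXT__
    items = driver.execute_script("return window.__NUXT__.state.Promotion;")
    base = "https://sunteccity.com.sg/promotions/"
    for item in items["promotions"]:
        # Each promotion link is the base URL followed by its HappeningID
        happening_id = str(item["HappeningID"])
        print(base + happening_id)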
    

    That generated the following output:

    https://sunteccity.com.sg/promotions/724
    https://sunteccity.com.sg/promotions/731
    https://sunteccity.com.sg/promotions/751
    https://sunteccity.com.sg/promotions/752
    https://sunteccity.com.sg/promotions/754
    https://sunteccity.com.sg/promotions/280
    https://sunteccity.com.sg/promotions/764
    https://sunteccity.com.sg/promotions/766
    https://sunteccity.com.sg/promotions/762
    https://sunteccity.com.sg/promotions/767
    https://sunteccity.com.sg/promotions/732
    https://sunteccity.com.sg/promotions/733
    https://sunteccity.com.sg/promotions/735
    https://sunteccity.com.sg/promotions/736
    https://sunteccity.com.sg/promotions/737
    https://sunteccity.com.sg/promotions/738
    https://sunteccity.com.sg/promotions/739
    https://sunteccity.com.sg/promotions/740
    https://sunteccity.com.sg/promotions/741
    https://sunteccity.com.sg/promotions/742
    https://sunteccity.com.sg/promotions/743
    https://sunteccity.com.sg/promotions/744
    https://sunteccity.com.sg/promotions/745
    https://sunteccity.com.sg/promotions/746
    https://sunteccity.com.sg/promotions/747
    https://sunteccity.com.sg/promotions/748
    https://sunteccity.com.sg/promotions/749
    https://sunteccity.com.sg/promotions/750
    https://sunteccity.com.sg/promotions/753
    https://sunteccity.com.sg/promotions/755
    https://sunteccity.com.sg/promotions/756
    https://sunteccity.com.sg/promotions/757
    https://sunteccity.com.sg/promotions/758
    https://sunteccity.com.sg/promotions/759
    https://sunteccity.com.sg/promotions/760
    https://sunteccity.com.sg/promotions/761
    https://sunteccity.com.sg/promotions/763
    https://sunteccity.com.sg/promotions/765
    https://sunteccity.com.sg/promotions/730
    https://sunteccity.com.sg/promotions/734
    https://sunteccity.com.sg/promotions/623
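
The loop above assumes that window.__NUXT__ is already populated when execute_script runs. As a minimal sketch of a slightly more defensive variant, the snippet below waits for the state with WebDriverWait before building the URLs. The names build_promotion_urls, fetch_promotion_urls, BASE_URL and the 10-second timeout are hypothetical and chosen for illustration only; the shape of the state (a "promotions" list whose items carry a "HappeningID") is taken from the answer above and may change if the site is updated.

```python
BASE_URL = "https://sunteccity.com.sg/promotions/"  # base path from the answer above


def build_promotion_urls(promotion_state, base_url=BASE_URL):
    """Build one URL per promotion from the Promotion state dict (hypothetical helper)."""
    urls = []
    for item in promotion_state.get("promotions", []):
        happening_id = item.get("HappeningID")
        if happening_id is None:
            # Skip entries without an ID instead of raising a KeyError
            continue
        urls.append(base_url + str(happening_id))
    return urls


def fetch_promotion_urls(driver, timeout=10):
    """Wait for the Nuxt state to appear, then return the promotion URLs.

    Assumes the driver has already loaded the promotions page.
    """
    # Imported here so build_promotion_urls can be used without Selenium installed
    from selenium.webdriver.support.ui import WebDriverWait

    # until() keeps polling while the script returns a falsy value (e.g. None)
    state = WebDriverWait(driver, timeout).until(
        lambda d: d.execute_script(
            "return window.__NUXT__ && window.__NUXT__.state"
            " && window.__NUXT__.state.Promotion;"
        )
    )
    return build_promotion_urls(state)
```

With a driver that has already opened https://sunteccity.com.sg/promotions, calling fetch_promotion_urls(driver) should return the same list of links that the loop prints, provided the page still exposes its data in the same way. Keeping the URL building in a separate function makes it possible to check that part without starting a browser.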