
How to get the link from an <a> tag when its href or src attribute doesn't contain the link


I'm scraping this page with Selenium. I want to go to the "Meeting Rooms" section, click "View details" for each room, and scrape the information from the page that opens. I've written the code below, but the href of each "View details" link is just "#", so it only points back to the main page!

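import time

from selenium import webdriver
from selenium.common.exceptions import (NoSuchElementException,
                                        StaleElementReferenceException,
                                        TimeoutException)
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()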
driver.get('https://www.cvent.com/venues/dubai/hotel/grand-hyatt-dubai-conference-hotel/venue-f1ea54df-124e-4ba9-a551-fb2e06175c62')
#-------------------------------------------Scroll and Show All Btn
try:
    scroll_height = driver.execute_script("return document.body.scrollHeight")
    show_all_button = WebDriverWait(driver, 10).until(EC.presence_of_element_located(
                            (By.CSS_SELECTOR, ".MeetingRoomsGrid__viewAllToggleContainer___2HJRU button")))
    ActionChains(driver).move_to_element(show_all_button).click().perform()
    time.sleep(5)
except (NoSuchElementException, StaleElementReferenceException,TimeoutException):
    pass
title=[]
view_details=driver.find_elements(By.CSS_SELECTOR,'li.MeetingRoomsGrid__venueDetailWrapper___3XYYx a')
for l in view_details:
    driver.get(l.get_attribute('href'))
    title.append(l.find_element(By.CLASS_NAME,'MeetingRoomDetailPage__meetingRoomNameText___3k-3T').text)
print(title)
#rest of the code

How can I solve this problem?


Solution

  • The link is not rendered inside the <a> tag; it is rendered inside one of the scripts loaded while the page renders.

    The link for each room is built from the pattern {url}/meetingRoom-{roomId}

    The ids can be found by parsing the script tag with the pattern {"id":"([^"]+)"

    So, your steps are:

    1. Get the needed script element
    2. Get its innerText property
    3. Unescape the script text for easier readability
    4. Split the script text on the badges object (so that only room ids are in the first part of the array)
    5. Apply the pattern {"id":"([^"]+)" to extract the ids
    6. Construct each url using the pattern {url}/meetingRoom-{match}
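
    import html
    import re

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    url = "https://www.cvent.com/venues/dubai/hotel/grand-hyatt-dubai-conference-hotel/venue-f1ea54df-124e-4ba9-a551-fb2e06175c62"
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 10)  # 10-second timeout is an assumption, same value as in the question
    driver.get(url)

    # 1. Find the script element whose text contains the room data
    script = wait.until(EC.presence_of_element_located((By.XPATH, '//script[contains(., \'{"id"\')]')))

    # 2-3. Read its innerText and unescape HTML entities
    script_text = html.unescape(script.get_property('innerText'))

    # 4-5. Keep only the part before "badges" and extract the room ids
    pattern = r'{"id":"([^"]+)"'
    matches = re.findall(pattern, script_text.split('"badges"')[0])

    # 6. Build the URL of each meeting room
    for match in matches:
        meeting_room = f"{url}/meetingRoom-{match}"
        print(meeting_room)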
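
If you want to check the parsing part without opening a browser, the unescape, split and regex steps can be pulled into a small function and run on a string. This is only a sketch: the function name extract_room_urls, the SAMPLE_SCRIPT_TEXT string and the example.com URL are made up for illustration and are not real Cvent data. The regex is the same pattern as above, with the opening brace escaped.

```python
import html
import re

# Same pattern as in the answer; the leading brace is escaped for clarity.
ROOM_ID_PATTERN = re.compile(r'\{"id":"([^"]+)"')

# Hypothetical, hand-written stand-in for the script's innerText (not real page data).
SAMPLE_SCRIPT_TEXT = (
    '{&quot;rooms&quot;:[{&quot;id&quot;:&quot;aaa-111&quot;,&quot;name&quot;:&quot;A&quot;},'
    '{&quot;id&quot;:&quot;bbb-222&quot;,&quot;name&quot;:&quot;B&quot;}],'
    '&quot;badges&quot;:[{&quot;id&quot;:&quot;zzz-999&quot;}]}'
)


def extract_room_urls(script_text, base_url):
    """Return meeting-room URLs built from the ids found before the "badges" key."""
    # Turn HTML entities such as &quot; back into plain characters.
    unescaped = html.unescape(script_text)
    # Keep only the text before "badges", so ids that appear after it are ignored.
    rooms_part = unescaped.split('"badges"')[0]
    # Each captured id becomes {base_url}/meetingRoom-{id}.
    return [f"{base_url}/meetingRoom-{room_id}"
            for room_id in ROOM_ID_PATTERN.findall(rooms_part)]


if __name__ == "__main__":
    for room_url in extract_room_urls(SAMPLE_SCRIPT_TEXT, "https://example.com/venue-x"):
        print(room_url)
```

With a real page you would pass script.get_property('innerText') and your venue url instead of the sample values. If the site changes the layout of that script, the split on "badges" or the regex may need adjusting.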