I'm scraping this page with Selenium. I want it to go to the "Meeting Rooms" section, click "View details" for each room, and scrape the information from the page that opens. I've written the code below, but the problem is that the href of each "View details" link is "#", so it just points back to the main page!
driver = webdriver.Chrome()
driver.get('https://www.cvent.com/venues/dubai/hotel/grand-hyatt-dubai-conference-hotel/venue-f1ea54df-124e-4ba9-a551-fb2e06175c62')
#-------------------------------------------Scroll and Show All Btn
try:
    scroll_height = driver.execute_script("return document.body.scrollHeight")
    show_all_button = WebDriverWait(driver, 10).until(EC.presence_of_element_located(
        (By.CSS_SELECTOR, ".MeetingRoomsGrid__viewAllToggleContainer___2HJRU button")))
    ActionChains(driver).move_to_element(show_all_button).click().perform()
    time.sleep(5)
except (NoSuchElementException, StaleElementReferenceException, TimeoutException):
    pass
title = []
view_details = driver.find_elements(By.CSS_SELECTOR, 'li.MeetingRoomsGrid__venueDetailWrapper___3XYYx a')
for l in view_details:
    driver.get(l.get_attribute('href'))
    title.append(l.find_element(By.CLASS_NAME, 'MeetingRoomDetailPage__meetingRoomNameText___3k-3T').text)
print(title)
#rest of the code
How can I solve this problem?
The link is not rendered inside an <a> tag; the data for it sits in one of the scripts loaded while the page renders.
The room links are built with the pattern {url}/meetingRoom-{roomId}
The ids can be extracted from the text of that <script> tag with the regex {"id":"([^"]+)"
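So, your steps are:
1. Get the innerText property of the script element.
2. Split it at the badges object (so only room ids remain in the first part of the array).
3. Use the regex {"id":"([^"]+)" to extract the ids.
4. Build each room URL with the pattern {url}/meetingRoom-{match}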
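import re
import html

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = "https://www.cvent.com/venues/dubai/hotel/grand-hyatt-dubai-conference-hotel/venue-f1ea54df-124e-4ba9-a551-fb2e06175c62"
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get(url)

# Locate the script whose text contains the room data
script = wait.until(EC.presence_of_element_located((By.XPATH, "//script[contains(., '{\"id\"')]")))
pattern = r'{"id":"([^"]+)"'
script_text = html.unescape(script.get_property('innerText'))
# Keep only the part before "badges" so that only room ids are matched
matches = re.findall(pattern, script_text.split('"badges"')[0])
for match in matches:
    meeting_room = f"{url}/meetingRoom-{match}"
    print(meeting_room)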
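
The split and regex steps can also be checked without a browser. The sketch below wraps them in a small helper; the function name extract_room_urls, the example.com base URL, and the sample script text are all made up for illustration and only mimic the shape of the data described in the answer, not the real page content.

```python
import html
import re

# Same pattern as in the answer, with the brace escaped for clarity.
ROOM_ID_PATTERN = re.compile(r'\{"id":"([^"]+)"')


def extract_room_urls(script_text, base_url):
    """Build meeting-room URLs from the ids that appear before "badges"."""
    text = html.unescape(script_text)
    # Ids after "badges" are assumed not to be rooms, so drop that part.
    rooms_part = text.split('"badges"')[0]
    return [f"{base_url}/meetingRoom-{room_id}"
            for room_id in ROOM_ID_PATTERN.findall(rooms_part)]


# Hypothetical sample text, not taken from the real page.
sample = ('{"meetingRooms":[{"id":"room-1","name":"A"},'
          '{"id":"room-2","name":"B"}],'
          '"badges":[{"id":"badge-1"}]}')
urls = extract_room_urls(sample, "https://example.com/venue-123")
print(urls)
```

With Selenium, the same helper could be fed the innerText of the script element, and each returned URL opened with driver.get before scraping the room details.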