
How do I scrape mouse hover-box content on an AJAX webpage using Python?


I have been trying to extract the dimensions of each cell on a map from this AJAX website. The details for each cell only pop up when the mouse pointer hovers over the cell.

I used Python with Selenium WebDriver and PhantomJS to load the page and extract page_source, but the data wasn't there. I also used Firebug to look for any .json file that the content might be loaded from, but found none.

Please take a look at the site and suggest how I can scrape the content of the hover box displayed when pointing at each cell on the map.

P.S.: I have done a lot of research, both on Stack Overflow and on several other sites, all to no avail.


Solution

  • There is actually no AJAX here, just an svg object that contains a <g> element for each item (booth) on the page. To get the required info, you have to hover the mouse over each <g>. With the following code you can get most of the item descriptions (for about 2/3 of all the <g> elements). I don't know exactly what the page is about, so I cannot tell which items show a description and which don't:

    import time

    from selenium import webdriver as web
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait as wait

    driver = web.Chrome()
    driver.maximize_window()
    driver.get('http://www.aptaexpo.com/apta2017/public/eventmap.aspx?shmode=E&thumbnail=1')
    time.sleep(5)  # give the hall overview time to render
    driver.find_elements(By.TAG_NAME, 'polygon')[0].click()  # [1] to choose another hall
    time.sleep(5)  # give the booth map time to render

    # Each booth is a <g> element inside the Leaflet overlay svg
    list_of = driver.find_elements(By.XPATH, '//div[@class="leaflet-overlay-pane"]/*[name()="svg"]/*[name()="g"]')
    for item in list_of:
        # Hover first, so that the hover box read below belongs to this item
        ActionChains(driver).move_to_element(item).perform()
        try:
            description = wait(driver, 3).until(EC.visibility_of_element_located((By.XPATH, '//div[*[contains(text(), "Booth:")]]'))).text
            print(description)
        except TimeoutException:
            pass  # no hover box appeared for this <g> within 3 seconds
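
If the descriptions are collected into a list instead of being printed, the same hover box may end up captured more than once, for example when a box lingers between two hovers. The following is a minimal sketch of an order-preserving cleanup step. The helper name `unique_descriptions` and the sample strings are hypothetical, and the real hover-box text on the page may look different:

```python
def unique_descriptions(descriptions):
    """Drop blank entries and repeats, keeping the first-seen order."""
    seen = set()
    result = []
    for text in descriptions:
        cleaned = text.strip()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


# Hypothetical sample data standing in for scraped hover-box text
scraped = ["Booth: 101\nAcme", "Booth: 101\nAcme", "   ", "Booth: 102\nBolt"]
for entry in unique_descriptions(scraped):
    print(entry)
    print("-" * 20)
```

In the scraping loop, `print(description)` would be replaced by appending to a list, and that list would be passed to the helper once the loop finishes.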