python-3.x · selenium-chromedriver · shadow-dom

how to find all shadow roots using Selenium Python


My ultimate goal is to dump the full HTML of any page. "Full" means the original source HTML plus any dynamically generated HTML. Chrome DevTools already does this, but I need to do it programmatically, in Selenium with Python.

I can locate all iframes using the XPath //iframe. I'd like a way to locate all shadow roots too. I have read some good Stack Overflow posts, like this one: how-to-identify-shadow-dom. But they all assume the location of the shadow root is already known, which is not my case.


Solution

  • Your question lacks a minimal reproducible example. Nonetheless (in the hope that your next question will include one, and be up to Stack Overflow standards), here is one way of finding all shadow roots in a page:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import NoSuchShadowRootException
    
    
    chrome_options = Options()
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-notifications")
    chrome_options.add_argument("--window-size=1280,720")
    
    webdriver_service = Service("chromedriver/chromedriver")  # path to where you saved the chromedriver binary
    browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
    wait = WebDriverWait(browser, 20)
    url = 'https://iltacon2022.expofp.com/'
    browser.get(url)
    
    # wait for the page to load, then check every element for a shadow root;
    # in Selenium 4, accessing el.shadow_root raises NoSuchShadowRootException
    # when the element has no shadow root attached
    all_elements = wait.until(EC.presence_of_all_elements_located((By.XPATH, '//*')))
    for el in all_elements:
        try:
            if el.shadow_root:
                print('found shadow root in', el.get_attribute('outerHTML'))
        except NoSuchShadowRootException:
            print('no shadow root')
    
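    The loop above only flags shadow hosts found in the main document, and shadow trees can nest. As a hypothetical extension (not part of the tested code above), a recursive walker could collect nested shadow roots as well. It assumes Selenium 4, where both the driver and a ShadowRoot object expose find_elements(); the string 'css selector' is the literal value of By.CSS_SELECTOR, so no extra import is needed:

    ```python
    def collect_shadow_roots(context, found=None):
        """Depth-first collection of every shadow root reachable from `context`.

        `context` may be the WebDriver or a ShadowRoot; in Selenium 4 both
        expose find_elements(). Returns a flat list of ShadowRoot objects.
        """
        if found is None:
            found = []
        for el in context.find_elements('css selector', '*'):
            try:
                root = el.shadow_root  # raises NoSuchShadowRootException if absent
            except Exception:
                continue
            if root is not None:
                found.append(root)
                collect_shadow_roots(root, found)  # shadow trees can nest
        return found
    ```

    Calling `collect_shadow_roots(browser)` after the page has loaded would return every open shadow root, however deeply nested.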

    This is just one way, hastily put together, to locate any shadow roots in a page. The Selenium setup above is Linux with chromedriver. Note that for other browsers/drivers, such as geckodriver/Firefox, you will need a different method to locate the shadow root. Lastly, the Selenium docs can be found at https://www.selenium.dev/documentation/
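    For drivers where WebElement.shadow_root is unsupported, one commonly used fallback (a sketch, not verified against every driver version) is to ask the browser for the element's shadowRoot property via JavaScript. Note this only reaches *open* shadow roots; closed ones come back as None:

    ```python
    def get_shadow_root_via_js(driver, element):
        """Return `element`'s shadow root by executing JavaScript in the page.

        Works only for open shadow roots; closed shadow roots yield None.
        """
        return driver.execute_script('return arguments[0].shadowRoot', element)
    ```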