python-3.x · selenium-chromedriver · shadow-dom

how to find all shadow roots using Selenium Python


My ultimate goal is to dump the full HTML of any page. "Full" means the original source HTML plus any dynamically generated HTML. Chrome DevTools already does this, but I need to do it programmatically, in Selenium with Python.

I can locate all iframes using the XPath //iframe. I'd like a way to locate all shadow roots too. I have read some good Stack Overflow posts, like this one: how-to-identify-shadow-dom. But they all assume the location of the shadow root is already known, which is not my case.


Solution

  • Your question lacks a minimal reproducible example. Nonetheless (in the hope that your next question will include one, and be up to Stack Overflow standards), here is one way of finding all shadow roots in a page:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import NoSuchShadowRootException
    
    
    chrome_options = Options()
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-notifications")
    chrome_options.add_argument("--window-size=1280,720")
    
    webdriver_service = Service("chromedriver/chromedriver")  # path to where you saved the chromedriver binary
    browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
    wait = WebDriverWait(browser, 20)
    url = 'https://iltacon2022.expofp.com/'
    browser.get(url)
    
    # wait for the page to load, then check every element for a shadow root;
    # in Selenium 4, accessing el.shadow_root raises NoSuchShadowRootException
    # when the element has no shadow root attached
    all_elements = wait.until(EC.presence_of_all_elements_located((By.XPATH, '//*')))
    for el in all_elements:
        try:
            if el.shadow_root:
                print('found shadow root in', el.get_attribute('outerHTML'))
        except NoSuchShadowRootException:
            print('no shadow root')
    
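    The loop above only flags shadow hosts found in the main document, and shadow trees can nest. As a hypothetical extension (not part of the tested code above), a recursive walker could collect nested shadow roots as well. It assumes Selenium 4, where both the driver and a ShadowRoot object expose find_elements(); the string 'css selector' is the literal value of By.CSS_SELECTOR, so no extra import is needed:

    ```python
    def collect_shadow_roots(context, found=None):
        """Depth-first collection of every shadow root reachable from `context`.

        `context` may be the WebDriver or a ShadowRoot; in Selenium 4 both
        expose find_elements(). Returns a flat list of ShadowRoot objects.
        """
        if found is None:
            found = []
        for el in context.find_elements('css selector', '*'):
            try:
                root = el.shadow_root  # raises NoSuchShadowRootException if absent
            except Exception:
                continue
            if root is not None:
                found.append(root)
                collect_shadow_roots(root, found)  # shadow trees can nest
        return found
    ```

    Calling `collect_shadow_roots(browser)` after the page has loaded would return every open shadow root, however deeply nested.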

    This is just one way, hastily put together, to locate any shadow roots in a page. The Selenium setup above is Linux with chromedriver. Note that for other browsers/drivers, such as geckodriver/Firefox, you will need a different method to locate the shadow root. Lastly, the Selenium docs can be found at https://www.selenium.dev/documentation/
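    For drivers where WebElement.shadow_root is unsupported, one commonly used fallback (a sketch, not verified against every driver version) is to ask the browser for the element's shadowRoot property via JavaScript. Note this only reaches *open* shadow roots; closed ones come back as None:

    ```python
    def get_shadow_root_via_js(driver, element):
        """Return `element`'s shadow root by executing JavaScript in the page.

        Works only for open shadow roots; closed shadow roots yield None.
        """
        return driver.execute_script('return arguments[0].shadowRoot', element)
    ```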