selenium python iframe get https requests


I'm using Selenium with Python and want to capture all HTTPS requests made from an iframe element. I switch to the iframe, select a row in a table, and press a button, which starts an HTTP POST request.

part of my code

define chrome driver

chrome_options = Options()
chrome_options.add_experimental_option("detach", True)
chrome_options.add_argument("--start-maximized")
chrome_options.add_argument('--auto-open-devtools-for-tabs')
chrome_options.add_argument('--log-level=2')
chrome_options.add_argument('--disable-features=IsolateOrigins,site-per-process')
chrome_options.add_experimental_option('excludeSwitches', ['enable-logging'])


capabilities = DesiredCapabilities.CHROME
capabilities["goog:loggingPrefs"] = {"performance": "ALL",'browser':'ALL','server':'ALL'}  # chromedriver 75+
capabilities["goog:chromeOptions"] = {"w3c": "false","args": "%w[headless window-size=1280,800]"}  # chromedriver 75+
capabilities['acceptSslCerts'] = True
capabilities['acceptInsecureCerts'] = True
capabilities['PageLoadStrategy'] = None

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options,desired_capabilities=capabilities)
driver.get(os.environ['URL'])

get iframe element and click on row table

WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, '//*[@id="myiframe"]')))
row_selector = '//*[@id="root"]/div/div/div/div[2]/div[2]/div/div/div/div/table/tbody/tr[1]/th[3]'
row_selector_clickable = '//*[@id="root"]/div/div/div/div[2]/div[2]/div/div/div/div/table/tbody/tr[1]/th[3]/div/div/div[2]/div/button'

WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.XPATH, row_selector)))
actions = ActionChains(driver)
row_element = driver.find_element(By.XPATH, row_selector)
row_clickable = driver.find_element(By.XPATH, row_selector_clickable)
actions.move_to_element(row_element)
actions.click(row_clickable)
actions.perform()

then here, I get all http post requests and write them to file

import datetime
import json

logs = driver.get_log("performance")


def process_browser_logs_for_network_events(logs):
    """
    Yield the DevTools message contained in each performance log entry.
    No filtering by method happens here; POST requests are selected below.
    """
    for entry in logs:
        log = json.loads(entry["message"])["message"]
        yield log


events = process_browser_logs_for_network_events(logs)
li = []
with open(f"log_entries-{datetime.datetime.now()}.txt", "wt") as out:
    for event in events:
        print(event)
        # Keep only events whose request method is POST
        if event.get('params', {}).get('request', {}).get('method') == 'POST':
            li.append(event)

    out.write(json.dumps(li))

The problem is that it seems to show requests from the first page only, even though I switch to the iframe and Selenium selects the right elements inside it. The flow is as follows: I log in to the website and am redirected to the main page. I click a button, which opens a new tab containing the iframe. I switch to the iframe and click a row in the table, which triggers an HTTP request that takes 5-10 seconds (its status is pending during that time). When it succeeds, the page redirects to the Gmail website and the HTTP request disappears because of the redirect. I tried enabling "Preserve log", but that did not help.

I can't share the HTTPS requests because they belong to my job, but what I see are requests from the first page, not from the current iframe.


Solution

  • OK, I'll try to be as clear as I can. The setup below is Linux/Selenium/Chrome; you can adapt it to your own. Just note the imports and the code after the browser/driver is defined.

    For intercepting browser's requests I used selenium-wire: https://pypi.org/project/selenium-wire/

    If you prefer, you can use native selenium request intercepting.
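
One way to read "native" here is Chrome's performance log, which the question already enables through `goog:loggingPrefs`. The sketch below is an assumption about how that log could be filtered, not part of the setup tested in this answer. `post_requests_from_performance_log` is a hypothetical helper: it takes the list returned by `driver.get_log("performance")` and keeps only `Network.requestWillBeSent` events whose request method is POST, along with the `frameId` and `documentURL` fields that DevTools attaches to each request. The sample entry is made up for illustration.

```python
import json


def post_requests_from_performance_log(entries):
    """Return url, frameId and documentURL for every POST request in the log.

    `entries` is the list returned by driver.get_log("performance"); each
    entry carries a JSON string under the "message" key.
    """
    found = []
    for entry in entries:
        message = json.loads(entry["message"])["message"]
        # Only request events are of interest; skip responses, websockets, etc.
        if message.get("method") != "Network.requestWillBeSent":
            continue
        params = message.get("params", {})
        request = params.get("request", {})
        if request.get("method") == "POST":
            found.append({
                "url": request.get("url"),
                "frame_id": params.get("frameId"),
                "document_url": params.get("documentURL"),
            })
    return found


# Made-up entry shaped like a Chrome performance log record (illustration only)
sample_entries = [{
    "message": json.dumps({
        "message": {
            "method": "Network.requestWillBeSent",
            "params": {
                "frameId": "FRAME-1",
                "documentURL": "https://example.com/frame.html",
                "request": {"method": "POST", "url": "https://example.com/api"},
            },
        }
    })
}]

print(post_requests_from_performance_log(sample_entries))
```

With a real driver, the call would be `post_requests_from_performance_log(driver.get_log("performance"))` after the click. Comparing `frame_id` and `document_url` across results may help tell iframe requests from main-page requests, though that was not verified here.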

    I looked around for an example website containing an iframe that you interact with, i.e. click a button (the OP should have done the legwork and provided such an example, but anyway).

    Code:

    from seleniumwire import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.keys import Keys
    import time as t
    from datetime import datetime
    
    chrome_options = Options()
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("window-size=1280,720")
    webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
    browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
    
    url = 'https://fasecolda.com/ramos/automoviles/historial-de-accidentes-de-vehiculos-asegurados/'
    browser.get(url)
    for x in browser.requests:
        print('URL:', x.url)
        print('ORIGIN:', x.headers['origin'])
        print('HOST:', x.headers['Host'])
        print('SEC-FETCH-DEST:', x.headers['sec-fetch-dest'])
        print('TIMESTAMP:', x.date)
        print('______________________________________')
    
    WebDriverWait(browser, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//*[@title='Términos']")))
    t.sleep(3)
    print('switched to iframe')
    button = WebDriverWait(browser,5).until(EC.element_to_be_clickable((By.XPATH, '//*[text()="Acepto los Términos y Condiciones"]')))
    print('located the button, bringing it into view')
    button.location_once_scrolled_into_view
    print('now waiting 30 seconds, for clear separation of requests')
    t.sleep(30)
    print('printing last request again:')
    print('URL:', browser.last_request.url)
    print('ORIGIN:', browser.last_request.headers['origin'])
    print('HOST:', browser.last_request.headers['Host'])
    print('SEC-FETCH-DEST:', browser.last_request.headers['sec-fetch-dest'])
    print('TIMESTAMP:', browser.last_request.date)
    last_date = browser.last_request.date
    
    print('clicked the button at', datetime.now())
    button.click()
    print('waiting another 30 seconds')
    t.sleep(30)
    print('and now printing the requests again (the ones after interacting with iframe)')
    for x in browser.requests:
        if x.date > last_date:
            print('URL:', x.url)
            print('ORIGIN:', x.headers['origin'])
            print('HOST:', x.headers['Host'])
            print('SEC-FETCH-DEST:', x.headers['sec-fetch-dest'])
            print('TIMESTAMP:', x.date)
            print('______________________________________')
    

    As you can see, it's pretty straightforward:

    • go to the website
    • print the requests made (URL, origin, host, sec-fetch-dest and timestamp)
    • locate the iframe and switch to it
    • locate the button you want to click, and bring it into view
    • wait 30 seconds for any requests made by JS in the page
    • after 30 seconds, print the last request made (URL, origin, host, sec-fetch-dest and timestamp), and save its timestamp in a variable to filter subsequent requests
    • click the button and record the time of the click
    • wait another 30 seconds, just to make sure all requests have been made
    • print the requests made after the saved timestamp
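
The timestamp filter in the last step can be pulled out into a small function. This is a minimal sketch, assuming each captured request exposes a `date` attribute as in the code above. `requests_after` is a hypothetical helper name, and the `SimpleNamespace` objects are stand-ins so the snippet runs without a browser.

```python
from datetime import datetime, timedelta
from types import SimpleNamespace


def requests_after(requests, cutoff):
    """Keep only the requests whose `date` is later than `cutoff`."""
    return [r for r in requests if r.date > cutoff]


# Stand-in objects with the same `url` and `date` attributes the code above reads
cutoff = datetime(2022, 10, 8, 21, 44, 57)
captured = [
    SimpleNamespace(url="https://example.com/before", date=cutoff - timedelta(seconds=5)),
    SimpleNamespace(url="https://example.com/after", date=cutoff + timedelta(seconds=22)),
]

for r in requests_after(captured, cutoff):
    print(r.url)
```

In the script above, the equivalent call would be `requests_after(browser.requests, last_date)`.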

    The result in terminal:

    [...]
    ______________________________________
    URL: https://fonts.gstatic.com/s/roboto/v30/KFOlCnqEu92Fr1MmEU9fBBc4.woff2
    ORIGIN: https://siniestroshava.com.co
    HOST: None
    SEC-FETCH-DEST: font
    TIMESTAMP: 2022-10-08 21:44:44.794670
    ______________________________________
    switched to iframe
    located the button, bringing it into view
    now waiting 30 seconds, for clear separation of requests
    printing last request again:
    URL: https://optimizationguide-pa.googleapis.com/v1:GetModels?key=AIzaSyCkfPOPZXDKNn8hhgu3JrA62wIgC93d44k
    ORIGIN: None
    HOST: None
    SEC-FETCH-DEST: empty
    TIMESTAMP: 2022-10-08 21:44:57.413952
    clicked the button at 2022-10-08 21:45:19.036690
    waiting another 30 seconds
    and now printing the requests again (the ones after interacting with iframe)
    URL: https://siniestroshava.com.co/hava/Seguridad/SolicitarCorreo
    ORIGIN: None
    HOST: siniestroshava.com.co
    SEC-FETCH-DEST: iframe
    TIMESTAMP: 2022-10-08 21:45:19.209288
    ______________________________________
    URL: https://siniestroshava.com.co/hava/css/hava/estiloslocales.css
    ORIGIN: None
    HOST: siniestroshava.com.co
    SEC-FETCH-DEST: style
    TIMESTAMP: 2022-10-08 21:45:19.633076
    ______________________________________
    URL: https://siniestroshava.com.co/hava/css/vendor.css?v=U1BT8Ls9ntdpDS12L5xpMjmSP3Eitncl_SyDnU5LLHk
    ORIGIN: None
    HOST: siniestroshava.com.co
    SEC-FETCH-DEST: style
    TIMESTAMP: 2022-10-08 21:45:19.645382
    ______________________________________
    URL: https://siniestroshava.com.co/hava/css/devextreme/dx.material.hava.css
    ORIGIN: None
    HOST: siniestroshava.com.co
    SEC-FETCH-DEST: style
    TIMESTAMP: 2022-10-08 21:45:19.646197
    ______________________________________
    [...]
    ______________________________________
    

    As you can see, the main website is https://fasecolda.com and the iframe src is https://siniestroshava.com.co/. The output shows the requests made since the original page loaded (I didn't post them all, there are too many), the last request made before interacting with the iframe, the time of the interaction, and the subsequent requests. The first of these has SEC-FETCH-DEST: iframe, so it is the request made by the iframe as a result of clicking the button. The host and origin headers are also relevant, when present.

    This is a method to isolate the requests made from the iframe, as opposed to the ones made from main page.
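
The host and origin check can also be written as a filter. The sketch below is an illustration under assumptions, not something run against the site: `belongs_to_frame` is a hypothetical helper that treats a request as belonging to the iframe when either its URL host or the host in its Origin header matches the iframe's host. The stand-ins use a plain dict for `headers` with a lowercase `origin` key; it is assumed that the real headers object also supports `.get`.

```python
from types import SimpleNamespace
from urllib.parse import urlsplit


def belongs_to_frame(request, frame_host):
    """True if the request targets `frame_host` or carries it as Origin."""
    url_host = urlsplit(request.url).hostname
    origin = request.headers.get("origin")
    origin_host = urlsplit(origin).hostname if origin else None
    return frame_host in (url_host, origin_host)


# Stand-ins built from URLs that appear in the terminal output above
captured = [
    SimpleNamespace(url="https://fasecolda.com/", headers={}),
    SimpleNamespace(
        url="https://siniestroshava.com.co/hava/Seguridad/SolicitarCorreo",
        headers={},
    ),
    SimpleNamespace(
        url="https://fonts.gstatic.com/s/roboto/v30/KFOlCnqEu92Fr1MmEU9fBBc4.woff2",
        headers={"origin": "https://siniestroshava.com.co"},
    ),
]

for r in captured:
    print(belongs_to_frame(r, "siniestroshava.com.co"), r.url)
```

This is only a heuristic: a request the iframe sends to a third-party host without an Origin header would be missed, so combining it with the timestamp filter is probably safer.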

    I believe this should answer your question as asked.