python selenium selenium-webdriver http-redirect har

How do I get the redirect chain with Selenium when I must click banners directly and open them in another tab?


I am working on a project in which I need to click banners and record their redirect chains.

So that my current page is not overwritten, and to make the next step easier, I thought I should Ctrl+click the banners so that each one opens in another browser tab, and capture the real redirect chain there.

I've researched a lot, but the only methods I found dump HAR files to get the redirect chains. To get a HAR file, though, the Network panel of Developer Tools must already be open in the tab. In my case, a new tab cannot have the Network panel open before it loads, and I can't open the Network panel and then reload the page either, because the redirect chain would no longer be the real one. Additionally, the built-in performance log is not applicable in my case.

Can anyone tell me how I can solve these problems? Or am I wrong about any part of the above? Any advice would be greatly appreciated, since I have been working on this for a long time.


Solution

  • HAR data is a convenient source of redirect chains, and you can capture it with a proxy instead of Developer Tools.

    A few packages combine Selenium with other tools to do this (and add other features as well).

    One is BrowserMob Proxy, used from Python via the browsermob-proxy package.

    BrowserMob Proxy allows you to manipulate HTTP requests and responses, capture HTTP content, and export performance data as a HAR file. BMP works well as a standalone proxy server, but it is especially useful when embedded in Selenium tests.

    Here is an example:

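    from browsermobproxy import Server
    from selenium import webdriver

    # Start BrowserMob Proxy (path to the downloaded bin/browsermob-proxy script)
    server = Server("path/to/browsermob-proxy")
    server.start()
    proxy = server.create_proxy()

    # Route all of Chrome's traffic, in every tab, through the proxy
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument("--proxy-server={0}".format(proxy.proxy))
    # BMP re-signs HTTPS traffic with its own certificate, which Chrome does not trust by default
    chrome_options.add_argument("--ignore-certificate-errors")
    driver = webdriver.Chrome(options=chrome_options)  # Selenium 4 uses options=, not chrome_options=

    # Start a new HAR capture (with headers, so Location is recorded), then load the page
    proxy.new_har("StackOverFlow", options={"captureHeaders": True})
    driver.get("https://stackoverflow.com")
    print(proxy.har)  # the captured HAR as a Python dict

    driver.quit()
    server.stop()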
    

    Other libraries, such as selenium-wire, offer similar capabilities (and other features as well).

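    Note: there is no need to open the Network panel. The proxy sits outside the browser, so it also records requests made in tabs opened with Ctrl+click.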

    Make sure to download BrowserMob Proxy and pass the path to its bin/browsermob-proxy script when you create the Server.
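Because Chrome's --proxy-server setting applies to the whole browser, the requests from a banner tab opened with Ctrl+click land in the same HAR as everything else. A minimal sketch of turning that HAR into a redirect chain follows, assuming the HAR 1.2 layout (log.entries[].request.url, response.status, response.redirectURL, response.headers). The helper names redirect_chain and _redirect_target are hypothetical, not part of browsermob-proxy; the code falls back to the Location header in case redirectURL is left empty.

```python
from urllib.parse import urljoin

# HTTP status codes that carry a Location-based redirect
REDIRECT_CODES = {301, 302, 303, 307, 308}


def _redirect_target(response):
    """Return the redirect target of a HAR response, or "" if there is none."""
    # Prefer the HAR redirectURL field, fall back to the Location header
    if response.get("redirectURL"):
        return response["redirectURL"]
    for header in response.get("headers", []):
        if header.get("name", "").lower() == "location":
            return header.get("value", "")
    return ""


def redirect_chain(har, start_url):
    """Follow 3xx responses recorded in a HAR dict, starting at start_url.

    Returns the list of URLs visited. It stops at the first non-redirect
    response, at a URL with no HAR entry, or when a URL repeats (a loop).
    """
    # Keep the first response recorded for each request URL
    by_url = {}
    for entry in har["log"]["entries"]:
        by_url.setdefault(entry["request"]["url"], entry["response"])

    chain = [start_url]
    url = start_url
    while url in by_url:
        response = by_url[url]
        target = _redirect_target(response)
        if response["status"] not in REDIRECT_CODES or not target:
            break
        # The target may be relative, so resolve it against the current URL
        url = urljoin(url, target)
        revisit = url in chain
        chain.append(url)
        if revisit:  # redirect loop: stop after recording the repeated URL
            break
    return chain
```

After Ctrl+clicking a banner you might call redirect_chain(proxy.har, banner_url), where banner_url is a hypothetical variable holding the banner link's href. Only HTTP 3xx redirects are followed this way; redirects done with JavaScript or a meta refresh tag show up as ordinary 200 responses and would need separate handling.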
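With selenium-wire, captured traffic is exposed as driver.requests, where each request has .url and a .response (None until a response arrives) with .status_code and .headers. A minimal sketch of building the same chain from those objects follows, assuming response.headers supports dict-style .get("Location"); the function name redirect_chain_from_requests is hypothetical. It only relies on those attributes, so it runs without selenium-wire installed.

```python
from urllib.parse import urljoin

# HTTP status codes that carry a Location-based redirect
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_chain_from_requests(requests, start_url):
    """Build a redirect chain from selenium-wire-style request objects."""
    # Keep the first response seen for each URL, skipping unanswered requests
    by_url = {}
    for request in requests:
        if request.response is not None:
            by_url.setdefault(request.url, request.response)

    chain = [start_url]
    url = start_url
    while url in by_url:
        response = by_url[url]
        # Assumed: headers support dict-style .get()
        location = response.headers.get("Location")
        if response.status_code not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL
        url = urljoin(url, location)
        revisit = url in chain
        chain.append(url)
        if revisit:  # redirect loop: stop after recording the repeated URL
            break
    return chain
```

You would create the driver with from seleniumwire import webdriver instead of the plain Selenium import, Ctrl+click the banner, and then pass driver.requests to this function along with the banner's href.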