Tags: python, caching, playwright, playwright-python

How to cache playwright-python contexts for testing?


I am doing some web scraping using playwright-python>=1.41, and have to launch the browser in headed mode (e.g. launch(headless=False)).

For CI testing, I would like to somehow cache the headed interactions with Chromium, to enable offline testing:

  • First invocation: uses Chromium to make real-world HTTP transactions
  • Later invocations: uses Chromium, but all HTTP transactions read from a cache

How can this be done? I can't find any clear answers on how to do this.


Solution

  • Recording a HAR file might solve your problem:

    1. Run the test once while recording a HAR file
    2. Store the HAR file as an artifact, in your repo or similar in your CI environment
    3. Run the test again, serving responses from the recorded HAR file

    Here is how to do that with playwright==1.41.1 and pytest-playwright==0.3.3:

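    import pathlib
    
    import pytest
    from playwright.sync_api import Browser, Playwright
    
    # The recorded HAR file lives next to this test module.
    CACHE_DIR = pathlib.Path(__file__).parent / "cache"
    
    
    @pytest.fixture(name="example_har", scope="session")
    def fixture_example_har(playwright: Playwright) -> pathlib.Path:
        """Record real HTTP traffic from a headed Chromium into a HAR file."""
        har_file = CACHE_DIR / "example.har"
        # Parenthesized context managers need Python 3.10+.
        with (
            playwright.chromium.launch(headless=False) as browser,
            browser.new_page() as page,
        ):
            # update=True records real network traffic instead of replaying it.
            # Per the Playwright docs, the HAR is written to disk when the
            # browser context closes.
            page.route_from_har(har_file, url="*/**", update=True)
            page.goto("https://example.com/")
        return har_file
    
    
    def test_caching(browser: Browser, example_har: pathlib.Path) -> None:
        # offline=True blocks real network access, so the page can only load
        # if its requests are served from the HAR file.
        with browser.new_context(offline=True) as context:
            page = context.new_page()
            page.route_from_har(example_har, url="*/**")
            page.goto("https://example.com/")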
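
Note that the fixture above re-records the HAR file on every test session, so it still needs network access each time. To get the "first invocation records, later invocations replay" behaviour from the question, one option is to record only when the HAR file does not exist yet. The sketch below is one way this might look; `har_route_options` is a hypothetical helper name, not part of Playwright, and it only builds the keyword arguments for `page.route_from_har` (`har`, `url`, `update`, `not_found`).

```python
import pathlib


def har_route_options(har_file: pathlib.Path) -> dict:
    """Build keyword arguments for page.route_from_har().

    Hypothetical helper, not part of Playwright: record when the HAR file
    is missing, replay from it otherwise.
    """
    recording = not har_file.exists()
    options = {"har": har_file, "url": "*/**", "update": recording}
    if not recording:
        # On replay, abort any request that is not in the HAR so that a
        # cache miss fails the test instead of going unnoticed.
        options["not_found"] = "abort"
    return options


# Usage inside a test or fixture (assumes a Playwright `page` object):
#     page.route_from_har(**har_route_options(CACHE_DIR / "example.har"))
```

With this approach, deleting the HAR file (or the CI cache entry that holds it) forces a fresh recording on the next run.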