python web-scraping playwright playwright-python

How to open a new tab using Python Playwright by feeding it a list of URLs?


According to the Playwright documentation, the way to open a new tab in the browser is the one shown in the scrape_post_info() function below. However, it fails to open one.

What I am trying to do is loop through each URL in the posts list and open it in a new tab to scrape the post details. Once a post has been scraped, the tab should be closed and the next URL opened in a new tab, and so on until the last URL in the posts list has been processed.

# Loop through each URL from the `posts` list variable that contains many posts' URLs
for post in posts:
    scrape_post_info(context, post)

def scrape_post_info(context, post):

    with context.expect_page() as new_page_info:
        page.click('a[target="_blank"]')  # Opens a new tab
    new_page = new_page_info.value

    new_page.wait_for_load_state()
    print(new_page.title())

Solution

  • I'm doing something similar for a project of mine, and this is how I would do it.

    from playwright.sync_api import sync_playwright
    
    posts = ['https://playwright.dev/', 'https://playwright.dev/python/']
    
    def scrape_post_info(context, post):
        page = context.new_page()
        page.goto(post)
        print(page.title())
        # do whatever scraping you need to
        page.close()
    
    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context()
        for post in posts:
            scrape_post_info(context, post)
            # some time delay
    
        browser.close()  # close the browser inside the with block, while Playwright is still running
    

    The snippet from the Playwright docs is meant for capturing a new page that opens after you click a link on an existing page. Since you already have the URLs, you can simply open a fresh page with context.new_page(), visit each URL in turn with page.goto(), and do your scraping there.
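
If the scraping step can raise (a timeout, a missing element), it may help to make sure the tab is still closed before moving on to the next URL. Here is a minimal sketch of that idea, not a definitive implementation: the helper names scrape_all and run and the delay_seconds parameter are assumptions made for illustration, and returning page.title() just stands in for the real scraping.

```python
import time


def scrape_post_info(context, url):
    """Open `url` in a new tab, return its title, and always close the tab."""
    page = context.new_page()
    try:
        page.goto(url)
        # Replace this with whatever scraping the post needs.
        return page.title()
    finally:
        # Runs even if goto() or the scraping raises, so tabs never pile up.
        page.close()


def scrape_all(context, urls, delay_seconds=1.0):
    """Visit each URL in its own tab, one after another (hypothetical helper)."""
    titles = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)  # pause between requests
        titles.append(scrape_post_info(context, url))
    return titles


def run(urls):
    """Launch Chromium and scrape `urls`; needs Playwright and its browsers."""
    # Imported lazily so the helpers above can be used without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context()
        try:
            return scrape_all(context, urls)
        finally:
            browser.close()  # close while Playwright is still running
```

Calling run(posts) with the posts list from the question would open each URL in its own tab and return the collected titles. Because the helpers only rely on context.new_page(), page.goto(), page.title() and page.close(), they can be tried out against a stand-in context object before pointing them at a real browser.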