Search code examples
pythonplaywrightplaywright-python

Python playwright locator not returning expected value


I'm not getting the expected value returned from the below code.

from playwright.sync_api import sync_playwright
import time
import random

def main():
    with sync_playwright() as p:
        browser = p.firefox.launch(headless=False)
        page = browser.new_page()
        url = "https://www.useragentlist.net/"
        page.goto(url)
        time.sleep(random.uniform(2,4))

        test = page.locator('xpath=//span[1][@class="copy-the-code-wrap copy-the-code-style-button copy-the-code-inside-wrap"]/pre/code/strong').inner_text()
        print(test)

        count = page.locator('xpath=//span["copy-the-code-wrap copy-the-code-style-button copy-the-code-inside-wrap"]/pre/code/strong').count()
        print(count)


        browser.close()


if __name__ == '__main__':
    main()

page.locator().count() returns a value of 0, I have no issue getting the text from the lines above it, but I need to access all elements, what is wrong with my implementation of locator and count?


Solution

  • Your second locator XPath has no @class=, so it's different than the first one that works. Store the string in a variable so you don't have to type it twice or encounter copy-paste or stale data errors.

    In any case, your approach seems overcomplicated. Each user agent is in a <code> tag--just scrape that:

    from playwright.sync_api import sync_playwright # 1.44.0
    
    
    def main():
        with sync_playwright() as p:
            browser = p.firefox.launch()
            page = browser.new_page()
            url = "https://www.useragentlist.net/"
            page.goto(url, wait_until="domcontentloaded")
            agents = page.locator("code").all_text_contents()
            print(agents)
            browser.close()
    
    
    if __name__ == "__main__":
        main()
    

    Locators auto-wait so there's no need to sleep. Avoid XPaths 99% of the time--they're brittle and difficult to read and maintain. Just use CSS selectors or user-visible locators. The goal is to choose the simplest selector necessary to disambiguate the elements you want, and nothing more. span/pre/code/strong is a rigid hierarchy--if one of these changes, your code breaks unnecessarily.

    By the way, the user agents are in the static HTML, so unless you're trying to circumvent a block, you can do this faster with requests and Beautiful Soup:

    from requests import get  # 2.31.0
    from bs4 import BeautifulSoup  # 4.10.0
    
    response = get("https://www.useragentlist.net")
    response.raise_for_status()
    print([x.text for x in BeautifulSoup(response.text, "lxml").select("code")])
    

    Better still (possibly), use a library like fake_useragent to generate your random user agent.