I'm not getting the expected value returned from the below code.
from playwright.sync_api import sync_playwright
import time
import random
def main():
with sync_playwright() as p:
browser = p.firefox.launch(headless=False)
page = browser.new_page()
url = "https://www.useragentlist.net/"
page.goto(url)
time.sleep(random.uniform(2,4))
test = page.locator('xpath=//span[1][@class="copy-the-code-wrap copy-the-code-style-button copy-the-code-inside-wrap"]/pre/code/strong').inner_text()
print(test)
count = page.locator('xpath=//span["copy-the-code-wrap copy-the-code-style-button copy-the-code-inside-wrap"]/pre/code/strong').count()
print(count)
browser.close()
if __name__ == '__main__':
main()
page.locator().count() returns a value of 0, I have no issue getting the text from the lines above it, but I need to access all elements, what is wrong with my implementation of locator and count?
Your second locator XPath has no @class=
, so it's different than the first one that works. Store the string in a variable so you don't have to type it twice or encounter copy-paste or stale data errors.
In any case, your approach seems overcomplicated. Each user agent is in a <code>
tag--just scrape that:
from playwright.sync_api import sync_playwright # 1.44.0
def main():
with sync_playwright() as p:
browser = p.firefox.launch()
page = browser.new_page()
url = "https://www.useragentlist.net/"
page.goto(url, wait_until="domcontentloaded")
agents = page.locator("code").all_text_contents()
print(agents)
browser.close()
if __name__ == "__main__":
main()
Locators auto-wait so there's no need to sleep. Avoid XPaths 99% of the time--they're brittle and difficult to read and maintain. Just use CSS selectors or user-visible locators. The goal is to choose the simplest selector necessary to disambiguate the elements you want, and nothing more. span/pre/code/strong
is a rigid hierarchy--if one of these changes, your code breaks unnecessarily.
By the way, the user agents are in the static HTML, so unless you're trying to circumvent a block, you can do this faster with requests and Beautiful Soup:
from requests import get # 2.31.0
from bs4 import BeautifulSoup # 4.10.0
response = get("https://www.useragentlist.net")
response.raise_for_status()
print([x.text for x in BeautifulSoup(response.text, "lxml").select("code")])
Better still (possibly), use a library like fake_useragent
to generate your random user agent.