Tags: python, playwright, playwright-python

How do I select elements in nested HTML with Playwright?


I want to extract text from the HTML below. I have tried several approaches, but they all fail. The page_id and article_id values are random. I want to get the text as a list.

html:

<div id=ufi_{page_id}>
  <div>
    <div></div>
    <div></div>
    <div></div>
    <div></div>    
    <div>
      <div id={article_id}>
          <div></div>
          <div>I want to get the text here</div>
          <div></div>
      </div>
      <div id={article_id2}>
          <div></div>
          <div>I want to get the text here</div>
          <div></div>
      </div>
      <div id={article_id3}>
          <div></div>
          <div>I want to get the text here</div>
          <div></div>
      </div>
    </div>
  </div>
</div>

code:

comments = page2.query_selector(f'xpath=//div[@id="ufi_{page_id}"]>>div>>//div[5]')
comments_ls = comments.query_selector_all("div>>//div[1]")
if comments:
    for com in comments_ls:
        print(com.text_content())

Solution

  • I'd suggest using Playwright's codegen tool to generate selectors for you: https://playwright.dev/docs/cli#generate-code

    Also, use Locators instead of ElementHandles. They provide convenient helpers such as .nth(42), .first, and .last, and they automatically wait for an element matching the selector to appear before acting on it. See: https://playwright.dev/python/docs/api/class-locator

    For more information about selectors, see: https://playwright.dev/docs/selectors
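
To make the Locator suggestion concrete, here is a minimal sketch for the HTML in the question. It assumes the structure is exactly as posted: the articles sit in the fifth child div, and the wanted text is in the second child div of each article. The helper names (comment_selector, extract_comment_texts, demo), the sample ids, and the sample texts are hypothetical and only for illustration. The ids are matched with an attribute selector rather than #id, because random ids may start with a digit.

```python
from __future__ import annotations


def comment_selector(page_id: str) -> str:
    """Build a CSS selector for the text <div> of every article under ufi_{page_id}.

    Assumes the layout from the question: the fifth child <div> holds the
    articles, and each article's second child <div> holds the text.
    """
    return (
        f'[id="ufi_{page_id}"] > div > div:nth-child(5) '
        "> div[id] > div:nth-child(2)"
    )


def extract_comment_texts(page, page_id: str) -> list[str]:
    """Return the text of every matching <div> as a list of strings."""
    return page.locator(comment_selector(page_id)).all_text_contents()


def demo() -> None:
    """Run the helpers against a static copy of the question's HTML."""
    # Imported here so the helpers above can be used without launching a browser.
    from playwright.sync_api import sync_playwright

    # Hypothetical ids ("123", "a1", ...) stand in for the random ones.
    html = """
    <div id="ufi_123">
      <div>
        <div></div><div></div><div></div><div></div>
        <div>
          <div id="a1"><div></div><div>first</div><div></div></div>
          <div id="a2"><div></div><div>second</div><div></div></div>
          <div id="a3"><div></div><div>third</div><div></div></div>
        </div>
      </div>
    </div>
    """
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_content(html)
        print(extract_comment_texts(page, "123"))
        browser.close()
```

Calling demo() requires Playwright and a Chromium build to be installed. Note that all_text_contents() does not wait for elements to appear, so on a live page that loads comments dynamically it may help to wait first, for example with page.locator(comment_selector(page_id)).first.wait_for().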