python, web-scraping, xpath, playwright, playwright-python

Get href link using python playwright


I am trying to extract the link from an href attribute, but all I get is the text inside the element.

The page's HTML is the following:

<div class="item-info-container ">
   <a href="/imovel/32600863/" role="heading" aria-level="2" class="item-link xh-highlight" 
   title="Apartamento T3 na avenida da Liberdade, São José de São Lázaro e São João do Souto, Braga">
   Apartamento T3 na avenida da Liberdade, São José de São Lázaro e São João do Souto, Braga
   </a>

And the code I am using is:

element_handle = page.locator('//div[@class="item-info-container "]//a').all_inner_texts()

Whether or not I specify //a[@href], my output is always the title text:

Apartamento T3 na avenida da Liberdade, São José de São Lázaro e São João do Souto, Braga

What I actually want is:

/imovel/32600863/

Any ideas of where my logic is failing me?


Solution

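  • Using get_attribute:

    all_inner_texts() returns the text of the matched elements, and an XPath such as //a[@href] only filters for <a> elements that have an href; it does not return the attribute's value. To read the value, call get_attribute on the locator. The <a> in the question carries role="heading", which overrides its implicit link role, so a CSS selector is used to match it:

    link = page.locator('.item-info-container a').get_attribute('href')

    If the locator matches more than one element, get_attribute raises a strict mode violation, so iterate over all matches instead:

    link_locators = page.locator('.item-info-container a').all()
    for link_locator in link_locators:
        print(link_locator.get_attribute('href'))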
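
    The href in the question is relative (/imovel/32600863/), so it usually has to be resolved against the page URL before it can be visited. The following is a minimal sketch of one way to do that, not a definitive implementation. The helper names to_absolute and collect_listing_links, the '.item-info-container a' selector and the example.com URL are assumptions made for illustration; the real site URL is not given in the question.

```python
from urllib.parse import urljoin


def to_absolute(base_url, hrefs):
    """Resolve each href against base_url, skipping missing (None or empty) values."""
    return [urljoin(base_url, href) for href in hrefs if href]


def collect_listing_links(page_url):
    """Open page_url and return absolute URLs for every matched link (hypothetical helper)."""
    # Imported here so that to_absolute() stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(page_url)
        # Assumed selector: every <a> inside an element with class item-info-container.
        anchors = page.locator('.item-info-container a').all()
        # get_attribute returns None when the attribute is absent.
        hrefs = [anchor.get_attribute('href') for anchor in anchors]
        browser.close()
    return to_absolute(page_url, hrefs)
```

    For example, to_absolute('https://example.com/search/', ['/imovel/32600863/']) returns ['https://example.com/imovel/32600863/']. Keeping the URL resolution separate from the browser code makes it easy to check without launching a browser.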