How to remove xpath from extract_first() output?


I'm new to Python, so apologies in advance for any idiocy.

I'm scraping information from a website, and am extracting elements using .extract_first().

I wanted the output to be just the text of the element, i.e. 'Bob Smith'. Instead, it seems like the xpath is being printed around the name:

Relevant code:

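from scrapy.selector import Selector  # assumed import; `driver` is an existing Selenium WebDriver

sel = Selector(text=driver.page_source)
# This XPath selects the whole <li> element, so extract_first() returns its serialized HTML
name = sel.xpath('//li[@class="inline t-24 t-black t-normal break-words"]').extract_first()
if name:
    name = name.strip()
print(name)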

Output:

'<li class="inline t-24 t-black t-normal break-words">\n            Bob Smith\n          </li>'

I tried finding a solution online, but haven't found one that deals with this issue in the context of extract_first(). How do I get rid of the xpath so that only the element text is printed? Thanks.


Solution

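  • Your XPath selects the <li> element itself, so extract_first() returns the element serialized as HTML, tags included. Wrapping the expression in normalize-space() returns the element's string value instead, with leading and trailing whitespace stripped and internal whitespace runs collapsed to single spaces. Try using

    name = sel.xpath('normalize-space(//li[@class="inline t-24 t-black t-normal break-words"])').extract_first()

    My output from the HTML in your question: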

    Bob Smith
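
To see what normalize-space() does to the element text, here is a rough standard-library sketch. It is not the Scrapy call from the answer: it assumes a small, well-formed stand-in for the page source (the `HTML` string and the `normalize_space` helper are made up for illustration), and it uses `str.split()`, which treats a slightly wider set of characters as whitespace than XPath does.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for driver.page_source; it must be well-formed XML
# for ElementTree, which real pages often are not.
HTML = (
    '<ul><li class="inline t-24 t-black t-normal break-words">\n'
    '            Bob Smith\n'
    '          </li></ul>'
)


def normalize_space(text):
    """Trim both ends and collapse inner whitespace runs to single spaces."""
    return " ".join(text.split())


root = ET.fromstring(HTML)
# ElementTree supports only a small XPath subset, including [@attr='value'].
li = root.find(".//li[@class='inline t-24 t-black t-normal break-words']")
# itertext() yields the text of the element and of all its descendants.
name = normalize_space("".join(li.itertext())) if li is not None else None
print(name)
```

In Scrapy itself, another common route is to select the text node directly by appending /text() to the XPath and then calling .strip() on the result, which works when the element contains a single text node as it does here.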