I'm new to Python, so apologies in advance for any idiocy.
I'm scraping information from a website, and am extracting elements using .extract_first().
What I wanted the output to be was just the text of the element, i.e. 'Bob Smith'. But instead, the element's HTML tags are being printed around the name:
Relevant code:
sel = Selector(text=driver.page_source)
name = sel.xpath('//li[@class="inline t-24 t-black t-normal break-words"]').extract_first()
if name:
    name = name.strip()
print(name)
Output:
'<li class="inline t-24 t-black t-normal break-words">\n Bob Smith\n </li>'
I tried finding a solution online, but haven't found one that deals with this issue in the context of extract_first(). How do I get rid of the surrounding HTML so that the output being printed is just the element text? Thanks.
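Your XPath selects the whole li element, so extract_first() returns its full HTML. Try wrapping the expression in normalize-space(), which returns the element's text with leading and trailing whitespace removed: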
name = sel.xpath('normalize-space(//li[@class="inline t-24 t-black t-normal break-words"])').extract_first()
My output from the HTML in your question:
Bob Smith
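If it helps to see what normalize-space() is doing, here is a rough standard-library sketch that does not need Scrapy. It assumes the fragment from the question is well-formed XML (true for this one li, not for HTML pages in general), and the helper name normalize_space is just something I made up for illustration. Python's split() treats a few more characters as whitespace than XPath does, so take it as an approximation.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the question's output; it happens to be well-formed XML.
html = '<li class="inline t-24 t-black t-normal break-words">\n Bob Smith\n </li>'


def normalize_space(text):
    """Approximate XPath normalize-space(): trim and collapse whitespace runs."""
    return " ".join(text.split())


element = ET.fromstring(html)

# The "string value" of an element is all of its descendant text joined together.
string_value = "".join(element.itertext())

print(repr(string_value))             # '\n Bob Smith\n '
print(normalize_space(string_value))  # Bob Smith
```

In Scrapy itself, appending /text() to your original XPath and keeping your existing .strip() call should also work for this element, since the li contains only a single text node.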