<img alt="MediaMarkt" border="0" e-editable="img" src="http://news-de.mediamarkt.de/custloads/298149669/vce/mediamarkt.png" style="display:block;" width="169"/>
I am trying to get the src of the image above. I know its alt value, and I want to use it to select the image:
company_name = "mediamarkt"
response.xpath(f'//img[lower-case(@alt)="{company_name.lower()}"]') #Error
response.xpath(f"//img[matches(@alt,'{company_name}','i')]") # Error
This is the error I am getting:
Traceback (most recent call last):
File "/home/timmy/.local/lib/python3.8/site-packages/parsel/selector.py", line 254, in xpath
result = xpathev(query, namespaces=nsp,
File "src/lxml/etree.pyx", line 1582, in lxml.etree._Element.xpath
File "src/lxml/xpath.pxi", line 305, in lxml.etree.XPathElementEvaluator.__call__
File "src/lxml/xpath.pxi", line 225, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Unregistered function
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.8/code.py", line 90, in runcode
exec(code, self.locals)
File "<console>", line 1, in <module>
File "/home/timmy/.local/lib/python3.8/site-packages/scrapy/http/response/text.py", line 117, in xpath
return self.selector.xpath(query, **kwargs)
File "/home/timmy/.local/lib/python3.8/site-packages/parsel/selector.py", line 260, in xpath
six.reraise(ValueError, ValueError(msg), sys.exc_info()[2])
File "/usr/lib/python3/dist-packages/six.py", line 702, in reraise
raise value.with_traceback(tb)
File "/home/timmy/.local/lib/python3.8/site-packages/parsel/selector.py", line 254, in xpath
result = xpathev(query, namespaces=nsp,
File "src/lxml/etree.pyx", line 1582, in lxml.etree._Element.xpath
File "src/lxml/xpath.pxi", line 305, in lxml.etree.XPathElementEvaluator.__call__
File "src/lxml/xpath.pxi", line 225, in lxml.etree._XPathEvaluatorBase._handle_result
ValueError: XPath error: Unregistered function in //img[matches(@alt,'mediamarkt','i')]
I got those XPath expressions from the question "case-insensitive matching in xpath?".
Both lower-case() and matches() require XPath 2.0, but lxml only implements XPath 1.0.
The XPath 1.0 idiom for case-insensitive matching uses translate(),
translate(@alt, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')
to map upper-case characters to lower-case before comparing against the lower-case version of the string you are matching. Only the characters listed are folded, so non-ASCII letters such as Ä would have to be added to both strings.
So, in your case,
response.xpath(f"//img[translate(@alt, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')='{company_name.lower()}']")
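and similarly for your other XPath: since matches() is not available, wrap the same translate() call in contains() to get a case-insensitive substring match.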
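To avoid retyping the alphabet strings, the translate() idiom can be wrapped in small helpers that build the XPath 1.0 expression. This is only a sketch: the names `ci_attr_equals` and `ci_attr_contains` are made up for illustration, only ASCII letters are folded, and the search value is assumed not to contain a single quote.

```python
import string

UPPER = string.ascii_uppercase  # 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
LOWER = string.ascii_lowercase  # 'abcdefghijklmnopqrstuvwxyz'


def _lowered(attr: str) -> str:
    """XPath 1.0 fragment that lower-cases the ASCII letters of @attr."""
    return f"translate(@{attr}, '{UPPER}', '{LOWER}')"


def _quoted(value: str) -> str:
    """Lower-case the value and wrap it in single quotes for XPath."""
    if "'" in value:
        # Assumption of this sketch: no quote escaping is attempted.
        raise ValueError("value must not contain a single quote")
    return f"'{value.lower()}'"


def ci_attr_equals(tag: str, attr: str, value: str) -> str:
    """Hypothetical helper: exact match on @attr, ignoring ASCII case."""
    return f"//{tag}[{_lowered(attr)}={_quoted(value)}]"


def ci_attr_contains(tag: str, attr: str, value: str) -> str:
    """Hypothetical helper: substring match on @attr, ignoring ASCII case."""
    return f"//{tag}[contains({_lowered(attr)}, {_quoted(value)})]"


company_name = "MediaMarkt"
# Append /@src to select the attribute value rather than the element.
query = ci_attr_equals("img", "alt", company_name) + "/@src"
print(query)
```

With the response from the question, `response.xpath(query).get()` should then return the src of the matching image, or None if no image matches.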