python html xml xpath

Python ValueError: XPath error: Unregistered function


<img alt="MediaMarkt" border="0" e-editable="img" src="http://news-de.mediamarkt.de/custloads/298149669/vce/mediamarkt.png" style="display:block;" width="169"/>

I am trying to get the src attribute of this img element. I know the alt value, so I am using it to select the image:

company_name = "mediamarkt"
response.xpath(f'//img[lower-case(@alt)="{company_name.lower()}"]')  # Error
response.xpath(f"//img[matches(@alt,'{company_name}','i')]")  # Error

The error I am getting:

Traceback (most recent call last):
  File "/home/timmy/.local/lib/python3.8/site-packages/parsel/selector.py", line 254, in xpath
    result = xpathev(query, namespaces=nsp,
  File "src/lxml/etree.pyx", line 1582, in lxml.etree._Element.xpath
  File "src/lxml/xpath.pxi", line 305, in lxml.etree.XPathElementEvaluator.__call__
  File "src/lxml/xpath.pxi", line 225, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Unregistered function

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<console>", line 1, in <module>
  File "/home/timmy/.local/lib/python3.8/site-packages/scrapy/http/response/text.py", line 117, in xpath
    return self.selector.xpath(query, **kwargs)
  File "/home/timmy/.local/lib/python3.8/site-packages/parsel/selector.py", line 260, in xpath
    six.reraise(ValueError, ValueError(msg), sys.exc_info()[2])
  File "/usr/lib/python3/dist-packages/six.py", line 702, in reraise
    raise value.with_traceback(tb)
  File "/home/timmy/.local/lib/python3.8/site-packages/parsel/selector.py", line 254, in xpath
    result = xpathev(query, namespaces=nsp,
  File "src/lxml/etree.pyx", line 1582, in lxml.etree._Element.xpath
  File "src/lxml/xpath.pxi", line 305, in lxml.etree.XPathElementEvaluator.__call__
  File "src/lxml/xpath.pxi", line 225, in lxml.etree._XPathEvaluatorBase._handle_result
ValueError: XPath error: Unregistered function in //img[matches(@alt,'mediamarkt','i')]

I got those XPath expressions from the question "case-insensitive matching in xpath?"


Solution

  • Both lower-case() and matches() require XPath 2.0, but lxml (which parsel and Scrapy call, as your traceback shows) only implements XPath 1.0.

    The XPath 1.0 idiom for case-insensitive matching uses translate(),

    translate(@alt, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')
    

    to map upper-case letters to lower case; the result is then compared against the lower-cased version of the string you are searching for. Only the characters listed are mapped, so non-ASCII letters such as Ä are left unchanged unless you add them to both lists.

    So, in your case,

    response.xpath(f"//img[translate(@alt, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')='{company_name.lower()}']")
    

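    and similarly for your other XPath, wrapping the translate() call in contains() instead of comparing with =, because matches() tests for a match anywhere in the string.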

    See also the question "case insensitive xpath contains() possible?"
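
If you build such expressions often, the translate() idiom above can be wrapped in a small helper. The following is only a sketch: the function names (xpath_literal, ci_attr_equals, ci_attr_contains) are hypothetical and not part of lxml, parsel or Scrapy, and the folding covers ASCII letters only. It uses just the standard library to build the query string, and it also quotes the search value, since XPath 1.0 string literals have no escape sequences and a name containing a quote would otherwise break the f-string.

```python
import string

# XPath 1.0 has no lower-case(), so translate() maps A-Z to a-z instead.
# Assumption: ASCII-only folding; extend both strings for other alphabets.
UPPER = string.ascii_uppercase
LOWER = string.ascii_lowercase


def xpath_literal(value: str) -> str:
    """Quote a Python string as an XPath 1.0 string literal."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote types are present: join the pieces with concat().
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def ci_attr_equals(attr: str, value: str) -> str:
    """Predicate: the attribute equals value, ignoring ASCII case."""
    return (
        f"translate(@{attr}, '{UPPER}', '{LOWER}') = "
        f"{xpath_literal(value.lower())}"
    )


def ci_attr_contains(attr: str, value: str) -> str:
    """Predicate: the attribute contains value, ignoring ASCII case."""
    return (
        f"contains(translate(@{attr}, '{UPPER}', '{LOWER}'), "
        f"{xpath_literal(value.lower())})"
    )


company_name = "MediaMarkt"
query = f"//img[{ci_attr_equals('alt', company_name)}]/@src"
print(query)
```

The resulting string can then be passed to response.xpath(query), for example response.xpath(query).get() to read the src value. Use ci_attr_contains in place of ci_attr_equals when a substring match, as with matches(), is what you want.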