There is an HTML tag:
<nav-categories id="MainMenu" :json-data="{some data}">text</nav-categories>
I need to extract the value of the ":json-data" attribute.
The standard approaches, response.css('::attr(":json-data")') and response.css('::attr("\:json-data")'), both fail.
I use Python + Scrapy (response.selector)
Scrapy depends on lxml, so this answer uses lxml directly instead of Scrapy.
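XPath does not allow an attribute name that starts with a colon to be written directly in an expression, because the colon is reserved as the namespace-prefix separator. It can, however, compare the element or attribute name as a string.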
>>> tree.xpath('//nav-categories/@:json-data')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/lxml/etree.pyx", line 2314, in lxml.etree._ElementTree.xpath
  File "src/lxml/xpath.pxi", line 357, in lxml.etree.XPathDocumentEvaluator.__call__
  File "src/lxml/xpath.pxi", line 225, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Invalid expression
Using the name() XPath function as a workaround:
>>> from lxml import html
>>> tree = html.parse(r'/home/lmc/tmp/test.html')
>>> tree.xpath('//nav-categories/@*[name()=":json-data"]')
['{some data}']
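
The same workaround can be wrapped in a small helper, sketched below. This is only an illustration: the inline PAGE string is a copy of the tag from the question (so no file on disk is needed), and get_colon_attr is a hypothetical name, not part of lxml or Scrapy. The attribute name is passed as an XPath variable, which lxml accepts as a keyword argument to xpath().

```python
from lxml import html

# Assumption: an inline copy of the tag from the question, wrapped in a
# minimal document so the example does not depend on a local file.
PAGE = (
    '<html><body>'
    '<nav-categories id="MainMenu" :json-data="{some data}">text</nav-categories>'
    '</body></html>'
)


def get_colon_attr(tree, tag, attr_name):
    """Hypothetical helper: return the values of attr_name on every <tag>.

    The attribute is matched by comparing name() to a string, because a
    name starting with ':' cannot be written directly in an XPath step.
    """
    # $attr is an XPath variable; lxml fills it from the keyword argument.
    # The tag name is interpolated as-is, so it should be a trusted value.
    return tree.xpath('//%s/@*[name()=$attr]' % tag, attr=attr_name)


tree = html.fromstring(PAGE)
values = get_colon_attr(tree, 'nav-categories', ':json-data')
print(values)
```

Since Scrapy selectors evaluate XPath through lxml, the same expression string should also work with response.xpath(...) in a spider, although that was not tested here.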