I am trying to get the element/tag name of each node where I have a particular attribute value.
I have this XML:
<a node='1'>This</a>
<b node='2'>Is</b>
<c node='23'>A</c>
<d selector='g'>Loud</d>
<e node='4'>Dog</e>
I have a list of the node values I want to collect, called nodes.
I select the text from these nodes with:
for node in nodes:
    get_text = response.xpath(f'//*[@node="{node}"]//text()').extract()
And I also want the names of the elements of the nodes. However, when I use this line within the same for-loop:
get_name = response.xpath(f'//*[@node="{node}"]/name()').get()
I get this error:
ValueError: XPath error: Invalid expression
I have tried many variations, but am unable to get the element/tag names of each node.
The best way I know to get the element tag names is to use Scrapy's built-in regex method, re. The pattern I typically use is r'<(\w+)\s'.
Here is an example:
scrapy shell
In [1]: markup = """<html><a node='1'>This</a>
...: <b node='2'>Is</b>
...: <c node='23'>A</c>
...: <d selector='g'>Loud</d>
...: <e node='4'>Dog</e></html>"""
In [2]: sel = scrapy.Selector(text=markup)
In [3]: sel.xpath('//*[@node]').re(r'<(\w+)\s')
Out[3]: ['a', 'b', 'c', 'e']
The XPath selects every element that has a node attribute, and the .re method then searches each selected element for the regex pattern to find the element tag name.