Get the Element Name From Attribute Value Using XPath

I am trying to get the element/tag name of each node where I have a particular attribute value.

I have this XML:

<a node='1'>This</a>
<b node='2'>Is</b>
<c node='23'>A</c>
<d selector='g'>Loud</d>
<e node='4'>Dog</e>

I have a list, called nodes, of the node attribute values whose info I want to collect.

I select the text from these nodes with:

for node in nodes:
   get_text = response.xpath(f'//*[@node="{node}"]//text()').extract()

And I also want the names of the elements of the nodes. However, when I use this line within the same for-loop:

get_name = response.xpath(f'//*[@node="{node}"]/name()').get()

I get this error:

ValueError: XPath error: Invalid expression

I have tried many variations, but am unable to get the element/tag names of each node.

Solution

The best way I know to get the element tag names is to use Scrapy's built-in regex method, .re.

The pattern I typically use is r'<(\w+)\s'.
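
As a rough illustration of what that pattern captures on its own, here is a minimal sketch using Python's standard re module on hand-written strings that mimic serialized elements. The sample strings and the names TAG_NAME and first_tag_name are made up for this illustration and are not part of Scrapy.

```python
import re

# Same pattern as above: "<", then the tag name (captured), then one whitespace character.
TAG_NAME = re.compile(r'<(\w+)\s')


def first_tag_name(serialized):
    """Return the first tag name the pattern captures, or None if nothing matches."""
    match = TAG_NAME.search(serialized)
    return match.group(1) if match else None


samples = [
    '<a node="1">This</a>',      # an attribute follows the name, so whitespace is present
    '<d selector="g">Loud</d>',
    '<p>no attributes</p>',      # "<p" is followed by ">", so the pattern does not match
]
captured = [first_tag_name(s) for s in samples]
print(captured)  # ['a', 'd', None]
```

Note that the pattern needs whitespace right after the tag name, so it only finds elements that carry at least one attribute. That is fine here, because the XPath query only selects elements that have the node attribute.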

Here is an example in the Scrapy shell:

scrapy shell

In [1]: markup = """<html><a node='1'>This</a>
   ...: <b node='2'>Is</b>
   ...: <c node='23'>A</c>
   ...: <d selector='g'>Loud</d>
   ...: <e node='4'>Dog</e></html>"""

In [2]: sel = scrapy.Selector(text=markup)

In [3]: sel.xpath('//*[@node]').re(r'<(\w+)\s')
Out[3]: ['a', 'b', 'c', 'e']

In the above example, I take the markup you provided and wrap it in a parent tag.
I then use that to create a Scrapy Selector object.
Next, I run an XPath query to select every element that has the node attribute,
then use the .re method to apply the regex pattern and capture each element's tag name.
The output is a list of the tag names of all elements that have the node attribute.