I am trying to extract the XML attribute parsable-cite from the text tag. I am parsing an XML document from the URL "https://www.congress.gov/118/bills/hr61/BILLS-118hr61ih.xml".
The code I'm using is below (also on Replit: https://replit.com/join/ohhztxpqdr-aam88), copied here for convenience:
from lxml import etree
import requests

url = "https://www.congress.gov/118/bills/hr61/BILLS-118hr61ih.xml"
response = requests.get(url)
xml_response = response.content
tree = etree.fromstring(xml_response)
result = tree.xpath("//text[contains(., 'is amended')]")
for r in result:
    external_xref = r.find("external-xref")
    print(external_xref.attrib)
I get an error indicating that I'm accessing None, which suggests that the search did not find the element:
AttributeError: 'NoneType' object has no attribute 'attrib'
When I use the same code and instead use the snippet of the text node directly, I get the following:
text = b'<text display-inline="no-display-inline">Section 4702 of the Matthew Shepard and James Byrd Jr. Hate Crimes Prevention Act (<external-xref legal-doc="usc" parsable-cite="usc/18/249">18 U.S.C. 249</external-xref> note) is amended by adding at the end the following: </text>'
tree = etree.fromstring(text)
result = tree.xpath("//text[contains(., 'is amended')]")
for r in result:
    external_xref = r.find("external-xref")
    print(external_xref.attrib)
{'legal-doc': 'usc', 'parsable-cite': 'usc/18/249'}
The issue seems to come from processing the content from the url directly. Any recommendations on how to proceed?
Thanks
In https://www.congress.gov/118/bills/hr61/BILLS-118hr61ih.xml, there are two text elements that contain the string "is amended", but only one of them (the second one) has an external-xref child element. For the first one, find returns None, which is what raises the AttributeError.
The following update of the code will produce the wanted output:
for r in result:
    external_xref = r.find("external-xref")
    if external_xref is not None:  # Check if there actually is an external-xref
        print(external_xref.attrib)
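To see the behaviour without fetching the bill, here is a minimal offline sketch. The SAMPLE document is a made-up stand-in for the real file (its first text element is invented for illustration), and parsable_cites is a hypothetical helper name, not part of lxml. It uses iter and itertext rather than the XPath contains() call so that the same code also runs on the standard library's ElementTree if lxml is not installed.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumption: the stdlib parser is an acceptable stand-in here, since
    # fromstring, iter, itertext, find and get exist in both libraries.
    import xml.etree.ElementTree as etree

# Hypothetical sample: two <text> elements contain "is amended",
# but only the second one has an <external-xref> child.
SAMPLE = b"""<bill>
  <text>Section 1 of the Example Act is amended as follows.</text>
  <text>Section 4702 of the Matthew Shepard and James Byrd Jr. Hate Crimes Prevention Act (<external-xref legal-doc="usc" parsable-cite="usc/18/249">18 U.S.C. 249</external-xref> note) is amended by adding at the end the following: </text>
</bill>"""


def parsable_cites(xml_bytes, phrase="is amended"):
    """Collect parsable-cite values from matching <text> elements."""
    root = etree.fromstring(xml_bytes)
    cites = []
    for text_el in root.iter("text"):
        # Join all nested text so the phrase is found even when it
        # follows a child element.
        if phrase not in "".join(text_el.itertext()):
            continue
        xref = text_el.find("external-xref")  # direct children only
        if xref is None:
            continue  # a matching <text> without an external-xref
        cites.append(xref.get("parsable-cite"))
    return cites


print(parsable_cites(SAMPLE))  # ['usc/18/249']
```

Note that find("external-xref") only looks at direct children and returns the first match. If a text element could hold several references, findall would be the call to reach for instead.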