Here is the layout of the XML file that I am parsing. When a tag such as driverslicense carries multiple values, I want to parse them to get each name and its value, e.g. {number: 99999999, state: CA}.
"""<subjects>
<subject id="B6">
<name type="primary">
<first>Frank </first>
<middle></middle>
<last>Darko</last>
</name>
<birthdate>10/26/2001</birthdate>
<age>17</age>
<ssn>12345679</ssn>
<description>
<sex>Male</sex>
</description>
<address type="residence" ref="A1"/>
<driverslicense state="CA" number="99999999"/>
</subject>
</subjects>"""
My code is as follows:
dl = bs_data.find("driverslicense")
Output:
<driverslicense number="T64430698" state="VA"/>
I tried a for loop, but it returns no values. I also tried .text, but that returns nothing either.
for i in bs_data.find('driverslicense'):
    print(i)
------------------
DriverLicense = bs_data.find("driverslicense")
print(DriverLicense.text)
I would prefer to get this as a dictionary, but separate variables such as state = CA and number = 99999999 would work as well.
In addition, if you would like to get a dict with the attributes and values of a tag, you could simply call .attrs:
soup.select_one('driverslicense').attrs
Note: In this case it works like a charm. In other cases, where you have to pick only specific attributes, the approach from @platipus_on_fire would be ideal, or you might have to ignore or drop the additional ones.
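To illustrate picking only specific attributes, here is a minimal sketch. It is not the code from the other answer; the shortened XML string and the `wanted` list are assumptions made up for this example, and `html.parser` is just one parser that happens to work here.

```python
from bs4 import BeautifulSoup

# Shortened sample, assumed for illustration only
xml = '<subject id="B6"><driverslicense state="CA" number="99999999"/></subject>'
soup = BeautifulSoup(xml, 'html.parser')

tag = soup.find('driverslicense')

# Subscript access raises KeyError if the attribute is missing
state = tag['state']
# .get() returns None instead of raising when the attribute is missing
number = tag.get('number')

# Keep only the attributes of interest; `wanted` is a hypothetical list
wanted = ['state', 'number']
license_info = {key: tag.get(key) for key in wanted}
print(license_info)
```

This also covers the "independent variables" option from the question, since `state` and `number` are plain strings.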
from bs4 import BeautifulSoup
html = '''
<subjects>
<subject id="B6">
<name type="primary">
<first>Frank </first>
<middle></middle>
<last>Darko</last>
</name>
<birthdate>10/26/2001</birthdate>
<age>17</age>
<ssn>12345679</ssn>
<description>
<sex>Male</sex>
</description>
<address type="residence" ref="A1"/>
<driverslicense state="CA" number="99999999"/>
</subject>
</subjects>
'''
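# Assumption: the soup object is built like this; the parser choice is a guess
soup = BeautifulSoup(html, 'html.parser')
soup.select_one('driverslicense').attrs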
{'state': 'CA', 'number': '99999999'}
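If the file contains several driverslicense tags, the same idea can probably be extended with find_all. The sketch below uses a made-up two-subject sample (the second subject and its id are hypothetical) and also shows the likely reason the for loop and .text in the question produced nothing: the tag is self-closing, so it has no children and no text, and the data lives only in its attributes.

```python
from bs4 import BeautifulSoup

# Made-up sample with two subjects, for illustration only
xml = '''
<subjects>
  <subject id="B6"><driverslicense state="CA" number="99999999"/></subject>
  <subject id="B7"><driverslicense state="VA" number="T64430698"/></subject>
</subjects>
'''
soup = BeautifulSoup(xml, 'html.parser')

first = soup.find('driverslicense')
# Iterating over a tag walks its children; a self-closing tag has none
children = list(first)
# .text joins the tag's text content, which is empty here
text = first.text

# One dict per driverslicense tag, copied from its attributes
licenses = [dict(tag.attrs) for tag in soup.find_all('driverslicense')]
print(licenses)
```

Copying with dict() is optional; it just keeps the result independent of the parsed tree.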