
Python xmltodict.parse raises exception "not well-formed (invalid token): line 6, column 15"


With the code snippet below, I am not sure how to deal with the bad attribute value. It seems to be quoted correctly and is valid UTF-8 (I believe), but the \x07 character is tripping up xmltodict.parse:

Exception: not well-formed (invalid token): line 6, column 15

Any ideas how to strip these codepoints so it doesn't throw exceptions?

import requests
import xmltodict

# dp_url and dp_params are defined elsewhere
response = requests.get(dp_url, params=dp_params)

try:
    dict_response = xmltodict.parse(response.text)
except Exception as e:
    # prints: not well-formed (invalid token): line 6, column 15
    print(e)

The XML:

<result><record><field name='donor_id' id='donor_id' value='40362'/><field name='first_name' id='first_name' value='John'/><field name='org_rec' id='org_rec' value='N'/><field name='donor_type' id='donor_type' value='IN'/><field name='nomail' id='nomail' value='N'/><field name='nomail_reason' id='nomail_reason' value=''/><field name='narrative' id='narrative' value='2/26/2021 - TD: added Louise to record. Check only has her name and didn&apos;t return the reply device.\r\n3/17/2015 - MS: Removed an extra sopace between Spring and St in Address field. \r\n\r\n8/26/2014 - MS: Moved initial to Middle Name field.\r\n\r\n11/14/2012 TD: \x07 telephone number added per telephone campaign 2012'/><field name='tag_date' id='tag_date' value=''/><field name='quickbooks_customer_id' id='quickbooks_customer_id' value=''/></record></result>

Solution

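  • XML 1.0 does not allow the \x07 (BEL) control character, which is why the parser rejects the document. Have you tried simply removing the problematic character before parsing?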

    xmltodict.parse(response.text.replace('\x07', ''))
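
If other control characters might turn up in the data, a more general option is to strip every character outside the range XML 1.0 allows before parsing. The following is a minimal sketch, assuming it is acceptable to drop those characters silently. The helper name strip_invalid_xml_chars is made up for illustration, and the standard library's ElementTree stands in for xmltodict so the example runs without extra packages (both are built on expat, so the error is the same).

```python
import re
import xml.etree.ElementTree as ET

# XML 1.0 allows only: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# This pattern matches any character outside that set.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Hypothetical helper: remove characters that XML 1.0 forbids."""
    return _INVALID_XML_CHARS.sub("", text)


# Shortened stand-in for the response body, with a literal BEL character.
raw = "<result><field name='narrative' value='TD: \x07 telephone number added'/></result>"

try:
    ET.fromstring(raw)
    parse_failed = False
except ET.ParseError as e:
    parse_failed = True
    print("before cleaning:", e)

cleaned = strip_invalid_xml_chars(raw)
root = ET.fromstring(cleaned)
print("after cleaning:", repr(root.find("field").get("value")))
```

With xmltodict the call would become xmltodict.parse(strip_invalid_xml_chars(response.text)). Tabs, newlines and carriage returns are kept, since XML permits them.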