Tags: python, json, xml, amazon-web-services, amazon-sqs

How can I read the whole XML response (a bytes string) from the AWS SQS API with defusedxml, instead of just the first line?


Code:

from defusedxml import ElementTree as etree
s = b'<?xml version="1.0"?><GetQueueAttributesResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><GetQueueAttributesResult><Attribute><Name>ApproximateNumberOfMessages</Name><Value>2</Value></Attribute></GetQueueAttributesResult><ResponseMetadata><RequestId>xxxx</RequestId></ResponseMetadata></GetQueueAttributesResponse>'
print(etree.fromstring(s))

Expected Output:
The complete XML data (same as the input), so that it can be parsed further.

Actual Output:
Only the first line is shown:

<Element '{http://queue.amazonaws.com/doc/2012-11-05/}GetQueueAttributesResponse' at 0x09B50720>

This is all the data it reads.
I tried functions like findall() and getchildren() on this output, and they return nothing further.
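For reference, the printed `<Element ...>` line is only the repr of the root element; the nested elements are still reachable from it. One likely reason `findall()` appears to return nothing is that every tag in the response is namespace-qualified, so an unqualified name matches no element. A minimal sketch, assuming the namespace URI shown in the response above; the `sqs` prefix and the variable names are illustrative choices, not part of the original code:

```python
try:
    from defusedxml import ElementTree as etree
except ImportError:
    # Assumption: this fallback exists only so the sketch runs where defusedxml
    # is not installed. Remove it when parsing untrusted input.
    from xml.etree import ElementTree as etree

s = (
    b'<?xml version="1.0"?>'
    b'<GetQueueAttributesResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/">'
    b"<GetQueueAttributesResult><Attribute>"
    b"<Name>ApproximateNumberOfMessages</Name><Value>2</Value>"
    b"</Attribute></GetQueueAttributesResult>"
    b"<ResponseMetadata><RequestId>xxxx</RequestId></ResponseMetadata>"
    b"</GetQueueAttributesResponse>"
)

root = etree.fromstring(s)

# Map an arbitrary prefix ("sqs" is an illustrative choice) to the namespace URI.
NS = {"sqs": "http://queue.amazonaws.com/doc/2012-11-05/"}

# An unqualified name matches nothing, because the tags carry the namespace.
unqualified = root.findall(".//Attribute")

# Namespace-qualified lookups reach the nested elements.
attributes = {
    attr.find("sqs:Name", NS).text: attr.find("sqs:Value", NS).text
    for attr in root.findall(".//sqs:Attribute", NS)
}
request_id = root.find(".//sqs:RequestId", NS).text

print(unqualified)
print(attributes)
print(request_id)
```

Passing the prefix map as the second argument to `find()` and `findall()` avoids writing the full `{uri}` prefix in every path.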

How can I resolve this issue? Alternatively, if another library supports a similar approach, please suggest it.

A library that converts such XML data directly to JSON or a dict would also be very helpful.
However, it should produce readable output, unlike xmltodict, which returns awkward OrderedDicts.

Note: Whichever library is suggested also needs to be secure, unlike the standard xml module, which has known vulnerabilities.


Solution

  • I was able to put together concise logic from the sample above and the references.

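    from defusedxml import ElementTree as ETree

    def parse_xml(xml, tag):
        """Return one dict per <tag> element, mapping child tag names to their text."""
        xml_tree = ETree.fromstring(xml)
        # Take the "{namespace-uri}" prefix from the root tag instead of the
        # element's repr; it is an empty string when there is no namespace.
        root_tag = xml_tree.tag
        namespace = root_tag[: root_tag.find("}") + 1]
        return [
            # Strip the "{namespace-uri}" prefix from each child tag name.
            {child.tag[child.tag.find("}") + 1 :]: child.text for child in element}
            for element in xml_tree.findall(f".//{namespace}{tag}")
        ]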
    
    from unittest import TestCase
    class TestParseXML(TestCase):
        def test_parse_xml(self):
            xml = b"""<?xml version="1.0"?>
                                <XResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/">
                                    <XResult>
                                        <XResultEntry>
                                            <Id>1</Id>
                                            <Name>one</Name>
                                        </XResultEntry>
                                        <XResultEntry>
                                            <Id>2</Id>
                                            <Name>two</Name>
                                        </XResultEntry>
                                    </XResult>
                                    <ResponseMetadata>
                                        <RequestId>testreqid</RequestId>
                                    </ResponseMetadata>
                                </XResponse>"""
            data = parse_xml(xml, "XResultEntry")
            self.assertEqual(data, [{"Id": "1", "Name": "one"}, {"Id": "2", "Name": "two"}])
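
The question also asked for a readable dict/JSON form of the whole response. The same tag-stripping idea can be extended recursively; the following is a rough sketch rather than a library feature. `local_name` and `element_to_dict` are hypothetical helper names, XML attributes and mixed text content are ignored, and repeated child tags are collected into a list.

```python
import json

try:
    from defusedxml import ElementTree as ETree
except ImportError:
    # Assumption: this fallback exists only so the sketch runs where defusedxml
    # is not installed. Remove it when parsing untrusted input.
    from xml.etree import ElementTree as ETree


def local_name(tag):
    """Drop a leading "{namespace-uri}" prefix from a tag name, if present."""
    return tag[tag.find("}") + 1 :]


def element_to_dict(element):
    """Hypothetical helper: turn an element into plain dicts, lists and strings."""
    children = list(element)
    if not children:
        # Leaf element: keep only its text (None when the element is empty).
        return element.text
    result = {}
    for child in children:
        key = local_name(child.tag)
        value = element_to_dict(child)
        if key in result:
            # A repeated tag: promote the existing value to a list and append.
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    return result


xml = (
    b'<?xml version="1.0"?>'
    b'<GetQueueAttributesResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/">'
    b"<GetQueueAttributesResult><Attribute>"
    b"<Name>ApproximateNumberOfMessages</Name><Value>2</Value>"
    b"</Attribute></GetQueueAttributesResult>"
    b"<ResponseMetadata><RequestId>xxxx</RequestId></ResponseMetadata>"
    b"</GetQueueAttributesResponse>"
)

root = ETree.fromstring(xml)
data = {local_name(root.tag): element_to_dict(root)}
print(json.dumps(data, indent=2))
```

Because the result contains only dicts, lists, strings and `None`, it can be passed straight to `json.dumps` without a custom encoder.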