Tags: python, amazon-s3, boto, localstack, botocore

Python: XML file downloaded from S3 is full of string escape characters


I have a number of XML files that I have added to S3 (a localstack server). I can view these files through Cyberduck, and they are valid XML files. However, when I download the objects, the XML data is wrapped in double quotes, with each double quote in the document escaped and each line ending in a literal \n. I have made sure the response content type is "text/xml".

import boto3

# s3_config and endpoint_url are defined elsewhere; the endpoint points at localstack
s3 = boto3.client('s3',
                  config=s3_config,
                  endpoint_url=endpoint_url,
                  aws_access_key_id='foo',
                  aws_secret_access_key='bar',
                  )

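try:
    r = s3.get_object(Bucket=bucket, Key=key)
    # Read the whole object and decode the bytes to text
    return Response(r['Body'].read().decode("utf-8"))
except Exception:
    # Re-raise unchanged, preserving the original traceback
    raise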

which results in a response of

"
<rpc-reply xmlns:....">\n
    <data>\n
        <configuration>\n    
            <server>meanwhileinhell</server>\n
            <security>\n  
                <group>\n  
                    <name>mih-</name>\n
                    <system>\n            
                        <scripts>\n

             ...
             ...
             ...

        </configuration>\n
    </data>\n
</rpc-reply>\n"

I cannot get a raw XML response body with all of the escaping removed. Here are some of the other implementations I have tried:

from io import BytesIO

f = BytesIO()
s3.download_fileobj(bucket, key, f)
return Response(f.getvalue(), content_type="text/xml")

from xml.etree import ElementTree

tree = ElementTree.fromstring(r['Body'].read())
return Response(tree)

I have also tried using pickle and BeautifulSoup with no further success. I have not tried this with another type of file such as a jpg, but why can't I get the actual raw binary data from the objects? The files I am downloading are <50KB.


Solution

  • I got this working by using a StreamingHttpResponse and decoding the stream. The body no longer contains escape characters and is no longer wrapped in double quotes.

    from django.http import StreamingHttpResponse
    
    r = s3.get_object(Bucket=bucket, Key=key)
    return StreamingHttpResponse(r['Body'].read().decode('utf-8'), content_type="text/xml")
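
One plausible explanation for why this helps, assuming the `Response` in the question is Django REST framework's `Response` (the question does not show the import): that class passes its data through a renderer, JSON by default, so a plain string comes out as a JSON string literal. Django's own response classes write the content as given. The standard-library sketch below only reproduces the symptom with `json.dumps`; `xml_text` is a made-up sample, not the asker's file.

```python
import json

# A small made-up XML document, similar in shape to the one in the question
xml_text = '<rpc-reply xmlns="urn:example">\n    <data>ok</data>\n</rpc-reply>\n'

# Serializing the string as JSON wraps it in double quotes and escapes
# the inner double quotes and the newlines, matching the reported symptom
as_json = json.dumps(xml_text)
print(as_json)

# Decoding the JSON string gives back the original XML unchanged
assert json.loads(as_json) == xml_text
```

If that assumption holds, a plain `django.http.HttpResponse(body, content_type="text/xml")` should also return the XML as-is. Passing a `str` to `StreamingHttpResponse` works, but the string is then iterated one character at a time, which is probably acceptable for files under 50KB.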