I have a number of XML files that I have added to S3 (a localstack server). I can view these files through Cyberduck, and they are valid XML files. However, when I download the objects, the XML data is wrapped in double quotes, with each double quote in the document escaped and each line ending in a literal \n. I have made sure the response content type is "text/xml".
import boto3

s3 = boto3.client(
    's3',
    config=s3_config,
    endpoint_url=endpoint_url,
    aws_access_key_id='foo',
    aws_secret_access_key='bar',
)
try:
    r = s3.get_object(Bucket=bucket, Key=key)
    return Response(r['Body'].read().decode("utf-8"))
except Exception:
    # Re-raise unchanged so the original traceback is preserved
    raise
which results in a response of:
"
<rpc-reply xmlns:....">\n
<data>\n
<configuration>\n
<server>meanwhileinhell</server>\n
<security>\n
<group>\n
<name>mih-</name>\n
<system>\n
<scripts>\n
...
...
...
</configuration>\n
</data>\n
</rpc-reply>\n"
I cannot seem to ensure this is a raw XML response body, with all of the escaping removed. Here are some of the other implementations I have tried:
# Attempt 1: download the object into an in-memory buffer
from io import BytesIO
f = BytesIO()
s3.download_fileobj(bucket, key, f)
return Response(f.getvalue(), content_type="text/xml")

# Attempt 2: parse the body with ElementTree
from xml.etree import ElementTree
tree = ElementTree.fromstring(r['Body'].read())
return Response(tree)
I have also tried using pickle and BeautifulSoup, with no further success. I have not tried this with another type of file such as a jpg, but why can't I get the actual raw binary data from the objects? The files I am downloading are <50KB.
I got this working by using a StreamingHttpResponse and decoding the stream. The response now has no escape characters and is not wrapped in double quotes.
from django.http import StreamingHttpResponse
r = s3.get_object(Bucket=bucket, Key=key)
return StreamingHttpResponse(r['Body'].read().decode('utf-8'), content_type="text/xml")
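A likely explanation for the original symptom, offered as an assumption because the question does not say which `Response` class is used: if `Response` comes from Django REST framework, its default renderer JSON-encodes whatever it is given, and JSON-encoding a string produces exactly this kind of quoted, escaped output. The sketch below reproduces the effect with only the standard library; `xml_text` is a made-up stand-in for the decoded S3 body.

```python
import json

# Hypothetical stand-in for r['Body'].read().decode('utf-8')
xml_text = '<rpc-reply xmlns="urn:example">\n<data>\n</data>\n</rpc-reply>\n'

# JSON-encoding a str wraps it in double quotes and escapes
# embedded double quotes and newlines.
encoded = json.dumps(xml_text)
print(encoded)

# Decoding the JSON string recovers the original XML unchanged,
# so the S3 object itself was never corrupted.
decoded = json.loads(encoded)
print(decoded == xml_text)
```

If that is the cause, any response class that writes its content verbatim (such as the `StreamingHttpResponse` above, or a plain `django.http.HttpResponse`) avoids the extra encoding step.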