Fellows,
I am unable to parse a Unicode text file submitted through a Django form. Here are the steps I performed:
Uploaded a text file (encoding: UTF-16; file contents: Hello World 13)
On the server side, received the file using filename = request.FILES['file_field']
Iterated over it line by line: for line in filename: yield line
type(filename) gives me <class 'django.core.files.uploadedfile.InMemoryUploadedFile'>
type(line) is <type 'str'>
print line gives '\xff\xfeH\x00e\x00l\x00l\x00o\x00 \x00W\x00o\x00r\x00l\x00d\x00 \x001\x003\x00'
codecs.BOM_UTF16_LE == line[:2] returns True
Now I want to reconstruct the Unicode or ASCII string, i.e. "Hello World 13", so that I can parse the integer from the line.
One of the ugliest ways of doing this is to take line[-5:] (= '\x001\x003\x00') and pick out the digits with line[-5:][1] and line[-5:][3].
I am sure there must be a better way of doing this. Please help.
Thanks in advance!
Use codecs.iterdecode() to decode the object on the fly:
from codecs import iterdecode
for line in iterdecode(filename, 'utf16'):
    yield line  # each item is now decoded text, not raw bytes
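To see how this could be used to get the integer out, here is a minimal, self-contained sketch. It is only an illustration under a few assumptions: io.BytesIO stands in for the uploaded file from request.FILES, and decoded_lines is a hypothetical helper name, not part of Django or codecs. One thing to keep in mind is that the byte stream is split on the newline byte before decoding, so the decoded pieces may not line up exactly with text lines; joining them first and then calling splitlines() sidesteps that.

```python
import io
import re
from codecs import iterdecode

# Stand-in for request.FILES['file_field'] (assumption): a byte stream
# holding UTF-16 text that starts with a BOM.
upload = io.BytesIO(u"Hello World 13\n".encode("utf-16"))


def decoded_lines(fileobj, encoding="utf-16"):
    """Hypothetical helper: decode the byte chunks, then split into text lines."""
    # iterdecode() decodes incrementally, so a chunk boundary that falls in
    # the middle of a 2-byte code unit is handled. Joining first avoids
    # relying on where the byte-level line splits happened to fall.
    text = u"".join(iterdecode(fileobj, encoding))
    return text.splitlines()


lines = decoded_lines(upload)
# Pull the first run of digits out of the first line and convert it.
number = int(re.search(r"\d+", lines[0]).group())
print("%s -> %d" % (lines[0], number))
```

Joining reads the whole upload into memory, which is fine for small files like the one in the question; for large uploads a streaming approach would be a better fit.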