I wrote this function to check whether a Persian month name occurs in a unicode string and, if so, to replace it with the month number. I use this encoding declaration in the header:
`#!/usr/bin/python
# -*- coding: utf-8 -*-`
This is my function to convert the month:
def changeData(date):
    if date:
        date.encode('utf-8')
        if "فروردین".encode('utf-8') in date:
            return str.replace(":فروردین", ":1")
        elif "اردیبهشت".encode('utf-8') in date:
            return str.replace(":اردیبهشت", ":2")
        elif "خرداد".encode('utf-8') in date:
            return str.replace(":خرداد", ":3")
        elif "تیر".encode('utf-8') in date:
            return str.replace(":تیر", ":41")
        elif "مرداد".encode('utf-8') in date:
            return str.replace(":مرداد", ":5")
        elif "شهریور".encode('utf-8') in date:
            return str.replace(":شهریور", ":6")
        elif "مهر".encode('utf-8') in date:
            return str.replace(":مهر", ":7")
        elif "آبان".encode('utf-8') in date:
            return str.replace(":آبان", ":8")
        elif "آذر".encode('utf-8') in date:
            return str.replace(":آذر", ":9")
        elif "دی".encode('utf-8') in date:
            return str.replace(":دی", ":10")
        elif "بهمن".encode('utf-8') in date:
            return str.replace(":بهمن", ":11")
        elif "اسفند".encode('utf-8') in date:
            return str.replace(":اسفند", ":12")
I pass the date to the function as a unicode string and then call encode('utf-8') on it, but I get this error:
if "فروردین".encode('utf-8') in date:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd9 in position 0: ordinal not in range(128)
How can I solve this problem?
I assume Python 2.7.
So:
"فروردین".encode('utf-8') # UnicodeDecodeError: 'ascii' codec can't decode byte 0xd9 in position 0: ordinal not in range(128)
The problem is that in Python 2.7 a plain string literal is a byte string:
print(repr("فروردین")) # '\xd9\x81\xd8\xb1\xd9\x88\xd8\xb1\xd8\xaf\xdb\x8c\xd9\x86'
With the following code:
"فروردین".encode('utf-8')
you're trying to encode bytes, which is logically incorrect because:
ENCODING: unicode --> bytes
DECODING: bytes --> unicode
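To make the two directions concrete, here is a minimal sketch of a round trip. The variable names are only illustrative, and the u-prefixed literal is used so that the snippet behaves the same on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-

text = u"فروردین"            # unicode text (7 characters)
raw = text.encode('utf-8')   # ENCODING: unicode --> bytes
back = raw.decode('utf-8')   # DECODING: bytes --> unicode

print(len(text), len(raw))   # each of these characters takes 2 bytes in UTF-8
print(back == text)          # True
```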
But Python 2 doesn't raise something like a TypeError here, because it tries to be smart.
In such a case it first tries to decode the given bytes to unicode and only then performs the encoding requested by the user.
The problem is that Python 2 does this implicit decoding with the default encoding, which is ASCII. Therefore the program terminates with a UnicodeDecodeError.
The implicit decoding is equivalent to:
unicode("فروردین") # UnicodeDecodeError: 'ascii' codec can't decode byte 0xd9 in position 0: ordinal not in range(128)
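The hidden step can also be written out explicitly. The sketch below builds the same UTF-8 bytes from a u-literal (so that it runs on Python 3 as well, where the implicit step no longer exists) and then decodes them with ASCII by hand, which fails in the same way.

```python
# -*- coding: utf-8 -*-

# The bytes that a plain "..." literal holds in a UTF-8 encoded Python 2 source file.
raw = u"فروردین".encode('utf-8')

try:
    # Explicit version of what Python 2 does implicitly for str.encode():
    # decode with ASCII first, then encode with the requested codec.
    raw.decode('ascii').encode('utf-8')
    failed = False
    message = ""
except UnicodeDecodeError as exc:
    failed = True
    message = str(exc)

print(failed)
print(message)
```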
So you shouldn't encode a byte string; you have to DECODE it in order to get unicode:
u = "فروردین".decode('utf-8')
print(type(u)) # <type 'unicode'>
Another way to get unicode is to use a u-literal together with an encoding declaration:
# coding: utf-8
u = u"فروردین"
print(type(u)) # <type 'unicode'>
print(u == "فروردین".decode('utf-8')) # True
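Applied to the function from the question, the same idea could look roughly like the sketch below. It is not the original code: the name change_date and the PERSIAN_MONTHS list are hypothetical, the months are assumed to be numbered 1 to 12 in calendar order, and the ":" prefix is kept from the original replace calls. Everything is compared as unicode, and a byte string is decoded once on entry.

```python
# -*- coding: utf-8 -*-

# Hypothetical helper data: Persian month names in calendar order, as unicode.
PERSIAN_MONTHS = [
    u"فروردین", u"اردیبهشت", u"خرداد", u"تیر", u"مرداد", u"شهریور",
    u"مهر", u"آبان", u"آذر", u"دی", u"بهمن", u"اسفند",
]


def change_date(date):
    """Replace ':<month name>' with ':<month number>' in a date string."""
    if not date:
        return date
    # Assumption: any byte string passed in is UTF-8; decode it exactly once.
    if isinstance(date, bytes):
        date = date.decode('utf-8')
    for number, name in enumerate(PERSIAN_MONTHS, 1):
        marker = u":" + name
        if marker in date:
            return date.replace(marker, u":%d" % number)
    return date  # no month name found, return the text unchanged


print(change_date(u"1395:" + PERSIAN_MONTHS[0]))
```

Because the search is done on ":" plus the full name, the short name "دی" does not accidentally match inside "اردیبهشت".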