I am using the snippet below to read a text file and use each line as the name of an object to download from cloud storage. The code works perfectly on Windows and Mac, but it fails when I run it over SSH on a Linux instance.
I then ran the script with the command below. That got it working up to a point, but it still failed later.
Command: PYTHONIOENCODING=utf-8 python3 filename.py textfilename
import os
import traceback
from multiprocessing import current_process


def file_block(fp, number_of_blocks, block):
    '''
    A generator that splits a file into blocks and iterates
    over the lines of one of the blocks.
    '''
    assert 0 < number_of_blocks
    assert 0 <= block < number_of_blocks
    fp.seek(0, 2)  # jump to the end to measure the file size
    file_size = fp.tell()
    # Integer division: seek() needs an int offset in Python 3.
    ini = file_size * block // number_of_blocks
    end = file_size * (1 + block) // number_of_blocks
    if ini <= 0:
        fp.seek(0)
    else:
        # Step back one position and discard the partial line so that the
        # block starts on a line boundary. Note: the io docs only guarantee
        # text-mode seeks to offsets previously returned by tell().
        fp.seek(ini - 1)
        fp.readline()
    while fp.tell() < end:
        yield fp.readline()


def download_files(conn, container_name, number_of_chunks, chunk_number, file_name):
    counter = 0
    try:
        with open(file_name, encoding='utf-8') as fp:
            for line in file_block(fp, number_of_chunks, chunk_number):
                counter += 1
                # The line is already a decoded str; only strip the line ending.
                clean_object_name = line.rstrip('\n\r ')
                try:
                    directory = os.path.dirname(clean_object_name)
                    if directory and not os.path.exists(directory):
                        os.makedirs(directory)
                    if os.path.basename(clean_object_name) != '':
                        obj_tuple = conn.get_object(container_name, clean_object_name)
                        with open(clean_object_name, 'wb') as f:
                            f.write(obj_tuple[1])
                        print("Successful ", current_process().name, " ", counter, " ", clean_object_name.encode('utf-8'), "\n")
                except Exception:
                    # Record the object that could not be downloaded or written.
                    os.makedirs("log", exist_ok=True)
                    with open("log/log_" + current_process().name + ".txt", 'a', encoding='utf-8') as f:
                        try:
                            print("Failed counter ", counter, " ", clean_object_name.encode('utf-8'))
                            f.write("missing " + clean_object_name + "\n")
                            f.write("traceback " + traceback.format_exc() + "\n")
                        except Exception:
                            f.write("missing " + str(counter) + "\n")
    except Exception:
        # Record a failure while reading the list of object names.
        os.makedirs("process_failure_log", exist_ok=True)
        with open("process_failure_log/log_" + current_process().name + ".txt", 'a', encoding='utf-8') as f:
            try:
                f.write("process failed while reading the file at counter " + str(counter) + "\n")
                f.write(traceback.format_exc() + "\n")
            except Exception:
                f.write("missing " + str(counter) + "\n")
The text file contains the following data:
user_photos/images/282/onehundred/Capture d’écran 2012-09-07 à 2.50.31 PM20120917-37935-13g7sn1-0_1347875141.png
user_photos/images/282/original/Capture d’écran 2012-09-07 à 2.50.31 PM20120917-37935-13g7sn1-0_1347875141.png
user_photos/images/282/preview/Capture d’écran 2012-09-07 à 2.50.31 PM20120917-37935-13g7sn1-0_1347875141.png
user_photos/images/282/thumbnail/Capture d’écran 2012-09-07 à 2.50.31 PM20120917-37935-13g7sn1-0_1347875141.png
user_photos/images/282/twohundred/Capture d’écran 2012-09-07 à 2.50.31 PM20120917-37935-13g7sn1-0_1347875141.png
user_photos/images/283/onehundred/Capture d’écran 2012-09-11 à 6.21.50 PM20120917-38000-37awsu-0_1347875181.jpg
user_photos/images/283/original/Capture d’écran 2012-09-11 à 6.21.50 PM20120917-38000-37awsu-0_1347875181.jpg
user_photos/images/283/preview/Capture d’écran 2012-09-11 à 6.21.50 PM20120917-38000-37awsu-0_1347875181.jpg
user_photos/images/283/thumbnail/Capture d’écran 2012-09-11 à 6.21.50 PM20120917-38000-37awsu-0_1347875181.jpg
user_photos/images/283/twohundred/Capture d’écran 2012-09-11 à 6.21.50 PM20120917-38000-37awsu-0_1347875181.jpg
user_photos/images/284/onehundred/Capture d’écran 2012-09-11 à 6.20.56 PM20120917-38101-6po8vq-0_1347875238.jpg
user_photos/images/284/original/Capture d’écran 2012-09-11 à 6.20.56 PM20120917-38101-6po8vq-0_1347875238.jpg
user_photos/images/284/preview/Capture d’écran 2012-09-11 à 6.20.56 PM20120917-38101-6po8vq-0_1347875238.jpg
user_photos/images/284/thumbnail/Capture d’écran 2012-09-11 à 6.20.56 PM20120917-38101-6po8vq-0_1347875238.jpg
user_photos/images/284/twohundred/Capture d’écran 2012-09-11 à 6.20.56 PM20120917-38101-6po8vq-0_1347875238.jpg
After using the above command, I was able to read the input file and write to the log files, but the script failed at the point below:
with open(clean_object_name, 'wb') as f:
    f.write(obj_tuple[1])
Traceback:
'ascii' codec can't encode character in position 55-56: ordinal not in range(128).
I know this is caused by the accented (non-ASCII) characters in the file name. I could encode or decode with errors ignored, but I don't want to replace characters in the file name with unrecognized ones.
I am confused: if encoding were the problem, the script should have failed at the very start. Yet it reads the input file and writes to other files without trouble. It fails only when creating a file whose name contains an accented character. Please suggest a fix; I have spent two full days stuck on this. The code works perfectly on Windows and Mac.
This turned out to be an environment variable issue on Linux.
I verified it by running the following Python code:
import sys
print(sys.getfilesystemencoding())
It returned the value: ascii
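This also explains why the PYTHONIOENCODING command only helped part of the way: that variable controls the encoding of stdin, stdout, and stderr, while file names go through the filesystem encoding. A minimal sketch that reports both side by side, and checks whether a given name survives a given encoding, might look like this. The helper names `encoding_report` and `can_encode` are made up for illustration and are not part of the original script.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python picked up from the environment."""
    return {
        # Used to convert str file names for the operating system.
        "filesystem": sys.getfilesystemencoding(),
        # Locale's preferred encoding (open() uses it when encoding= is omitted).
        "preferred": locale.getpreferredencoding(False),
        # Used by print(); this is what PYTHONIOENCODING overrides.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


def can_encode(name, encoding):
    """Return True if `name` can be encoded with `encoding` without errors."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(key, "=", value)
    sample = "Capture d’écran.png"  # shortened sample name, for illustration
    print("ascii ok:", can_encode(sample, "ascii"))
    print("utf-8 ok:", can_encode(sample, "utf-8"))
```

If the "filesystem" entry is ascii while "stdout" is utf-8, printing a name works but creating a file with that name does not, which matches the behaviour described in the question.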
I then made the following changes in the Linux terminal:
$ sudo vim /etc/environment
and set the locale variables as follows:
LANG="en_US.UTF-8"
LC_MESSAGES="C"
LC_ALL="en_US.UTF-8"
Then reboot and run locale to confirm the new settings.
After this change, the same check returned 'utf-8' and the script worked correctly.
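To try a locale setting before editing /etc/environment and rebooting, one option is to start a fresh interpreter with the variables overridden and ask it for its filesystem encoding. The sketch below assumes Python 3.7 or later (for `capture_output`), and the helper name `filesystem_encoding_under` is hypothetical. The result depends on the Python version and on whether the named locale is installed on the machine, so no expected output is shown.

```python
import os
import subprocess
import sys


def filesystem_encoding_under(locale_name):
    """Start a fresh interpreter with the given locale and report its filesystem encoding."""
    # Copy the current environment and override only the locale variables.
    env = dict(os.environ, LANG=locale_name, LC_ALL=locale_name)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.getfilesystemencoding())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(filesystem_encoding_under("en_US.UTF-8"))
```

The same override can be applied to a single run of the download script by prefixing the command with the variables, in the same way PYTHONIOENCODING was set in the question.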