I am trying to build an MD5 hash from a txt file. However, there are some rules I need to follow, such as removing the \r and \n characters, which stand for Carriage Return and Line Feed (new line). I've tried removing these characters using the rstrip and strip functions, but it looks like that didn't work. To be sure about this, I wrote the content to a txt file and opened it in Notepad++ and, as you can see in the picture below, the characters are still there.
(Screenshot: Notepad++ showing the CR and LF characters still present in the file.)
I tried another approach: I used the split function to create a list, with \n as the delimiter, just to check whether those characters are really there. As I suspected, they were.
What should I do to actually remove these characters?
Here is one of the versions I tried:
from hashlib import md5
open_file = open('N0003977.290', 'r', encoding = 'ISO-8859-1')
test_file = open('file_test.txt', 'w')
file_content = open_file.read().lower().rstrip('\n\r ').strip('\n\r')
#writing a txt file to check if there are new line characters
test_file.write(file_content)
test_file.close()
#creating a md5 hash
m = md5()
m.update(file_content.encode('ISO-8859-1'))
print(m.hexdigest())
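The behavior described above can be reproduced with a small, made-up string standing in for the file contents (the sample text is hypothetical, not taken from N0003977.290). str.strip() and str.rstrip() only trim characters from the two ends of a string, so the line breaks between lines survive, which matches what the split experiment showed:

```python
# Hypothetical sample standing in for the file contents
sample = "first line\r\nsecond line\r\nthird line\r\n"

# strip()/rstrip() only remove characters at the start and end of the string
stripped = sample.rstrip("\n\r ").strip("\n\r")
print(repr(stripped))  # 'first line\r\nsecond line\r\nthird line'

# Splitting on '\n' shows the internal line breaks are still there
parts = stripped.split("\n")
print(parts)  # ['first line\r', 'second line\r', 'third line']
```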
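rstrip() and strip() only remove characters from the beginning and end of a string, so the line breaks in the middle of the file are left untouched. Instead, I would delete the "carriage return" and "line feed" characters everywhere in the string using str.translate(), like so: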
file_content = file_content.translate({ord(ch):None for ch in '\r\n'})
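The same deletion table can also be built with str.maketrans, whose optional third argument lists characters to map to None. This is an equivalent alternative, not the form used in the answer, shown here on a hypothetical sample string:

```python
sample = "abc\r\ndef\nghi\r"

# Dict form, as in the answer: map each code point to None (i.e. delete it)
table_dict = {ord(ch): None for ch in "\r\n"}

# str.maketrans form: the third argument lists characters to delete
table_maketrans = str.maketrans("", "", "\r\n")

cleaned = sample.translate(table_dict)
print(cleaned)  # abcdefghi
```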
Alternatively, if this were a classroom assignment and we had not yet covered str.translate(), I might do the work "by hand":
file_content = ''.join(ch for ch in file_content if ch not in '\r\n')
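As a quick sanity check on a hypothetical sample with mixed line endings, the by-hand generator version, str.translate(), and a chained str.replace() (another common by-hand option, not from the original answer) all produce the same string:

```python
sample = "line one\r\nline two\nline three\rend"

# Three ways to delete every '\r' and '\n' character
via_translate = sample.translate({ord(ch): None for ch in "\r\n"})
via_join = "".join(ch for ch in sample if ch not in "\r\n")
via_replace = sample.replace("\r", "").replace("\n", "")

print(via_translate)  # line oneline twoline threeend
```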
Complete program:
from hashlib import md5
# Read the whole file; in text mode Python already turns '\r\n' (and a lone '\r') into '\n'
with open('N0003977.290', 'r', encoding='ISO-8859-1') as open_file:
    file_content = open_file.read()
# Choose one of the following:
file_content = file_content.translate({ord(ch): None for ch in '\r\n'})
# file_content = ''.join(ch for ch in file_content if ch not in '\r\n')
# writing a txt file to check if there are new line characters
with open('file_test.txt', 'w', encoding='ISO-8859-1') as test_file:
    test_file.write(file_content)
# creating an md5 hash
m = md5()
m.update(file_content.encode('ISO-8859-1'))
print(m.hexdigest())
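One alternative sketch, not part of the original answer: read the file in binary mode and delete the CR and LF bytes with bytes.translate(None, b"\r\n"), which sidesteps Python's text-mode newline translation entirely. The helper names md5_without_newlines and md5_text_mode are hypothetical. Because ISO-8859-1 maps each byte to exactly one character, the binary version should give the same digest as the text-mode program; the demo writes a throwaway temporary file instead of reading the real N0003977.290.

```python
import os
import tempfile
from hashlib import md5


def md5_without_newlines(path):
    """Hypothetical helper: MD5 of a file's bytes with every CR and LF byte removed."""
    with open(path, "rb") as f:
        data = f.read()
    # bytes.translate(None, delete) removes every byte listed in `delete`
    data = data.translate(None, b"\r\n")
    return md5(data).hexdigest()


def md5_text_mode(path):
    """Hypothetical helper mirroring the answer: decode as ISO-8859-1, drop CR/LF, re-encode."""
    with open(path, "r", encoding="ISO-8859-1") as f:
        content = f.read()
    content = content.translate({ord(ch): None for ch in "\r\n"})
    return md5(content.encode("ISO-8859-1")).hexdigest()


if __name__ == "__main__":
    # Throwaway sample file with mixed line endings and a non-ASCII byte (0xE9)
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"caf\xe9\r\nsecond\nthird\r")
    try:
        print(md5_without_newlines(path))
        # Both approaches should agree for ISO-8859-1 content
        print(md5_without_newlines(path) == md5_text_mode(path))
    finally:
        os.remove(path)
```

If the rules also require lowercasing (the question's code calls .lower()), note that bytes.lower() only changes ASCII letters, while str.lower() on decoded text also lowercases accented letters such as É, so the two versions would no longer be interchangeable.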