Python doesn't remove carriage return and line feed characters from file


I am trying to compute an MD5 hash of a txt file. However, there are some rules I need to follow:

  • The encoding rule must be 'ISO-8859-1'
  • All the characters must be lowercase
  • Newline (line feed) and carriage return characters must NOT be included when building the hash

My file contains \r and \n characters, i.e. carriage return and line feed (newline). I've tried to remove these characters using the rstrip and strip functions, but it looks like that didn't work. To make sure, I wrote the content to a txt file and opened it in Notepad++ and, as you can see in the picture below, the characters are still there.

[Image: Notepad++ showing the CR and LF characters still present in the test file]
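A quick way to see why strip and rstrip did not help: both only trim the given characters from the two ends of the string, so line breaks in the middle of the text are left alone. The sample string below is made up for illustration; it stands in for the file contents.

```python
# Hypothetical sample text standing in for the file contents.
sample = 'Line one\r\nLine two\r\n'

# strip()/rstrip() only remove characters at the start and end of the string.
stripped = sample.strip('\r\n')
print(repr(stripped))  # 'Line one\r\nLine two'

# The CR/LF pair in the middle is untouched.
print('\r\n' in stripped)  # True
```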

I tried another approach: I used the split function with \n as the delimiter to create a list, just to check whether those characters were really there. As I suspected, they were.
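One detail worth knowing when checking this way: a file opened in text mode with the default newline=None is read in universal newlines mode, so '\r\n' (and a lone '\r') are translated to '\n' before read() returns. Splitting on '\n' therefore finds every line break, while '\r' usually no longer appears in the string at all. The sketch below assumes CRLF line endings and uses an in-memory byte string wrapped in io.TextIOWrapper in place of the real file.

```python
import io

# Hypothetical raw bytes with Windows-style CRLF line endings.
raw = b'First LINE\r\nSecond LINE\r\n'

# Decode the bytes the way open(..., 'r', encoding='ISO-8859-1') would,
# with the default newline=None (universal newlines on input).
reader = io.TextIOWrapper(io.BytesIO(raw), encoding='ISO-8859-1')
text = reader.read()

print(repr(text))        # 'First LINE\nSecond LINE\n'
print(text.count('\r'))  # 0: each CRLF became a single '\n'
print(text.split('\n'))  # ['First LINE', 'Second LINE', '']
```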

What should I do to actually remove those characters?

One of the versions of the code I tried:

from hashlib import md5

open_file = open('N0003977.290', 'r', encoding = 'ISO-8859-1')
test_file = open('file_test.txt', 'w')
file_content = open_file.read().lower().rstrip('\n\r ').strip('\n\r')

#writing a txt file to check if there are new line characters
test_file.write(file_content)
test_file.close()

#creating a md5 hash
m = md5()
m.update(file_content.encode('ISO-8859-1'))
print(m.hexdigest())
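The test file can also be misleading in the other direction. In text mode with the default newline=None, every '\n' that is written is translated to os.linesep, which is '\r\n' on Windows, so Notepad++ shows CR LF even if the string itself only held '\n'. The sketch below forces that translation with newline='\r\n' so it behaves the same on any platform; the in-memory buffer is an assumption standing in for file_test.txt.

```python
import io

# In-memory stand-in for file_test.txt.
buffer = io.BytesIO()

# newline='\r\n' turns every written '\n' into '\r\n', as the default does on Windows.
writer = io.TextIOWrapper(buffer, encoding='ISO-8859-1', newline='\r\n')
writer.write('line one\nline two')
writer.flush()  # push the encoded text into the underlying BytesIO

written = buffer.getvalue()
print(written)  # b'line one\r\nline two'
```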

Solution

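  • str.strip() and str.rstrip() only remove characters from the beginning and end of a string, so line breaks in the middle of the text are never touched. I would delete every "carriage return" and "line feed" character using str.translate(), like so: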

    file_content = file_content.translate({ord(ch):None for ch in '\r\n'})
    

    Alternatively, if this were a classroom assignment and we had not yet covered str.translate(), I might do the work "by hand":

    file_content = ''.join(ch for ch in file_content if ch not in '\r\n')
    

    Complete program:

    from hashlib import md5

    # Read the whole file as ISO-8859-1 and lowercase it, as the rules require.
    with open('N0003977.290', 'r', encoding='ISO-8859-1') as open_file:
        file_content = open_file.read().lower()

    # Choose one of the following to remove every '\r' and '\n', wherever it appears:
    file_content = file_content.translate({ord(ch): None for ch in '\r\n'})
    # file_content = ''.join(ch for ch in file_content if ch not in '\r\n')

    # Write a test file to check that no newline characters remain.
    with open('file_test.txt', 'w', encoding='ISO-8859-1') as test_file:
        test_file.write(file_content)

    # Create the MD5 hash of the ISO-8859-1 encoded bytes.
    m = md5()
    m.update(file_content.encode('ISO-8859-1'))
    print(m.hexdigest())
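The same steps can be wrapped in a small function, which makes it easy to confirm that the hash does not depend on which line-ending style the file used. md5_without_newlines is a hypothetical helper name; it lowercases, deletes every '\r' and '\n' with str.translate() (building the table with str.maketrans('', '', '\r\n'), an equivalent way to get the same deletion mapping), and hashes the ISO-8859-1 bytes.

```python
from hashlib import md5

# Deletion table for CR and LF; same effect as {ord(ch): None for ch in '\r\n'}.
NEWLINE_TABLE = str.maketrans('', '', '\r\n')


def md5_without_newlines(text: str) -> str:
    """Hypothetical helper: lowercase, drop CR/LF anywhere, MD5 the ISO-8859-1 bytes."""
    cleaned = text.lower().translate(NEWLINE_TABLE)
    return md5(cleaned.encode('ISO-8859-1')).hexdigest()


# The same text with Windows, Unix and old Mac line endings hashes identically.
windows = md5_without_newlines('Héllo\r\nWorld\r\n')
unix = md5_without_newlines('Héllo\nWorld\n')
mac = md5_without_newlines('Héllo\rWorld\r')
print(windows == unix == mac)  # True
print(windows == md5('hélloworld'.encode('ISO-8859-1')).hexdigest())  # True
```

Lowercasing the decoded str before encoding matters here: bytes.lower() only changes ASCII letters, so lowercasing the raw bytes instead would leave accented Latin-1 capitals such as 'É' unchanged.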