Tags: python, filesize, read-write

File size changes after read/write txt file in Python


After running the following Python code to copy a text file, newfile.txt does not have exactly the same size as oldfile.txt.

with open('oldfile.txt','r') as a, open('newfile.txt','w') as b:
    content = a.read()
    b.write(content)

For example, oldfile.txt is 667 KB, while newfile.txt is 681 KB.

Is there an explanation for that?


Solution

  • There are various causes.

    You are opening the file in text mode, so its bytes are decoded into a Python str when read and encoded again when written. That round trip is not guaranteed to reproduce the original bytes.

    From open's documentation:

    When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller.

    So if the original file uses Windows line endings (\r\n), the \r is removed when reading. When writing, each \n is translated to the platform's default line separator: on Linux or macOS the original \r is not restored, so the file shrinks, while on Windows every \n is written as \r\n. A file that originally used bare \n (or bare \r) line endings therefore grows on Windows, which seems to be your case, because your file increased in size.

    The encoding can also change the text. For example, a BOM (byte order mark) could be removed or added, depending on the codec. Potentially (though as far as I know this is not done implicitly) redundant code points could also be removed: Unicode has some code points that change the behaviour of nearby ones, and if several of them are stacked up, only the last one is effective.
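
The newline and BOM effects described above can be reproduced with a short script. This is only a minimal sketch: the helper names (text_copy, binary_copy, make_file) and the sample contents are made up for illustration, and the newline argument is passed explicitly to stand in for each platform's default, so the result should be the same on any operating system.

```python
import os
import tempfile


def text_copy(src, dst, read_encoding, write_encoding, write_newline):
    """Copy src to dst through text mode, like the code in the question."""
    # newline=None (the default) on the read side enables universal newlines.
    with open(src, "r", encoding=read_encoding, newline=None) as a, \
         open(dst, "w", encoding=write_encoding, newline=write_newline) as b:
        b.write(a.read())
    return os.path.getsize(dst)


def binary_copy(src, dst):
    """Copy src to dst byte for byte: no decoding, no newline translation."""
    with open(src, "rb") as a, open(dst, "wb") as b:
        b.write(a.read())
    return os.path.getsize(dst)


def make_file(path, data):
    """Write raw bytes to path and return the resulting file size."""
    with open(path, "wb") as f:
        f.write(data)
    return os.path.getsize(path)


sizes = {}  # case name -> (size before, size after)
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "oldfile.txt")
    dst = os.path.join(tmp, "newfile.txt")

    # LF file written back with CRLF, as text mode does by default on Windows.
    before = make_file(src, b"first\nsecond\nthird\n")
    sizes["lf_to_crlf"] = (before, text_copy(src, dst, "utf-8", "utf-8", "\r\n"))

    # CRLF file written back with LF, as text mode does on Linux or macOS.
    before = make_file(src, b"first\r\nsecond\r\nthird\r\n")
    sizes["crlf_to_lf"] = (before, text_copy(src, dst, "utf-8", "utf-8", "\n"))

    # UTF-8 file with a BOM: "utf-8-sig" drops it on read, "utf-8" adds none.
    before = make_file(src, b"\xef\xbb\xbffirst\n")
    sizes["bom_dropped"] = (before, text_copy(src, dst, "utf-8-sig", "utf-8", "\n"))

    # Control: a binary copy leaves both the BOM and the CRLF endings alone.
    before = make_file(src, b"\xef\xbb\xbffirst\r\nsecond\r\n")
    sizes["binary"] = (before, binary_copy(src, dst))

for case, (before, after) in sizes.items():
    print(f"{case}: {before} -> {after} bytes")
```

Each \n that becomes \r\n adds exactly one byte, so if line endings are the cause in the question, the 14 KB difference would correspond to roughly 14,000 lines. Opening both files in binary mode ('rb' and 'wb'), as in the control case, skips decoding and newline translation altogether, so the copy should keep the original size.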