
Does Python's os.path.getsize() have true byte resolution?


File systems rarely allow files to be an arbitrary number of bytes long; instead, they pad them to fill a whole number of blocks. Python's os.path.getsize() is documented to return a size in bytes, but I am not sure whether the OS (Linux, in my case) or the file system rounds it to a block size. My application needs to know the exact number of bytes I will be able to read from a large file (~1 GB). What guarantees are made about this?


Solution

  • Python itself makes no guarantees. The os.path.getsize() function returns the st_size field of an os.stat() call, and os.stat() is a direct call to the stat system call.

    All the documentation for stat simply describes st_size as the file size in bytes. The space actually allocated on disk is reported separately, in st_blocks.

    On my Debian test system, stat reports the true file sizes:

    $ stat -fc %s .   # fs block size
    4096
    $ head -c 2048 < /dev/urandom > 2kb
    $ head -c 6168 < /dev/urandom > 6kb
    $ head -c 12345 < /dev/urandom > 12andabitkb
    $ ls --block-size=1 -s *kb     # block use in bytes
    16384 12andabitkb   4096 2kb   8192 6kb
    $ ls --block-size=4K -s *kb    # block count per file
    4 12andabitkb  1 2kb  2 6kb
    $ python3 -c 'import os, glob; print(*("{:<11} {}".format(f, os.path.getsize(f)) for f in glob.glob("*kb")), sep="\n")'
    2kb         2048
    12andabitkb 12345
    6kb         6168
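The same comparison can be sketched in pure Python. This is a minimal sketch, assuming a POSIX system such as Linux. The helper names `count_readable_bytes` and `compare_sizes` are hypothetical, and the file sizes simply mirror the shell session. The allocated-space figure relies on `st_blocks`, which Python documents as a count of 512-byte blocks; it does not exist on Windows, so the sketch falls back to 0 there.

```python
import os
import tempfile


def count_readable_bytes(path, chunk_size=1 << 20):
    """Count the bytes actually returned by reading the file to EOF."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # read in 1 MiB chunks to bound memory use
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    return total


def compare_sizes(path):
    """Report the logical size several ways, plus the space allocated on disk."""
    st = os.stat(path)
    return {
        "getsize": os.path.getsize(path),  # wrapper around os.stat(path).st_size
        "st_size": st.st_size,
        "read": count_readable_bytes(path),
        # st_blocks counts 512-byte units on Linux; absent on Windows, so default to 0
        "allocated": getattr(st, "st_blocks", 0) * 512,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for n in (2048, 6168, 12345):
            p = os.path.join(d, f"{n}.bin")
            with open(p, "wb") as f:
                f.write(os.urandom(n))  # random contents, like /dev/urandom above
            print(n, compare_sizes(p))
```

On a typical Linux file system, the `getsize`, `st_size`, and `read` values should all agree, while `allocated` is usually rounded up to whole file system blocks (and can be smaller than the logical size for sparse files).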