Tags: python, python-3.x, macos-high-sierra, shelve

Why is a Python shelf file containing very little data so large on macOS?


I was following along with an example from a book that illustrates Python's shelve module, on macOS High Sierra.

As shown below, only two small tuples of short strings are stored in the shelf. As the very last line shows, the resulting file is 16 MB.

The file only gets that large on macOS High Sierra with a Python installed through Homebrew (either 3.6.4 or 2.7.14). On a Linux host, with the pre-installed Python (2.7.10), or with Python 3.6.4 from the official macOS installer, the resulting addresses file is just a few kilobytes, as others reported in the comments (thanks!).

 ~/tmp> rm addresses
 ~/tmp> python3
Python 3.6.4 (default, Jan  6 2018, 18:43:09)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
[...]
>>> import shelve
>>> book = shelve.open("addresses")
>>> book['flintstone'] = ('fred', '555-1234', '1233 Bedrock Place')
>>> book['rubble'] = ('barney', '555-4321', '1235 Bedrock Place')
>>> book.close()
>>>
 ~/tmp> ll
total 32768
-rw-r--r--  1 moritz  staff    16M Jan 24 13:05 addresses
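
Since the size differs between Python builds, it may help to check which dbm backend each interpreter actually uses for the shelf. The following is a minimal sketch using the standard library's dbm.whichdb(); the helper names shelf_backend and total_size are made up for illustration, and the shelf is written to a temporary directory rather than ~/tmp.

```python
import dbm
import os
import shelve
import tempfile


def shelf_backend(path):
    """Return the name of the dbm module that created the shelf at `path`.

    dbm.whichdb() returns a string such as "dbm.gnu", "dbm.ndbm" or
    "dbm.dumb", an empty string if the format is not recognised, or None
    if the file cannot be read.
    """
    return dbm.whichdb(path)


def total_size(directory):
    """Sum the apparent sizes of all files in `directory`.

    Some backends store a shelf in several files or add a suffix to the
    name, so the whole directory is measured.
    """
    return sum(
        os.path.getsize(os.path.join(directory, name))
        for name in os.listdir(directory)
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "addresses")
        with shelve.open(path) as book:
            book["flintstone"] = ("fred", "555-1234", "1233 Bedrock Place")
            book["rubble"] = ("barney", "555-4321", "1235 Bedrock Place")
        print("backend:", shelf_backend(path))
        print("bytes on disk:", total_size(tmp))
```

Running this with each interpreter should show whether the large file goes together with one particular backend.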

Solution

  • I can confirm that this behavior was introduced in gdbm 1.14. gdbm is the library that shelve uses here (through the dbm module) to access the database file.

    With change 2e8a5e0, gdbm tries to extend the file size to match next_block_size. next_block_size is calculated as 4 * block_size, where block_size is the optimal I/O block size of the underlying file system, taken from the st_blksize field returned by stat(2). On my macOS 10.13.3 system, for a file on an APFS SSD volume, st_blksize is 4194304 bytes, so next_block_size is 16777216 bytes and the initial database file is therefore 16 MB.

    P.S. I also checked an HFS+ file system on an HDD volume I had at hand; there, st_blksize is 4096 bytes.
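
The arithmetic above can be reproduced from Python. This is only a sketch of the calculation as described in this answer, not a copy of gdbm's code: the function name next_block_size is borrowed from the text, the factor 4 is taken from the description above, and the current directory is stat'ed as a stand-in for the database file. os.stat() exposes st_blksize on Unix-like systems only.

```python
import os


def next_block_size(block_size):
    """Initial file size as described above: four times the I/O block size."""
    return 4 * block_size


if __name__ == "__main__":
    # st_blksize is the file system's preferred I/O block size (see stat(2)).
    # Assumption: the directory reports the same value as a file created in it.
    block_size = os.stat(".").st_blksize
    print("st_blksize:", block_size)
    print("predicted initial gdbm file size:", next_block_size(block_size))
```

With the values reported above, 4194304 bytes on APFS gives 16777216 bytes (16 MB), while 4096 bytes on HFS+ gives 16384 bytes (16 KB).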