
Python - size of object in memory vs. on disk


Here is my example:

import numpy as np
test = [np.random.choice(range(1, 1000), 1000000) for el in range(1,1000)]

this object takes in memory:

import sys
print(sys.getsizeof(test)/1024/1024/1024)
8.404254913330078e-06

something like 8 KB

When I write it to disk

import pickle
file_path = './test.pickle'
with open(file_path, 'wb') as f:
    pickle.dump(test, f)

it takes almost 8 GB according to ls -l

Could somebody clarify why it takes so little space in memory and so much on disk? I am guessing the in-memory numbers are not accurate.


Solution

  • I am guessing the in-memory numbers are not accurate.

    Well, this would not explain 6 orders of magnitude in size, right? ;)

    test is a Python list instance. getsizeof only reports the size of the list object itself: its header plus one 64-bit pointer per element on your system. It does not follow those pointers, so the memory owned by each NumPy array is not counted. To measure the full footprint you have to inspect every element yourself (Python lists are not strictly typed, so you can't simply compute size_of_element * len(list)); see the sketch after the links below.

    Here is one resource: https://code.tutsplus.com/tutorials/understand-how-much-memory-your-python-objects-use--cms-25609

    Here is another one: How do I determine the size of an object in Python?
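
    Here is a minimal sketch of that deep measurement, assuming NumPy's default int64 dtype on a 64-bit Linux/macOS platform (each array exposes the size of its raw buffer via arr.nbytes). Note it allocates roughly 8 GB of RAM, just like the original example:

    import sys
    import numpy as np

    # Same data as in the question: 999 arrays of 1,000,000 random integers each
    test = [np.random.choice(range(1, 1000), 1000000) for el in range(1, 1000)]

    # Shallow size: the list object plus one pointer per element -- roughly 8-9 KB
    print(sys.getsizeof(test) / 1024)

    # Deep size: add the data buffer each NumPy array actually owns
    deep = sys.getsizeof(test) + sum(arr.nbytes for arr in test)
    print(deep / 1024 / 1024 / 1024)   # ~7.4 GiB, i.e. "almost 8 GB"

    # Sanity check: 999 arrays * 1,000,000 elements * 8 bytes (int64)
    # is about 8 * 10**9 bytes -- this is what pickle has to write out,
    # which matches the file size reported by ls -l.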