Tags: python-3.x, string, list, tuples

Why does Python use a different memory allocation strategy for strings?


The str type uses a Unicode representation. A Unicode string can take up to 4 bytes per character, depending on the encoding.

I know that Python uses three kinds of internal representation for Unicode strings:

  • 1 byte
  • 2 bytes
  • 4 bytes
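
A quick way to observe these three widths is to compare the sizes of two strings that differ only in length, so that the fixed object header cancels out. This is a minimal sketch that assumes CPython 3.3 or later; bytes_per_char is a hypothetical helper name, not a standard library function.

```python
import sys

def bytes_per_char(ch, n=100):
    """Estimate storage per character from two repeats of ch (hypothetical helper)."""
    # The fixed per-object overhead is the same in both strings,
    # so the difference is n characters' worth of storage.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

samples = [
    ("ASCII", "a"),
    ("Latin-1", "\u00e9"),                  # e with acute accent
    ("BMP", "\u20ac"),                      # euro sign
    ("outside the BMP", "\U0001F600"),      # an emoji
]
for label, ch in samples:
    print(label, bytes_per_char(ch))
```

On CPython this should report 1, 1, 2 and 4 bytes respectively: the widest character in a string determines the width used for every character in it.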

But I am still confused about how memory is allocated for a string.

>>> import sys

>>> print(sys.getsizeof("hello world hello world"))
72

>>> print(sys.getsizeof(["hello world hello world"]))
64

>>> print(sys.getsizeof(("hello world hello world",)))
48

Why does this happen? When I put the same string into a list or a tuple, the reported size decreased. Why?


Solution

  • A sys.getsizeof call is not recursive: when you call it on a container, it does not include the size of the contained objects. In your example, the only call that returns the size of the str object itself is print(sys.getsizeof("hello world hello world")); the other two just give you the size of a 1-item list and of a 1-item tuple, each of which stores only a reference to the string.
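
    The shallow behaviour can be checked directly. The sketch below is illustrative only: shallow_plus_items is a hypothetical helper that goes exactly one level deep, so it is not a general solution and it counts an item twice if it appears twice.

```python
import sys

short = ["x"]
long_ = ["x" * 10_000]

# Each list holds exactly one reference, so the lists themselves
# have the same size no matter how large the string is.
print(sys.getsizeof(short) == sys.getsizeof(long_))

def shallow_plus_items(container):
    """Size of a flat container plus its direct items (hypothetical helper)."""
    return sys.getsizeof(container) + sum(sys.getsizeof(item) for item in container)

s = "hello world hello world"
print(sys.getsizeof(s))            # the string alone
print(shallow_plus_items([s]))     # list object + string
print(shallow_plus_items((s,)))    # tuple object + string
```

    Once the string is added back in, both containers come out larger than the string alone, which is what you would expect.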

    To get the full size of a composite object, you have to use a recipe for a recursive function that returns the size of the object itself plus the size of all its attributes and contained objects, if any.

    Something along these lines:

    
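    from sys import getsizeof

    def getfullsize(obj, seen=None):
        """Return the size in bytes of obj plus everything reachable from it."""
        if seen is None:
            seen = set()
        # Count each object only once, even if it is referenced several times.
        if id(obj) in seen:
            return 0
        seen.add(id(obj))
        size = getsizeof(obj)
        # Treat anything with __len__ (other than str/bytes) as a container.
        if not isinstance(obj, (str, bytes)) and hasattr(type(obj), "__len__"):
            for item in obj:
                # Iterating a mapping yields its keys; add the values as well.
                if hasattr(type(obj), "values"):
                    size += getfullsize(obj[item], seen)
                size += getfullsize(item, seen)
        # Attributes stored in the instance dictionary.
        if hasattr(obj, "__dict__"):
            size += getfullsize(obj.__dict__, seen)
        # Attributes stored in slots.
        if hasattr(obj, "__slots__"):
            for attr in obj.__slots__:
                if (item := getattr(obj, attr, None)) is not None:
                    size += getfullsize(item, seen)
        return size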