python object garbage-collection cpython python-internals

When are Python objects created?


Python's id() function returns a unique identifier for an object. So in my terminal I can do something like:

>>> a = 23
>>> id(a)
28487496

Now, I know that Python keeps track of every object it creates and of the number of references to each one, and that when that count reaches 0 the object is garbage collected.

What I want to know is what happens when I do something like this:

>>> id(27)
28487498

I never created an object with the value 27, i.e. I never wrote b = 27, yet somehow I still get a unique identifier for it. Does this mean that an object was created in memory? If so, there should be 0 references to this object and it should have been garbage collected.

So, when is an object actually created in memory?

Please let me know if I am wrong somewhere.

Another interesting thing that I just found out is:

>>> a = 23
>>> id(a)
28487496
>>> id(20 + 3)
28487496

In this case Python seems to remember the reference to the number 23 itself. How does Python do this?


Solution

  • Objects are created as needed, in different places.

    To start, when you write

    b = 27
    

    two things happen. The 27 expression is evaluated, resulting in an integer object being pushed onto the stack, and then, as a separate step, the object is assigned to b. Assignment doesn't create objects.
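As a minimal sketch of the claim that assignment only binds a name to an existing object, the snippet below binds a second name to the same list. The names numbers and alias are made up for this illustration.

```python
# Hypothetical names, used only for illustration.
numbers = [1, 2, 3]   # the list display creates one list object
alias = numbers       # assignment binds a second name; no new object is created

print(alias is numbers)   # True: both names refer to the same object

alias.append(4)           # mutating through one name...
print(numbers)            # ...is visible through the other: [1, 2, 3, 4]
```

If assignment copied the object, the append through alias would not show up in numbers.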

    If you did just this:

    27
    

    The 27 expression is still evaluated. The object would be created*, then destroyed again as soon as its reference count drops back to 0.

    That's needed because you could pass that object to another function:

    id(27)
    

    needs something to be passed to the id() function. So 27 is added to the stack so you can call the function.
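The create-then-destroy cycle can be observed with a sketch like the one below. It assumes CPython's reference counting, and the Probe class, the identify function and the events list are hypothetical helpers that are not part of the original answer. A user-defined class is used because integers do not support weak references.

```python
import weakref


class Probe:
    """Hypothetical helper class; plain instances support weak references."""


events = []


def identify(obj):
    # Register a callback that runs once obj has been destroyed.
    weakref.finalize(obj, events.append, "destroyed")
    return id(obj)


# The Probe instance is never bound to a name in this scope; it only
# lives on the stack for the duration of the call.
identity = identify(Probe())

# On CPython the temporary object is already gone at this point,
# so the callback has run.
print(events)
```

On implementations without reference counting the callback may run later, so the timing shown here is a CPython detail rather than a language guarantee.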

    I'll use a mutable object instead of an integer, to illustrate that a new object is created; so instead of id(27) I'll use id([]) and ask the dis module to show me the bytecode that Python would execute:

    >>> import dis
    >>> dis.dis(compile('id([])', '', 'exec'))
      1           0 LOAD_NAME                0 (id)
                  2 BUILD_LIST               0
                  4 CALL_FUNCTION            1
                  6 POP_TOP
                  8 LOAD_CONST               0 (None)
                 10 RETURN_VALUE
    

    The BUILD_LIST 0 opcode creates the empty list object and pushes it onto the stack, and CALL_FUNCTION 1 then calls id, passing in one value from the stack, which is that list.

    I didn't use id(27) because immutable objects such as integers and tuples are actually cached with the compiled bytecode; these are created when Python compiles the code (or when it loads the .pyc bytecode cache from disk):
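A short sketch of the same point at the Python level: every evaluation of a list display builds a fresh object. The make_list function is a hypothetical name used only here.

```python
def make_list():
    # Each call evaluates the [] display again, which builds a new list object.
    return []


first = make_list()
second = make_list()

print(first == second)   # True: both lists are empty, so they compare equal
print(first is second)   # False: they are two distinct objects
```

Both lists are kept alive by their names, so their identities cannot coincide through memory reuse.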

    >>> dis.dis(compile('id(27)', '', 'exec'))
      1           0 LOAD_NAME                0 (id)
                  2 LOAD_CONST               0 (27)
                  4 CALL_FUNCTION            1
                  6 POP_TOP
                  8 LOAD_CONST               1 (None)
                 10 RETURN_VALUE
    

    Note the LOAD_CONST opcode; it loads the object from the co_consts structure of the code object:

    >>> compile('id(27)', '', 'exec').co_consts
    (27, None)
    

    So objects can be created when compiling, or when executing special opcodes for specific Python syntax.
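To illustrate that a constant is created once at compile time and then reused, the following sketch (assuming CPython, with a hypothetical function named answer) calls the same function twice and compares the returned objects.

```python
def answer():
    # The literal is stored in the function's code object when it is compiled.
    return 123456789


first = answer()
second = answer()

# The constant is part of the compiled code object...
print(123456789 in answer.__code__.co_consts)

# ...and each call hands back that same stored object instead of a new one.
print(first is second)
```

The value is deliberately large so that the small-integer cache plays no part in the result.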

    There are more places:

    • There are more opcodes, for example for creating lists, tuples, dictionaries, sets and strings.
    • When you create an instance of a class, type.__call__ invokes the class's __new__ method (object.__new__ by default), which creates an instance object on the heap. So CustomClass(arg1, arg2) creates an object with the right type.
    • The same applies to all built-in types; int(somevalue) creates an integer object on the heap.
    • Plenty of built-in functions will create new objects as needed, returning those from calls.
    • class and def statements and lambda expressions create objects (class objects, function objects, and more function objects; these are all objects too).
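A few of the cases from the list above, sketched with hypothetical names (CustomClass, make_callback). Each call or statement produces a new object at run time.

```python
class CustomClass:
    # The class statement itself created a class object named CustomClass.
    def __init__(self, arg1, arg2):
        self.args = (arg1, arg2)


# Calling the class creates a new instance each time.
one = CustomClass(1, 2)
two = CustomClass(1, 2)
print(one is two)             # False: two separate instances


def make_callback():
    # Evaluating a lambda expression creates a new function object.
    return lambda: None


cb_one = make_callback()
cb_two = make_callback()
print(cb_one is cb_two)       # False: two separate function objects

# Built-in functions create result objects as needed.
ordered = sorted([3, 1, 2])
print(type(ordered).__name__) # list
```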

    * Small integers are actually interned; for performance reasons, CPython keeps a single copy of each of the integers between -5 and 256, so these objects are created only once and referenced everywhere you need one. See "is" operator behaves unexpectedly with integers. For the purposes of this answer I'm ignoring this.

    And because they are interned, 20 + 3 evaluates to that single copy, and id() returns the same value as if you had asked for id(23) directly.

    There are many more implementation details. Some string objects are interned (see my answer here). Code evaluated in the interactive interpreter is compiled one top-level block at a time, but in a script compilation is done per scope instead. Because constants are attached to compiled code objects, that means there are differences as to when constants are shared. And so on.
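The small-integer cache can be checked with a sketch like this one, which assumes CPython. The values are computed from strings at run time so that compile-time constant handling does not affect the comparison; all variable names are made up for the example.

```python
# Small values: CPython hands out its cached object for 23 both times.
small_a = int("23")
small_b = int("20") + 3
print(small_a is small_b)     # True on CPython

# Values outside the cached range are built as separate objects.
large_a = int("1000")
large_b = int("1000")
print(large_a == large_b)     # True: the values are equal
print(large_a is large_b)     # identity is an implementation detail here
```

This is why `is` should not be used to compare integers by value; `==` gives the same answer on every implementation.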

    The only objects you can rely on not being recreated all the time are explicitly documented in the datamodel documentation as being singletons; None being the most prominent of these.
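As a brief sketch of what being a singleton means in practice, the checks below show that even calling the type of None does not produce a second object.

```python
# Calling NoneType returns the one existing None object rather than a new one.
none_type = type(None)
print(none_type() is None)    # True

# The two bool values behave the same way.
print(bool(1) is True)        # True
print(bool(0) is False)       # True
```

This is why comparisons such as `value is None` are safe, while identity checks on other objects are not.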