
Relationship between references, variables and identities


I couldn't find definitions for the terms "reference", "variable" and "identity" in the Python glossary or in my textbook. I would like to clarify my understanding of these terms. Let's say we have an object obj. Is it true that the following phrases are synonyms in CPython:

  1. "reference to obj", "identity of obj", "memory address of obj"?
  2. "variable", "named reference", "identifier that is bound to some reference"?
  3. "variable a that refers to obj", "a pair consisting of identifier a and identity of obj"?

P.S. Mark Lutz in his book "Learning Python" (5th ed.) says the following on p.177:

Readers with a background in C may find Python references similar to C pointers (memory addresses). In fact, references are implemented as pointers (in CPython), and they often serve the same roles, especially with objects that can be changed in place (more on this later). However, because references are always automatically dereferenced when used, you can never actually do anything useful with a reference itself; this is a feature that eliminates a vast category of C bugs. But you can think of Python references as C “void*” pointers, which are automatically followed whenever used.
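A minimal sketch of what the quoted passage means in practice, with illustrative names `a` and `b`: assigning one name to another copies the reference, not the object, so an in-place change made through either name is visible through both, while rebinding one name leaves the other untouched.

```python
# Two names bound to one list object: assignment copies the reference.
a = [1, 2, 3]
b = a

# Mutating in place through b is visible through a, because both
# references point at the same object.
b.append(4)
print(a)       # [1, 2, 3, 4]
print(a is b)  # True

# Rebinding b makes it refer to a new object; a is unaffected.
b = [0]
print(a)       # [1, 2, 3, 4]
print(a is b)  # False
```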

Using this information and the answers below, I came up with my current understanding, which is as follows:

I. All expressions in Python (or at least most of them) evaluate to a reference to an object.
II. In general, Python doesn't specify the internal structure of a reference. In the CPython implementation, however, references are implemented as C pointers (PyObject*), which are very similar to void* pointers that are automatically followed whenever used. Such a reference is not a Python object, but it holds the actual memory address of the PyObject C struct that corresponds to some Python object. I think this memory address is equal to the identity of that Python object (this is how reference and identity are connected in CPython).
III. A Python variable is an identifier that is bound to a reference, i.e., the following phrases are equivalent (obj is a Python object): "variable a that refers to obj" and "a pair consisting of identifier a and a reference to obj". A variable is not a Python object, because a reference is not a Python object. Note that not all references are variables in Python.
IV. Of the three points at the beginning of this post, point 1 is partially false, because "reference to obj" is not the same as "identity of obj" (in CPython a reference is a C pointer, whereas an identity is just an integer). Point 2 is true. Point 3 is false (see the corrected version in III above).
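A small sketch, assuming CPython, that illustrates points II and III: `id()` returns the address of the object's PyObject struct (a CPython implementation detail, not a language guarantee), `ctypes.cast(..., ctypes.py_object)` turns that integer back into a reference, and a module-level variable is simply an entry in the module's namespace dict. The names `obj`, `address` and `a` are illustrative.

```python
import ctypes
import sys

obj = ["some", "object"]

# In CPython, id() returns the address of the object's PyObject struct.
# Other implementations may return any integer that is unique and
# constant for the object's lifetime.
address = id(obj)
print(type(address))  # <class 'int'>

# CPython only: turn the integer address back into a reference.
# ctypes.py_object wraps a PyObject*, so this recovers the original object.
if sys.implementation.name == "cpython":
    recovered = ctypes.cast(address, ctypes.py_object).value
    print(recovered is obj)  # True

# A module-level variable is an entry in the module's namespace dict:
# the key is the name, the value is (a reference to) the object.
a = obj
print(globals()["a"] is obj)  # True
```

The `ctypes` round trip is shown only to make the address/identity connection concrete; it is unsafe on objects that are no longer alive and should not be used in real code.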


Solution

  • A variable is a name (i.e., an identifier) that refers to a value.

    A reference is anything we can use to gain access to a value. References are generally produced by expressions, which include, for example,

    • Literals (5, 'x', etc)
    • Variables (x)
    • List, dict, and set displays (including list and dict comprehensions)
    • Indexing operations (x[5])
    • Operator expressions (x and y, x + y)
    • Function calls (f(x))
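The expressions listed above can be checked with `is`: indexing, a function call, or a boolean operator hands back a reference to an object that already exists rather than a copy of it, while a display creates a fresh object each time it is evaluated. This is a minimal sketch; `identity_fn` is a hypothetical helper written for the illustration.

```python
# Each expression below evaluates to a reference to an existing object;
# no copy of the object is made.
obj = {"key": "value"}
container = [obj]

def identity_fn(x):
    # Hypothetical helper: returns the reference it was given.
    return x

print(container[0] is obj)      # True: indexing yields a reference to obj
print(identity_fn(obj) is obj)  # True: the call returns the same reference
print((obj or None) is obj)     # True: 'or' returns one of its operands

# A display such as [] creates a new object each time it is evaluated,
# so two simultaneously alive empty lists are distinct objects.
print([] is [])  # False
```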

    "Identity" gets used in two related ways.

    • The identity of an object is an implementation-dependent integer that is unique and constant for that object during its lifetime. You can use the id function to retrieve the identity from any reference to the object: id(3), id(x), id(x[4]), id(y + 3*z), etc.

    • We say two references are identical if they both refer to the same object, not merely to two objects that are equal. We can say with certainty that x is y is True after

      x = 3
      y = x
      

      but not

      x = 3242
      y = 3242
      

      because there is no guarantee that the same literal always produces a reference to the same object. (In CPython, for example, small integers, from -5 to 256, are cached, so after x = 3; y = 3, x is y will generally be True; but there is no guarantee that a more complex expression such as x + 3*y is 5 will be True, even if x + 3*y == 5.)
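A short sketch contrasting identity with equality, using illustrative names. For objects that are alive at the same time, equal `id()` values coincide with `is`; for numbers computed at run time, only `==` gives a guaranteed answer, so the result of `is` is left without an expected value.

```python
x = [1, 2]
y = x         # y is bound to the same object as x
z = [1, 2]    # a new list, equal to x but a distinct object

print(x is y, x == y)  # True True
print(x is z, x == z)  # False True

# While both objects are alive, equal ids mean the same object.
print(id(x) == id(y))  # True
print(id(x) == id(z))  # False

# Integers produced at run time: equality is guaranteed, identity is an
# implementation detail, so compare numbers with ==, not with 'is'.
m = int("3242")
n = int("3242")
print(m == n)  # True
print(m is n)  # implementation-dependent
```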