Tags: python, integer, python-internals

Does the Python VM cache integer objects that are never released automatically?


I have a single script as below:

a = 999999999999999999999999999999
b = 999999999999999999999999999999
print(a is b)

Output is:

[root@centos7-sim04 python]# python test2.py
True

On the other hand, the same code typed into the interactive interpreter:

>>> a = 999999999999999999999999999999
>>> b = 999999999999999999999999999999
>>> print(a is b)
False

The output is False.

  • What is the difference between the two ways of running the code?
  • When a Python script is running, how does the Python VM manage integer objects?
  • I see that the numbers from -5 to 256 are created automatically when the VM starts, and that the VM also allocates empty int blocks (a chained struct) to avoid frequent memory allocation when storing larger numbers.
  • Will these blocks be released automatically by the Python VM when memory runs low? My understanding is that Python keeps these blocks to avoid allocating memory frequently, so they are never released automatically once allocated. Is that correct?

I tested with the following code:

import gc
import time

# On Python 2, range() builds the whole list in memory.
for i in range(1, 100000000):
    pass
print("OK")
gc.collect()
time.sleep(20)
print("slept")
for i in range(1, 100000000):
    pass

The memory usage reported by top is:

PID   USER      PR   NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
17351 root      20   0    3312060 3.039g 2096 S  11.3 82.4   0:03.53 python

Here is the result of vmstat:

[root@centos7-sim04 ~]# vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
1  0      0 3376524    40 330084    0    0     2     5   25   41  0  0 100 0  0
1  0      0 185644     40 330084    0    0     0     0  714   28 14  3 82  1  0
0  0      0 967420     40 330084    0    0     0     0  292   15  7  0 93  0  0
0  0      0 967296     40 330084    0    0     0     0   20   23  0  0 100 0  0
0  0      0 967296     40 330084    0    0     0     0   15   17  0  0 100 0  0
0  0      0 967312     40 330068    0    0     0     1   27   39  0  0 100 0  0
1  0      0 185288     40 330068    0    0     0     2  701   55 17  0 83  0  0
0  0      0 3375780    40 330068    0    0     0     0  202   75  3  1 96  0  0
  • It seems that the memory is never released. Is this right?
  • If I want the integer objects to be released when memory runs low, how can I do that?

Thanks a lot.

=============================== Update ===============================

range() in Python 2 returns a full list that holds all the items, while xrange() in Python 2 returns a lazy object that produces the items on demand. Python 3's range() is what xrange() was in Python 2.

Here are links on generators in Python and the related PEP:

https://wiki.python.org/moin/Generators & https://www.python.org/dev/peps/pep-0255/


Solution

  • Regarding your first question: this is related to how the compiler handles constants, including the peephole optimizer, which simplifies expressions. Within a single compilation unit, one integer object is used for all equal constant values. A similar idea is used for interning strings.

    The reason you don't see this behavior in the interactive shell is that every command is compiled and executed separately, whereas in a file or in a function (even one defined in the terminal) all the statements are compiled together. Here is an example:

    In [1]: def func():
       ...:     a = 9999999999999
       ...:     b = 9999999999999
       ...:     return a is b
       ...: 
    
    In [2]: func()
    Out[2]: True
    
    In [3]: a = 9999999999999
    
    In [4]: b = 9999999999999
    
    In [5]: a is b
    Out[5]: False
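
    The same effect can be reproduced without a file or a shell by calling compile() directly. This is only a sketch that assumes CPython; the helper name run and the "<demo>" filename are made up for illustration, and object identity of equal integers is an implementation detail that may differ between versions and builds.

```python
# Assumes CPython; identity of equal ints is an implementation detail.
SRC = "a = 999999999999999999999999999999\nb = 999999999999999999999999999999\n"


def run(source):
    """Compile `source` as one unit, execute it, and return its namespace."""
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace


# One compilation unit (like a script): the two equal literals end up
# as a single integer entry in the code object's constants tuple.
unit = compile(SRC, "<demo>", "exec")
big_consts = [c for c in unit.co_consts if isinstance(c, int)]
ns = run(SRC)
same_unit = ns["a"] is ns["b"]

# Two separate compilation units (like two lines typed into the shell):
# each compile() call builds its own integer object for the literal.
first = run("a = 999999999999999999999999999999")["a"]
second = run("b = 999999999999999999999999999999")["b"]
separate_units = first is second

print(len(big_consts), same_unit, separate_units)
```

    In both cases the values compare equal with ==; only the identity check differs, which is why is should not be used to compare numbers.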
    

    Regarding your second question, there are several misunderstandings here. First, the VM's job in Python is to execute the machine code corresponding to each bytecode instruction, whereas parsing the source, compiling it to bytecode and handling the integer constants is the task of the interpreter's parser and compiler. And as mentioned in the comments, range() in Python 2 returns a list, while in Python 3 it returns a lazy object that stores only the start, stop and step values and produces the items on demand, much like a generator.
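
    To make the point about Python 3's range() concrete, here is a small sketch (Python 3 only). It shows that the range object keeps just its three parameters, so its size does not grow with the number of items, unlike a list. The exact byte counts depend on the platform and are not shown.

```python
import sys

# A Python 3 range object stores only start, stop and step.
r = range(1, 100000000)
params = (r.start, r.stop, r.step)

# Its size is small and independent of how many items it describes.
lazy_size = sys.getsizeof(r)

# A list really holds references to all of its items, so it grows.
small_list_size = sys.getsizeof(list(range(1, 1000)))
large_list_size = sys.getsizeof(list(range(1, 100000)))

# Length, indexing and membership are computed arithmetically,
# without creating the intermediate integers.
count = len(r)
last = r[-1]
contains = 99999998 in r

print(params, lazy_size, small_list_size, large_list_size)
```

    This is why the loop in the question allocates gigabytes on Python 2 but very little on Python 3, or on Python 2 with xrange().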

    As for gc.collect(): as the documentation states, when you call it without an argument it runs a full collection, and:

    The free lists maintained for a number of built-in types are cleared whenever a full collection or collection of the highest generation (2) is run. Not all items in some free lists may be freed due to the particular implementation, in particular float
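
    As a rough illustration of what a full collection means, the sketch below builds an unreachable reference cycle and then calls gc.collect() with no argument, which is the same as collecting generation 2. The Node class and make_cycle function are hypothetical names used only for this example. It does not show the free lists themselves, which are internal to the interpreter.

```python
import gc


class Node:
    """Hypothetical container used only to build a reference cycle."""


def make_cycle():
    # Two objects that refer to each other; once the function returns,
    # nothing else refers to them, so only the cycle collector can free them.
    a, b = Node(), Node()
    a.other, b.other = b, a


gc.disable()          # keep automatic collections out of the measurement
gc.collect()          # start from a clean state
make_cycle()
found = gc.collect()  # no argument: full collection (generation 2)
gc.enable()

# gc.collect() returns the number of unreachable objects it found.
print(found)
```

    Note that a full collection frees Python objects and clears some internal free lists, but whether the process's resident memory shrinks afterwards also depends on the underlying allocator and the operating system.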