
Replacing Python's parser functionality?


First of all I want to mention that I know this is a horrible idea and it shouldn't be done. My intention is mainly curiosity and learning the innards of Python, and how to 'hack' them.

I was wondering whether it is at all possible to change what happens when we use, say, [] to create a list. Is there a way to modify how the parser behaves so that, for instance, ["hello world"] calls print("hello world") instead of creating a list with one element?

I've looked for documentation or posts about this but found none.

Below is an example of replacing the built-in dict to instead use a custom class:

from __future__ import annotations
from typing import List, Any
import builtins


class Dict(dict):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.__dict__ = self

    def subset(self, keys: List[Any]) -> Dict:
        return Dict({key: self[key] for key in keys})


builtins.dict = Dict

When this module is imported, it replaces the dict built-in with the Dict class. However, this only works when we call dict() directly. If we use {} instead, we still get the base dict built-in implementation:

import new_dict

a = dict({'a': 5, 'b': 8})
b = {'a': 5, 'b': 8}

print(type(a))
print(type(b))

Yields:

<class 'py_extensions.new_dict.Dict'>
<class 'dict'>

Solution

  • [] and {} compile to dedicated opcodes (BUILD_LIST and BUILD_MAP) that always construct a built-in list or dict, respectively. list() and dict(), on the other hand, compile to bytecode that looks up the names list and dict (first in the module's globals, then in builtins) and calls whatever it finds:

    import dis
    
    dis.dis(lambda:[])
    dis.dis(lambda:{})
    dis.dis(lambda:list())
    dis.dis(lambda:dict())
    

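    prints the following (with some additional newlines for clarity; exact opcode names and offsets vary between CPython versions, e.g. CALL_FUNCTION became CALL in 3.11):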

      3           0 BUILD_LIST               0
                  2 RETURN_VALUE
    
      5           0 BUILD_MAP                0
                  2 RETURN_VALUE
    
      7           0 LOAD_GLOBAL              0 (list)
                  2 CALL_FUNCTION            0
                  4 RETURN_VALUE
    
      9           0 LOAD_GLOBAL              0 (dict)
                  2 CALL_FUNCTION            0
                  4 RETURN_VALUE
    

    Thus you can overwrite what dict() returns simply by overwriting the global dict, but you can't overwrite what {} returns.
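To make the difference concrete, here is a minimal sketch that checks both claims: which opcode each spelling compiles to, and which spelling is affected by rebinding builtins.dict. The Dict class and the opnames helper are hypothetical names introduced for illustration. Only opcode names that have stayed stable across recent CPython versions (BUILD_MAP, LOAD_GLOBAL) are inspected.

```python
import builtins
import dis


class Dict(dict):
    """Hypothetical stand-in for the custom dict subclass in the question."""


def opnames(func):
    """Return the opcode names in func's bytecode (illustrative helper)."""
    return [ins.opname for ins in dis.get_instructions(func)]


literal_ops = opnames(lambda: {})     # the literal: contains BUILD_MAP
call_ops = opnames(lambda: dict())    # the call: contains LOAD_GLOBAL

original_dict = builtins.dict
builtins.dict = Dict                  # rebind the name, as the question does
try:
    via_call = dict()                 # name lookup finds Dict
    via_literal = {}                  # BUILD_MAP ignores the rebinding
finally:
    builtins.dict = original_dict     # always restore the real built-in

print(type(via_call).__name__, type(via_literal).__name__)  # Dict dict
```

Restoring the built-in in a finally block keeps the experiment from leaking into the rest of the process.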

    These opcodes are documented in the dis module documentation. If the BUILD_MAP opcode runs, you get a dict, no way around it. For example, CPython's implementation of BUILD_MAP calls the C function _PyDict_FromItems. It never consults any user-defined class; it directly builds the C struct that represents a Python dict.

    It is possible, in at least some cases, to manipulate Python bytecode at runtime. If you really wanted to make {} return a custom class, I suppose you could write some code that searches for the BUILD_MAP opcode and replaces it with the appropriate opcodes. Those opcodes aren't the same size, though, so there are probably quite a few additional changes you'd have to make.
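Patching bytecode in place is fiddly for the size reasons mentioned above. A related route, which is not what this answer describes but gets a similar effect, is to rewrite the syntax tree before it is compiled, so that each dict literal becomes a call to the custom class. The following is only a sketch under that assumption: Dict and DictLiteralRewriter are hypothetical names, and the rewrite applies only to source you compile yourself, not to modules that are already imported.

```python
import ast


class Dict(dict):
    """Hypothetical stand-in for the custom dict subclass in the question."""


class DictLiteralRewriter(ast.NodeTransformer):
    """Rewrite every dict display `{...}` into `Dict({...})`."""

    def visit_Dict(self, node):
        # Visit children first so nested literals are rewritten too.
        self.generic_visit(node)
        call = ast.Call(
            func=ast.Name(id="Dict", ctx=ast.Load()),
            args=[node],      # BUILD_MAP still runs; its result is wrapped
            keywords=[],
        )
        return ast.copy_location(call, node)


source = "b = {'a': 5, 'inner': {'x': 1}}"
tree = DictLiteralRewriter().visit(ast.parse(source))
ast.fix_missing_locations(tree)  # fill in positions for the new nodes

# The rewritten code looks up the name Dict, so it must be in scope.
namespace = {"Dict": Dict}
exec(compile(tree, "<rewritten>", "exec"), namespace)

b = namespace["b"]
print(type(b).__name__, type(b["inner"]).__name__)  # Dict Dict
```

Note that the plain dict is still built first and then copied into a Dict, so this changes the type of the result rather than what BUILD_MAP itself does.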