First of all, I want to mention that I know this is a horrible idea and shouldn't be done. My intention is mainly curiosity: learning the innards of Python and how to 'hack' them.
I was wondering whether it is at all possible to change what happens when we, for instance, use `[]` to create a list. Is there a way to modify how the parser behaves so that, for instance, `["hello world"]` calls `print("hello world")` instead of creating a list with one element?
I've tried to find documentation or posts about this, but haven't found any.
Below is an example of replacing the built-in `dict` with a custom class:
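```python
from __future__ import annotations
from typing import List, Any
import builtins


class Dict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Use the dict itself as the attribute namespace, so d.key == d['key']
        self.__dict__ = self

    def subset(self, keys: List[Any]) -> Dict:
        # Return a new Dict containing only the given keys
        return Dict({key: self[key] for key in keys})


# Replace the built-in name, so that dict(...) now constructs a Dict
builtins.dict = Dict
```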
When this module is imported, it replaces the `dict` built-in with the `Dict` class. However, this only works when we call `dict()` directly. If we use `{}` instead, we still get an instance of the base `dict` built-in:
```python
import new_dict

a = dict({'a': 5, 'b': 8})
b = {'a': 5, 'b': 8}
print(type(a))
print(type(b))
```

Yields:

```
<class 'py_extensions.new_dict.Dict'>
<class 'dict'>
```
`[]` and `{}` are compiled to dedicated opcodes that always build a `list` or a `dict`, respectively. On the other hand, `list()` and `dict()` compile to bytecode that looks up the names `list` and `dict` (first in the module's globals, then in `builtins`) and then calls whatever it finds as a function:
```python
import dis

dis.dis(lambda: [])

dis.dis(lambda: {})

dis.dis(lambda: list())

dis.dis(lambda: dict())
```
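prints the following (with blank lines added between the functions for clarity; this output is from an older CPython, and 3.11 and later show somewhat different opcodes, e.g. `CALL` instead of `CALL_FUNCTION`):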
```
  3           0 BUILD_LIST               0
              2 RETURN_VALUE

  5           0 BUILD_MAP                0
              2 RETURN_VALUE

  7           0 LOAD_GLOBAL              0 (list)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE

  9           0 LOAD_GLOBAL              0 (dict)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE
```
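Because the exact listing varies between CPython versions, a more version-independent way to check the same point is to compare opcode names programmatically. A minimal sketch using `dis.get_instructions`; the helper name `opnames` is hypothetical:

```python
import dis


def opnames(func):
    """Return the opcode names of func's bytecode, in order."""
    return [instr.opname for instr in dis.get_instructions(func)]


# A dict display builds the dict directly; no name is looked up.
literal_ops = opnames(lambda: {})
# A call looks up the name "dict" at runtime and then calls the result.
call_ops = opnames(lambda: dict())

print(literal_ops)
print(call_ops)
```

On any recent CPython, `literal_ops` should contain `BUILD_MAP` but no `LOAD_GLOBAL`, while `call_ops` contains `LOAD_GLOBAL` but no `BUILD_MAP`.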
Thus you can change what `dict()` returns simply by overwriting the global `dict`, but you can't change what `{}` returns.
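The same point can be checked in a small, self-contained sketch: temporarily replacing `builtins.dict` (as the question's module does) changes what `dict()` returns inside a function, but not what `{}` returns. The subclass name `TaggedDict` is hypothetical, and the `try`/`finally` restores the real built-in afterwards:

```python
import builtins


class TaggedDict(dict):
    """Hypothetical dict subclass, used only to make the swap visible."""


def build_both():
    # Inside a function, `dict` is resolved with LOAD_GLOBAL:
    # module globals first, then builtins.
    from_call = dict(a=1)
    # A dict display compiles to BUILD_MAP, which never looks up a name.
    from_literal = {"a": 1}
    return from_call, from_literal


original_dict = builtins.dict
builtins.dict = TaggedDict
try:
    from_call, from_literal = build_both()
finally:
    # Restore the real built-in so the rest of the program is unaffected.
    builtins.dict = original_dict

print(type(from_call).__name__)     # TaggedDict
print(type(from_literal).__name__)  # dict
```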
These opcodes are documented in the `dis` module documentation. If the `BUILD_MAP` opcode runs, you get a `dict`; there's no way around it. As an example, CPython's implementation of `BUILD_MAP` calls the C function `_PyDict_FromItems`. It doesn't consult any user-defined classes; it directly creates the C struct that represents a Python `dict`.
It is possible, at least in some cases, to manipulate Python bytecode at runtime. If you really wanted `{}` to return a custom class, I suppose you could write code that searches for the `BUILD_MAP` opcode and replaces it with the appropriate opcodes. However, those opcodes aren't the same size as `BUILD_MAP`, so there are probably quite a few additional changes you'd have to make, such as adjusting jump offsets.
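A different technique from the bytecode patching described above is to rewrite the source before it is compiled: parse it with the `ast` module, wrap every dict display in a call to a custom class, and compile the result. A minimal sketch, assuming you control how the code is compiled and executed; the names `AttrDict` and `WrapDictLiterals` are hypothetical:

```python
import ast


class AttrDict(dict):
    """Hypothetical dict subclass that also allows attribute access."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


class WrapDictLiterals(ast.NodeTransformer):
    """Rewrite every `{...}` dict display into `AttrDict({...})`."""

    def visit_Dict(self, node):
        self.generic_visit(node)  # rewrite nested dict displays first
        call = ast.Call(
            func=ast.Name(id="AttrDict", ctx=ast.Load()),
            args=[node],
            keywords=[],
        )
        return ast.copy_location(call, node)


source = """
b = {'a': 5, 'b': 8}
result = (type(b).__name__, b.a)
"""

tree = WrapDictLiterals().visit(ast.parse(source))
ast.fix_missing_locations(tree)  # fill in locations for the new Name node
namespace = {"AttrDict": AttrDict}
exec(compile(tree, "<rewritten>", "exec"), namespace)
print(namespace["result"])  # ('AttrDict', 5)
```

This only affects code compiled this way; modules loaded through a normal `import` still compile `{}` to `BUILD_MAP`. Applying the rewrite more broadly would require hooking the import system, for example with a custom loader.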