I really LOVE f-strings. They're bloody awesome syntax.
For a while now I've had an idea for a little library (described below*) to harness them further. A quick example of what I would like it to do:
>>> import simpleformatter as sf
>>> def format_camel_case(string):
...     """camel cases a sentence"""
...     return ''.join(s.capitalize() for s in string.split())
...
>>> @sf.formattable(camcase=format_camel_case)
... class MyStr(str): ...
...
>>> f'{MyStr("lime cordial delicious"):camcase}'
'LimeCordialDelicious'
It would be immensely useful -- for the purposes of a simplified API, and extending usage to built-in class instances -- to find a way to hook into the built-in Python formatting machinery, which would allow custom format specs on built-ins:
>>> f'{"lime cordial delicious":camcase}'
'LimeCordialDelicious'
In other words, I'd like to override the built-in format function (which is used by the f-string syntax) -- or alternatively, extend the built-in __format__ methods of existing standard library classes -- such that I could write stuff like this:
for x, y, z in complicated_generator:
    eat_string(f"x: {x:custom_spec1}, y: {y:custom_spec2}, z: {z:custom_spec3}")
I have accomplished this by creating subclasses with their own __format__ methods, but of course this will not work for built-in classes.
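For reference, the subclass approach can be sketched roughly as follows. This is only a minimal illustration, not the actual simpleformatter implementation: the `formattable` decorator below is a hypothetical stand-in that maps format spec names to functions and falls back to the class's original `__format__` for everything else.

```python
def formattable(**spec_handlers):
    """Hypothetical class decorator: maps format spec names to functions."""
    def decorate(cls):
        original_format = cls.__format__

        def __format__(self, format_spec):
            handler = spec_handlers.get(format_spec)
            if handler is not None:
                # A registered spec: let the custom function do the work
                return handler(self)
            # Anything else: defer to the normal formatting behavior
            return original_format(self, format_spec)

        cls.__format__ = __format__
        return cls
    return decorate


def format_camel_case(string):
    """camel cases a sentence"""
    return "".join(s.capitalize() for s in string.split())


@formattable(camcase=format_camel_case)
class MyStr(str):
    pass


camel = f'{MyStr("lime cordial delicious"):camcase}'
padded = f'{MyStr("abc"):>5}'  # standard specs still work
```

The limitation is visible here: `cls.__format__ = ...` can be assigned on `MyStr`, but not on `str` itself, because attributes of built-in types cannot be set.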
I could get close to it using the string.Formatter API:
my_formatter = MyFormatter()  # custom string.Formatter instance
format_str = "x: {x:custom_spec1}, y: {y:custom_spec2}, z: {z:custom_spec3}"
for x, y, z in complicated_generator:
    eat_string(my_formatter.format(format_str, **locals()))
I find this to be a tad clunky, and definitely not as readable as the f-string API.
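A custom formatter of this kind might look like the sketch below. It assumes the custom specs are passed to the constructor as keyword arguments, which is an illustrative choice and not something the question specifies; `format_field` is the `string.Formatter` method that formats a single replacement field.

```python
import string


class MyFormatter(string.Formatter):
    """Sketch: dispatch known spec names to functions, else format normally."""

    def __init__(self, **spec_handlers):
        super().__init__()
        self.spec_handlers = spec_handlers  # hypothetical registry

    def format_field(self, value, format_spec):
        handler = self.spec_handlers.get(format_spec)
        if handler is not None:
            return handler(value)
        # Unknown specs go through the standard format() machinery
        return super().format_field(value, format_spec)


def format_camel_case(text):
    return "".join(s.capitalize() for s in text.split())


my_formatter = MyFormatter(camcase=format_camel_case)
formatted = my_formatter.format("{s:camcase} / {n:03d}", s="lime cordial delicious", n=7)
```

This works on built-in types, but every call site has to go through `my_formatter.format(...)` instead of an f-string.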
Another thing that could be done is overriding builtins.format:
>>> import builtins
>>> builtins.format = lambda *args, **kwargs: 'womp womp'
>>> format(1,"foo")
'womp womp'
...but this doesn't work for f-strings:
>>> f"{1:foo}"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid format specifier
Currently my API looks something like this (somewhat simplified):
import simpleformatter as sf

@sf.formatter("this_specification")
def this_formatting_function(some_obj):
    return "this formatted someobj!"

@sf.formatter("that_specification")
def that_formatting_function(some_obj):
    return "that formatted someobj!"

@sf.formattable
class SomeClass: ...
After which you can write code like this:
some_obj = SomeClass()
f"{some_obj:this_specification}"
f"{some_obj:that_specification}"
I would like the API to be more like the below:
@sf.formatter("this_specification")
def this_formatting_function(some_obj):
    return "this formatted someobj!"

@sf.formatter("that_specification")
def that_formatting_function(some_obj):
    return "that formatted someobj!"

class SomeClass: ...  # no class decorator needed
...and allow use of custom format specs on built-in classes:
x=1 # built-in type instance
f"{x:this_specification}"
f"{x:that_specification}"
But in order to do these things, we have to burrow our way into the built-in format() function. How can I hook into that juicy f-string goodness?
* NOTE: I'll probably never actually get around to implementing this library! But I do think it's a neat idea and invite anyone who wants to, to steal it from me :).
You can, but only if you write evil code that probably should never end up in production software. So let's get started!
I'm not going to integrate it into your library, but I will show you how to hook into the behavior of f-strings. This is roughly how it'll work:
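Rewrite the bytecode of code objects to replace FORMAT_VALUE instructions with calls to a hook function; then customize the import process so that this rewrite is applied to every module that gets imported.
You can get the full source at https://github.com/mivdnber/formathack, but everything is explained below.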
This solution isn't great. However, it is a solution, and bytecode manipulation has been used successfully in popular packages like PonyORM. Just keep in mind that it's hacky, complicated and probably maintenance-heavy.
Python code is not executed directly, but is first compiled to a simpler, intermediary, non-human-readable, stack-based language called Python bytecode (it's what's inside *.pyc files). To get an idea of what that bytecode looks like, you can use the standard library dis module to inspect the bytecode of a simple function:
def invalid_format(x):
    return f"{x:foo}"
Calling this function will cause an exception, but we'll "fix" that soon.
>>> invalid_format("bar")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in invalid_format
ValueError: Invalid format specifier
To inspect the bytecode, fire up a Python console and call dis.dis:
>>> import dis
>>> dis.dis(invalid_format)
  2           0 LOAD_FAST                0 (x)
              2 LOAD_CONST               1 ('foo')
              4 FORMAT_VALUE             4 (with format)
              6 RETURN_VALUE
I've annotated the output below to explain what's happening:
# line 2      # Put the value of function parameter x on the stack
  2           0 LOAD_FAST                0 (x)
              # Put the format spec on the stack as a string
              2 LOAD_CONST               1 ('foo')
              # Pop both values from the stack and perform the actual formatting
              # This puts the formatted string on the stack
              4 FORMAT_VALUE             4 (with format)
              # Pop the result from the stack and return it
              6 RETURN_VALUE
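If you'd rather look at the instructions programmatically than read the dis.dis printout, the standard library's dis.get_instructions yields one object per instruction. The snippet below is just a small sketch of that. Note that the exact opcode names depend on the CPython version, so the formatting instruction may not be called FORMAT_VALUE on your interpreter.

```python
import dis


def invalid_format(x):
    return f"{x:foo}"


# Collect the opcode names of the function, in execution order
opnames = [instruction.opname for instruction in dis.get_instructions(invalid_format)]
print(opnames)

# Pick out whichever instruction(s) do the f-string formatting
format_ops = [name for name in opnames if name.startswith("FORMAT")]
print(format_ops)
```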
The idea here is to replace the FORMAT_VALUE instruction with a call to a hook function that allows us to implement whatever behavior we want. Let's implement it like this for now:
def formathack_hook__(value, format_spec=None):
    """
    Gets called whenever a value is formatted. Right now it's a silly implementation,
    but it can be expanded with all sorts of nasty hacks.
    """
    return f"{value} formatted with {format_spec}"
To replace the instruction, I used the bytecode package, which provides surprisingly nice abstractions for doing horrible things.
import types

from bytecode import Bytecode, Instr

# Note: FORMAT_VALUE, ROT_TWO/ROT_THREE and CALL_FUNCTION are opcodes of older
# CPython releases (up to 3.10); newer versions use different instructions.
def formathack_rewrite_bytecode__(code):
    """
    Modifies a code object to override the behavior of the FORMAT_VALUE
    instructions used by f-strings.
    """
    decompiled = Bytecode.from_code(code)
    modified_instructions = []
    for instruction in decompiled:
        # Labels and line number markers have no name, hence the getattr
        name = getattr(instruction, 'name', None)
        if name == 'FORMAT_VALUE':
            # 0x04 means that a format spec is present
            if instruction.arg & 0x04 == 0x04:
                callback_arg_count = 2
            else:
                callback_arg_count = 1
            modified_instructions.extend([
                # Load in the callback
                Instr("LOAD_GLOBAL", "formathack_hook__"),
                # Shuffle around the top of the stack to put the arguments on top
                # of the function global
                Instr("ROT_THREE" if callback_arg_count == 2 else "ROT_TWO"),
                # Call the callback function instead of executing FORMAT_VALUE
                Instr("CALL_FUNCTION", callback_arg_count)
            ])
        # Kind of nasty: we want to recursively alter the code of functions.
        elif name == 'LOAD_CONST' and isinstance(instruction.arg, types.CodeType):
            modified_instructions.extend([
                Instr("LOAD_CONST", formathack_rewrite_bytecode__(instruction.arg), lineno=instruction.lineno)
            ])
        else:
            modified_instructions.append(instruction)
    modified_bytecode = Bytecode(modified_instructions)
    # For functions, copy over argument definitions
    modified_bytecode.argnames = decompiled.argnames
    modified_bytecode.argcount = decompiled.argcount
    modified_bytecode.name = decompiled.name
    return modified_bytecode.to_code()
We can now make the invalid_format function we defined earlier work:
>>> invalid_format.__code__ = formathack_rewrite_bytecode__(invalid_format.__code__)
>>> invalid_format("bar")
'bar formatted with foo'
Success! Manually cursing code objects with tainted bytecode in itself won't damn our souls to an eternity of suffering though; for that, we should manipulate all code automatically.
To make the new f-string behavior work everywhere, and not just in manually patched functions, we can customize the Python module import process with a custom module finder and loader using the functionality provided by the standard library importlib module:
import importlib.machinery

class _FormatHackLoader(importlib.machinery.SourceFileLoader):
    """
    A module loader that modifies the code of the modules it imports to override
    the behavior of f-strings. Nasty stuff.
    """
    @classmethod
    def find_spec(cls, name, path, target=None):
        # Start out with a spec from a default finder
        spec = importlib.machinery.PathFinder.find_spec(
            fullname=name,
            # Only apply to modules and packages in the current directory
            # This prevents standard library modules or site-packages
            # from being patched.
            path=[""],
            target=target
        )
        if spec is None:
            return None
        # Modify the loader in the spec to this loader
        spec.loader = cls(name, spec.origin)
        return spec

    def get_code(self, fullname):
        # This is called by exec_module to get the code of the module
        # to execute it.
        code = super().get_code(fullname)
        # Rewrite the code to modify the f-string formatting opcodes
        rewritten_code = formathack_rewrite_bytecode__(code)
        return rewritten_code

    def exec_module(self, module):
        # We introduce the callback that hooks into the f-string formatting
        # process in every imported module
        module.__dict__["formathack_hook__"] = formathack_hook__
        return super().exec_module(module)
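To see in isolation that get_code really is the point where a loader can get its hands on a module's code object, here is a small self-contained sketch that doesn't touch any bytecode. RecordingLoader and demo_mod are made-up names for this illustration only: the loader just records which modules pass through it and hands back the code unchanged.

```python
import importlib.machinery
import importlib.util
import os
import tempfile

seen = []  # names of modules whose code passed through our loader


class RecordingLoader(importlib.machinery.SourceFileLoader):
    """Illustrative loader: intercepts the code object, leaves it untouched."""

    def get_code(self, fullname):
        code = super().get_code(fullname)
        # This is where a rewriting function could transform the code object
        seen.append(fullname)
        return code


with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny throwaway module to import
    path = os.path.join(tmp, "demo_mod.py")
    with open(path, "w") as fh:
        fh.write("VALUE = 42\n")

    loader = RecordingLoader("demo_mod", path)
    spec = importlib.util.spec_from_file_location("demo_mod", path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)  # calls get_code, then executes the result

print(module.VALUE, seen)
```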
To make sure the Python interpreter uses this loader to import all files, we have to add it to sys.meta_path:
import inspect
import sys

def install():
    # If the _FormatHackLoader is not registered as a finder,
    # do it now!
    if sys.meta_path[0] is not _FormatHackLoader:
        sys.meta_path.insert(0, _FormatHackLoader)
        # Tricky part: we want to be able to use our custom f-string behavior
        # in the main module where install was called. That module was loaded
        # with a standard loader though, so that's impossible without additional
        # dirty hacks.
        # Here, we execute the module _again_, this time with _FormatHackLoader
        module_globals = inspect.currentframe().f_back.f_globals
        module_name = module_globals["__name__"]
        module_file = module_globals["__file__"]
        loader = _FormatHackLoader(module_name, module_file)
        loader.load_module(module_name)
        # This is actually pretty important. If we don't exit here, the main module
        # will continue from the formathack.install method, causing it to run twice!
        sys.exit(0)
If we put it all together in a formathack module (see https://github.com/mivdnber/formathack for an integrated, working example), we can now use it like this:
# In your main Python module, install formathack ASAP
import formathack
formathack.install()
# From now on, f-string behavior will be overridden!
print(f"{'foo':bar}")
# -> "foo formatted with bar"
So that's that! You can expand on this to make the hook function more intelligent and useful (e.g. by registering functions that handle certain format specifiers).
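As a rough idea of what such a smarter hook could look like, here's a sketch with a registry of format specifiers. The register decorator and the _spec_handlers dict are hypothetical names made up for this example; anything that isn't registered simply falls back to the built-in format(), so regular format specs keep working.

```python
_spec_handlers = {}  # hypothetical registry: spec name -> formatting function


def register(spec_name):
    """Hypothetical decorator that registers a function for a format spec."""
    def decorator(func):
        _spec_handlers[spec_name] = func
        return func
    return decorator


def formathack_hook__(value, format_spec=None):
    handler = _spec_handlers.get(format_spec)
    if handler is not None:
        return handler(value)
    # Not a registered spec: behave like a normal f-string would
    return format(value, format_spec or "")


@register("camcase")
def format_camel_case(string):
    return "".join(s.capitalize() for s in string.split())


print(formathack_hook__("lime cordial delicious", "camcase"))
print(formathack_hook__(3.14159, ".2f"))
```

With the rewritten bytecode in place, the first call is what an expression like f"{'lime cordial delicious':camcase}" would end up doing.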