Tags: python, string-formatting, cpython

Access the CPython string format specification mini-language parser


EDIT:

I have created a module to provide this functionality. It might not be that great but it can be obtained here.

Original Question

I need to be able to parse format strings (as specified by the string format specification mini language). A project I'm working on makes heavy use of the parse module for "unformatting" of strings. The module allows for creating customized format codes/formulas. My intent is to automatically parse certain kinds of format strings in a manner somewhat consistent with the existing string format specification mini language.

To clarify: by "format strings", I mean the format specification strings that are passed to the built-in format function, or that follow the colon in a replacement field of the str.format method, e.g.:

'{x!s: >5s}'.format(x='foo') # the format string is ' >5s'
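
To make the distinction concrete, here is a minimal sketch showing that the part after the colon is the same string the built-in format function takes as its second argument. The variable names are only illustrative.

```python
value = 'foo'

# Full replacement field: field name 'x', conversion '!s', format spec ' >5s'.
via_method = '{x!s: >5s}'.format(x=value)

# The same format spec handed directly to the built-in format() function.
via_builtin = format(str(value), ' >5s')

print(repr(via_method))   # '  foo'
print(repr(via_builtin))  # '  foo'
```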

I have taken a look at the CPython string module, and the comment at line 166 appears to say that parsing of the format string is handled in the _string module:

# The overall parser is implemented in _string.formatter_parser.

The call itself is made at line 278:

return _string.formatter_parser(format_string)

I am pretty unfamiliar with the CPython code base and am not much of a C programmer, and I could not find the _string module. I am wondering whether it is implemented at the C level?

Main question: is the format specification parsing implementation exposed somewhere for use? How can I get to it so I don't have to write my own? I am looking to get output something like this:

>>> parse_spec(' >5.2f')
{'fill': ' ', 'align': '>', 'sign': None, '#': None, '0': None, 'width': 5, ',': None, 'precision': 2, 'type': 'f'}

EDIT

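Note that, according to the comments in the source, _string.formatter_parser does not do what I am after, despite its name: it splits a whole format string into fields and leaves each format specification unparsed.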

# returns an iterable that contains tuples of the form:
# (literal_text, field_name, format_spec, conversion)
# literal_text can be zero length
# field_name can be None, in which case there's no
#  object to format and output
# if field_name is not None, it is looked up, formatted
#  with format_spec and conversion and then used
def parse(self, format_string):
    return _string.formatter_parser(format_string)
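
As a quick check of what that method returns, the sketch below calls the public string.Formatter.parse on an example string of my own choosing. The format spec comes back as a raw, unparsed string.

```python
from string import Formatter

# Each tuple is (literal_text, field_name, format_spec, conversion).
fields = list(Formatter().parse('{x!s: >5s} and {y}'))

for field in fields:
    print(field)
# ('', 'x', ' >5s', 's')
# (' and ', 'y', '', None)
```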

Solution

  • The format specification is specific to each object; it is parsed by the __format__() method of an object. For example, for string objects, that method is implemented in C as the unicode__format__ function.

    A lot of the format is shared between object types, and so is the code to handle it. The formatter_unicode.c file handles most format-string parsing. Within this file, the parse_internal_render_format_spec() function does most of the parsing.

    Unfortunately, this function is not exposed to Python code. Moreover, it is declared static, so its symbol is not exported and you cannot reach it externally (for instance, via a ctypes wrapper) either. Your only options are to re-implement it, or to recompile CPython from source with the static keyword removed from the function and then access it via the shared library.
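
If you go the re-implementation route, a regular expression gets most of the way there. The following is only a rough sketch, not CPython's actual algorithm: parse_spec is the hypothetical name from the question, the dictionary keys mirror the output the question asks for, and newer options such as the '_' grouping character are deliberately left out.

```python
import re

# Assumed grammar: [[fill]align][sign][#][0][width][,][.precision][type]
_SPEC_RE = re.compile(
    r"(?:(?P<fill>.)?(?P<align>[<>=^]))?"
    r"(?P<sign>[-+ ])?"
    r"(?P<alt>#)?"
    r"(?P<zero>0)?"
    r"(?P<width>\d+)?"
    r"(?P<comma>,)?"
    r"(?:\.(?P<precision>\d+))?"
    r"(?P<type>[a-zA-Z%])?",
    re.DOTALL,
)


def parse_spec(spec):
    """Split a format spec into its parts (hypothetical helper, not a stdlib API)."""
    match = _SPEC_RE.fullmatch(spec)
    if match is None:
        raise ValueError('invalid format specifier: {!r}'.format(spec))
    parts = match.groupdict()
    # Width and precision are numeric; everything else stays a string or None.
    width = int(parts['width']) if parts['width'] is not None else None
    precision = int(parts['precision']) if parts['precision'] is not None else None
    return {
        'fill': parts['fill'],
        'align': parts['align'],
        'sign': parts['sign'],
        '#': parts['alt'],
        '0': parts['zero'],
        'width': width,
        ',': parts['comma'],
        'precision': precision,
        'type': parts['type'],
    }


print(parse_spec(' >5.2f'))
```

Because each type's __format__ method decides which specs it accepts, this sketch only checks the general shape of the spec. It does not validate that a given type code or option combination is legal for a particular object.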