EDIT:
I have created a module to provide this functionality. It might not be that great but it can be obtained here.
I need to be able to parse format strings (as specified by the string format specification mini-language). A project I'm working on makes heavy use of the parse module for "unformatting" strings. The module allows for creating customized format codes/formulas. My intent is to automatically parse certain kinds of format strings in a manner somewhat consistent with the existing string format specification mini-language.
To clarify: by "format strings", I mean the strings that are passed to the format function and the format method of str objects, e.g.:
'{x!s: >5s}'.format(x='foo')  # the format string is ' >5s'
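For illustration, the same spec can be handed to either entry point: the built-in format function takes it directly, while the str.format method takes it after the colon in a replacement field. A minimal sketch (the variable names are only for illustration):

```python
value = 'foo'
spec = ' >5s'  # fill=' ', align='>', width=5, type='s'

# The built-in format() receives the spec as its second argument.
via_function = format(value, spec)

# str.format() receives the same spec after the colon in the field.
via_method = '{x!s: >5s}'.format(x=value)

print(repr(via_function))
print(repr(via_method))
```

Both calls should produce the same right-aligned, space-padded string.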
I have taken a look at the CPython string module, and line 166 looks to me like it is saying that parsing of the format string is handled in the _string module.
# The overall parser is implemented in _string.formatter_parser.
The call itself occurs at line 278:
return _string.formatter_parser(format_string)
I am pretty unfamiliar with the CPython code base and am not much of a C programmer, and I could not find the _string module. I am wondering if it is implemented at the C level...?
Main question: is the format specification parsing implementation exposed somewhere for use? How can I get to it so I don't have to write my own? I am looking to get output something like this:
>>> parse_spec(' >5.2f')
{'fill': ' ', 'align': '>', 'sign': None, '#': None, '0': None, 'width': 5, ',': None, 'precision': 2, 'type': 'f'}
Note that the comments say that, despite its name, _string.formatter_parser does not do what I am after.
# returns an iterable that contains tuples of the form:
# (literal_text, field_name, format_spec, conversion)
# literal_text can be zero length
# field_name can be None, in which case there's no
# object to format and output
# if field_name is not None, it is looked up, formatted
# with format_spec and conversion and then used
def parse(self, format_string):
    return _string.formatter_parser(format_string)
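To illustrate why this is not enough, here is a small sketch of what the public wrapper, string.Formatter.parse, returns. It splits the template into fields and hands back each format spec as an unparsed string; the template strings used here are only examples.

```python
import string

formatter = string.Formatter()

# Each tuple is (literal_text, field_name, format_spec, conversion).
fields = list(formatter.parse('value: {x!s: >5s}'))
print(fields)

# Trailing literal text comes back as a tuple whose field_name is None.
more_fields = list(formatter.parse('a{x}b'))
print(more_fields)
```

The spec ' >5s' comes back as a single string, so its fill, alignment, and width would still have to be separated by other means.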
The format specification is specific to each object type; it is parsed by the object's __format__() method. For example, for string objects, that method is implemented in C as the unicode__format__ function.
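As a rough illustration of the per-object nature of the spec, the sketch below defines a hypothetical Temperature class whose __format__() method gives the spec its own meaning (a trailing "F" selecting Fahrenheit is an invented convention, not anything built in). The standard library's date type does something similar by treating the spec as an strftime pattern.

```python
from datetime import date


class Temperature:
    """Hypothetical class used only to show a custom __format__()."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # Invented convention: a trailing "F" means "show Fahrenheit".
        if spec.endswith('F'):
            fahrenheit = self.celsius * 9 / 5 + 32
            return format(fahrenheit, spec[:-1] + 'f') + ' F'
        # Otherwise delegate to float formatting, defaulting to "f".
        return format(self.celsius, spec or 'f') + ' C'


boiling = '{:.1F}'.format(Temperature(100))
room = format(Temperature(21.5), '.1f')

# date interprets the spec as an strftime pattern instead.
day = format(date(2020, 1, 2), '%Y/%m/%d')

print(boiling, room, day)
```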
A lot of the format is shared between object types, and so is the code to handle it. The formatter_unicode.c file handles most format-string parsing. Within this file, the parse_internal_render_format_spec() function does most of the parsing.
Unfortunately, this function is not exposed to Python code. Moreover, it is declared static, so you can't access it externally (for instance, via a ctypes wrapper) either. Your only options are to re-implement it, or to recompile CPython from source with the static keyword removed from the function and then access it via the shared library.
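If you go the re-implementation route, a regular expression gets you most of the way. The following is a minimal sketch, not a port of the C code: parse_spec is the hypothetical name from the question, and the pattern covers only the fields listed in the question's desired output ([[fill]align][sign][#][0][width][,][.precision][type]), so newer options such as the underscore grouping character are not handled.

```python
import re

# Assumed grammar: [[fill]align][sign][#][0][width][,][.precision][type]
_SPEC_RE = re.compile(
    r"(?:(?P<fill>.)?(?P<align>[<>=^]))?"
    r"(?P<sign>[-+ ])?"
    r"(?P<alt>#)?"
    r"(?P<zero>0)?"
    r"(?P<width>\d+)?"
    r"(?P<comma>,)?"
    r"(?:\.(?P<precision>\d+))?"
    r"(?P<type>[a-zA-Z%])?",
    re.DOTALL,
)


def parse_spec(spec):
    """Split a standard format spec into its components."""
    match = _SPEC_RE.fullmatch(spec)
    if match is None:
        raise ValueError('invalid format spec: {!r}'.format(spec))
    parts = match.groupdict()
    # Width and precision are the only numeric fields.
    width = parts['width']
    precision = parts['precision']
    return {
        'fill': parts['fill'],
        'align': parts['align'],
        'sign': parts['sign'],
        '#': parts['alt'],
        '0': parts['zero'],
        'width': int(width) if width is not None else None,
        ',': parts['comma'],
        'precision': int(precision) if precision is not None else None,
        'type': parts['type'],
    }


print(parse_spec(' >5.2f'))
```

Because the fill character is only accepted when an alignment character follows it, a spec such as '05d' is read as the zero flag plus a width of 5 rather than as a fill character.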