
Python ctypes's sprintf formats any float type as b'0.000000' or b'5.25662e-315'


I'm experimenting with the fastest way to format a float as a string with as minimal a representation as possible (no trailing zeros, no decimal places if it can be helped, no scientific notation). I've decided to try Python's ctypes module.

Based on several examples I thought this function would work, but instead it always prints b'0.000000' when using %f or b'5.25124e-315' when using %g. Code:

from ctypes import *
import msvcrt
def floatToStr3(n:float)->str:
    libc = cdll.msvcrt
    print("n in:", n)
    sb = create_string_buffer(100)
    libc.sprintf(sb, b"%g", c_float(n))
    print("sb out:", sb.value)
    return sb.value

import random
floatToStr3(random.random())
floatToStr3(random.random())
floatToStr3(random.random())
floatToStr3(random.random())
floatToStr3(random.random())
floatToStr3(random.random())

Output:

n in: 0.9164215022054657
sb out: b'5.25662e-315'
n in: 0.6366531536720886
sb out: b'5.23343e-315'
n in: 0.07371310207853521
sb out: b'5.1052e-315'
n in: 0.6353450576077702
sb out: b'5.23332e-315'
n in: 0.2839487624658935
sb out: b'5.18628e-315'
n in: 0.5540225836869241
sb out: b'5.22658e-315'

I have a strong feeling I'm just not using create_string_buffer correctly, but I don't know what the answer is. Formatting using ints works.

Using Python 3.7.4 on Windows 10.


Solution

  • Observations:

    • Listing [Python.Docs]: ctypes - A foreign function library for Python

    • Check [SO]: C function called from Python via ctypes returns incorrect value (@CristiFati's answer) when working with CTypes functions

    • [Python.Docs]: Built-in Types - Numeric Types - int, float, complex states (emphasis is mine):

      Floating point numbers are usually implemented using double in C

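      Passing the number as ctypes.c_float is what breaks the call. sprintf is variadic, and in C a float passed through varargs is promoted to double, so %f and %g always read an 8 byte double. CTypes knows nothing about that promotion and hands over only the 4 bytes of the float (typically float is 4 bytes long, while double is 8). sprintf then interprets those bytes (most likely with the remaining ones zeroed) as a double, which is a subnormal value very close to 0, hence the output (also intuited by @frost-nzcr4). Passing ctypes.c_double instead fixes the call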

    • The sprintf call by itself may well be faster than any Python conversion function. But let's not forget that Python has many optimizations, so even if the call itself is faster, the overhead needed to make it possible (Python <=> C conversions) can be higher, and in some cases the overall performance is worse than that of a pure Python solution

    • If we talk about speed, placing sb = create_string_buffer(100) (and others, like the cdll.msvcrt lookup) inside the function is wasteful, as the work is redone on every call. Do it outside (once, at the beginning) and only make use of it in the function
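
To see where values like 5.25662e-315 come from, here is a minimal sketch that mimics the float / double mismatch in pure Python with the struct module, without any C call. It assumes a little-endian layout and that the 4 bytes following the float are zero (an assumption about what sprintf ends up reading, not something that is guaranteed). float_bits_read_as_double is a hypothetical helper name.

```python
import struct


def float_bits_read_as_double(value: float) -> float:
    """Reinterpret the 4 bytes of a C float as the low half of an 8 byte double."""
    # Pack as a 32 bit float, the way ctypes.c_float stores the value.
    float_bytes = struct.pack("<f", value)
    # Assumption: the upper 4 bytes that sprintf reads are zero.
    padded = float_bytes + b"\x00" * 4
    # Read all 8 bytes back as a little-endian double, which is what %g expects.
    (as_double,) = struct.unpack("<d", padded)
    return as_double


# Inputs taken from the question's output.
for n in (0.9164215022054657, 0.6366531536720886, 0.07371310207853521):
    print("n in:", n, "-> read as double:", "%g" % float_bits_read_as_double(n))
```

Under these assumptions the three inputs give 5.25662e-315, 5.23343e-315 and 5.1052e-315, the same strings that sprintf printed in the question.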

    Below is an example.

    code00.py:

    #!/usr/bin/env python
    
    import ctypes as cts
    import random
    import sys
    import timeit
    
    
    c_float = cts.c_float
    c_double = cts.c_double
    cdll = cts.cdll
    create_string_buffer = cts.create_string_buffer
    
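    swprintf = cts.cdll.msvcrt.swprintf  # C runtime functions use the cdecl calling convention, hence cdll (not windll)
    swprintf.argtypes = (cts.c_wchar_p, cts.c_wchar_p, cts.c_double)  # !!! swprintf (like the whole printf family) is variadic; these argtypes only fit a single double argument !!!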
    swprintf.restype = cts.c_int
    
    buf = cts.create_unicode_buffer(100)
    
    
    def original(f: float) -> str:
        libc_ = cdll.msvcrt
        #print("n in:", f)
        sb = create_string_buffer(100)
        libc_.sprintf(sb, b"%g", c_double(f))
        #print("sb out:", sb.value)
        return sb.value.decode()
    
    
    def improved(f: float) -> str:
        swprintf(buf, "%g", f)
        return buf.value
    
    
    def percent(f: float) -> str:
        return "%g" % f
    
    
    def format_(f: float) -> str:
        return "{0:g}".format(f)
    
    
    def f_string_default(f: float) -> str:
        return f"{f}"
    
    def f_string_g(f: float) -> str:
        return f"{f:g}"
    
    
    number_count = 3
    numbers = [random.random() for _ in range(number_count)]
    number = numbers[0]
    
    
    def main(*argv):
        funcs = (
            original,
            improved,
            percent,
            format_,
            f_string_default,
            f_string_g,
        )
    
        print("Functional tests")
        for f in numbers:
            print("\nNumber (default format): {0:}".format(f))
            for func in funcs:
                print("    {0:s}: {1:}".format(func.__name__, func(f)))
    
        print("\nPerformance tests (time took by each function)")
        for func in funcs:
            t = timeit.timeit(stmt="func(number)", setup="from __main__ import number, {0:s} as func".format(func.__name__))
            print("    {0:s}: {1:}".format(func.__name__, t))
    
    
    if __name__ == "__main__":
        print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                       64 if sys.maxsize > 0x100000000 else 32, sys.platform))
        rc = main(*sys.argv[1:])
        print("\nDone.")
        sys.exit(rc)
    
    

    Output:

    [cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q061231308]> "e:\Work\Dev\VEnvs\py_pc064_03.07_test0\Scripts\python.exe" code00.py
    Python 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:58:18) [MSC v.1900 64 bit (AMD64)] 064bit on win32
    
    Functional tests
    
    Number (default format): 0.09201480511926563
        original: 0.0920148
        improved: 0.0920148
        percent: 0.0920148
        format_: 0.0920148
        f_string_default: 0.09201480511926563
        f_string_g: 0.0920148
    
    Number (default format): 0.3778731171686579
        original: 0.377873
        improved: 0.377873
        percent: 0.377873
        format_: 0.377873
        f_string_default: 0.3778731171686579
        f_string_g: 0.377873
    
    Number (default format): 0.8507691869686248
        original: 0.850769
        improved: 0.850769
        percent: 0.850769
        format_: 0.850769
        f_string_default: 0.8507691869686248
        f_string_g: 0.850769
    
    Performance tests (time took by each function)
        original: 1.7038035999999999
        improved: 1.4332302
        percent: 0.25398619999999994
        format_: 0.37500920000000004
        f_string_default: 0.9683423999999996
        f_string_g: 0.33258160000000014
    
    Done.
    

    As seen, the builtin Python alternatives are several times faster than the CTypes ones (roughly 0.25 to 0.97 seconds versus 1.43 to 1.70 seconds for timeit's default 1,000,000 calls). What I find curious (wondering whether I did something wrong) is that the f-string variant with the default specifier is much slower than I expected it to be. With :g it is on par with the other builtin variants (thanks @pankaj for the tip!).
    [Python]: Python Patterns - An Optimization Anecdote might make an interesting read.
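
Since the builtin formatters win here, the question's original goal (no trailing zeros, no decimal point when it can be avoided, no scientific notation) can be sketched in pure Python. Note that %g switches to scientific notation for small or large exponents (for example, "%g" % 1e-05 gives '1e-05'), so the sketch below uses fixed-point formatting and trims the result. float_to_min_str and its max_decimals parameter are hypothetical names, and rounding to 6 decimals is an assumption, not a requirement from the question.

```python
def float_to_min_str(value: float, max_decimals: int = 6) -> str:
    """Format a float in fixed-point notation, then strip redundant characters."""
    # Fixed-point formatting never produces an exponent.
    text = f"{value:.{max_decimals}f}"
    if "." in text:
        # Drop trailing zeros, then a dangling decimal point.
        text = text.rstrip("0").rstrip(".")
    # Tiny negative values round to "-0"; normalize that to "0".
    return "0" if text == "-0" else text


for sample in (5.0, 0.25, 1e-05, 0.09201480511926563):
    print(sample, "->", float_to_min_str(sample))
```

This trades precision for brevity: anything beyond max_decimals is rounded away, so the parameter should be picked to match the data.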