What is _md5.md5 and why is hashlib.md5 so much slower?

I found the undocumented _md5 module after getting frustrated with the slow stdlib hashlib.md5 implementation.

On a macbook:

>>> timeit hashlib.md5(b"hello world")
597 ns ± 17.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit _md5.md5(b"hello world")
224 ns ± 3.18 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> _md5
<module '_md5' from '/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/'>

On a Windows box:

>>> timeit hashlib.md5(b"stonk overflow")
328 ns ± 21.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit _md5.md5(b"stonk overflow")
110 ns ± 12.5 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> _md5
<module '_md5' (built-in)>

On a Linux box:

>>> timeit hashlib.md5(b"")
259 ns ± 1.33 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit _md5.md5(b"")
102 ns ± 0.0576 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> _md5
<module '_md5' from '/usr/local/lib/python3.8/lib-dynload/'>

For hashing short messages, _md5.md5 is roughly 2.5 to 3 times faster in the runs above. For long messages, performance is similar.

Why is it hidden away in an underscore extension module, and why isn't this faster implementation used by default in hashlib? What is the _md5 module, and why doesn't it have a public API?


  • Until Python 2.5, hashes and digests were implemented in their own modules (e.g. [Python 2.Docs]: md5 - MD5 message digest algorithm).
    Starting with v2.5, [Python 2.6.Docs]: hashlib - Secure hashes and message digests was added. Its purpose was to:

    1. Offer a unified access method to the hashes / digests (via their name)

    2. Switch (by default) to an external cryptography provider. Delegating to an entity specialized in that field seems the logical step, as maintaining all those algorithms could be overkill. At that time OpenSSL was the best choice: mature enough, well known and compatible (there were a bunch of similar Java providers, but those were pretty useless)

    As a side effect of #2, Python's own implementations were hidden from the public API, as redundancy often creates confusion. They were renamed to _md5, _sha1, _sha256 and _sha512; _blake2 and _sha3 were added later.
    Another side effect was a dependency on OpenSSL's libcrypto*.so. That is Nix (at least Linux) specific: on Win, a static libeay32.lib was linked into _hashlib.pyd and also into _ssl.pyd (which I consider lame) until v3.7+, where the OpenSSL .dlls are part of the Python installation.
    Probably on 90%+ of the machines things went smoothly, as OpenSSL was / is installed by default. Where it isn't, many things might break, because hashlib is imported by many modules (one example is random, which itself gets imported by lots of others), so trivial pieces of code that are not related to cryptography at all (at least not at first sight) would stop working. That's why the old implementations are kept (but again, they are only fallbacks, as the OpenSSL versions are / should be better maintained).

    [cfati@cfati-ubtu16x64-0:~/Work/Dev/StackOverflow/q059955854]> ~/
    ### Set shorter prompt to better fit when pasted in StackOverflow (or other) pages ###
    [064bit-prompt]> python3 -c "import sys, hashlib as hl, _md5, ssl;print(\"{0:}\n{1:}\n{2:}\n{3:}\".format(sys.version, _md5, hl._hashlib, ssl.OPENSSL_VERSION))"
    3.5.2 (default, Oct  8 2019, 13:06:37)
    [GCC 5.4.0 20160609]
    <module '_md5' (built-in)>
    <module '_hashlib' from '/usr/lib/python3.5/lib-dynload/'>
    OpenSSL 1.0.2g  1 Mar 2016
    [064bit-prompt]> ldd /usr/lib/python3.5/lib-dynload/
   =>  (0x00007fffa7d0b000)
   => /lib/x86_64-linux-gnu/ (0x00007f50d9e4d000)
   => /lib/x86_64-linux-gnu/ (0x00007f50d9a83000)
   => /lib/x86_64-linux-gnu/ (0x00007f50d963e000)
            /lib64/ (0x00007f50da271000)
   => /lib/x86_64-linux-gnu/ (0x00007f50d943a000)
    [064bit-prompt]> openssl version -a
    OpenSSL 1.0.2g  1 Mar 2016
    built on: reproducible build, date unspecified
    platform: debian-amd64
    options:  bn(64,64) rc4(16x,int) des(idx,cisc,16,int) blowfish(idx)
    compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
    OPENSSLDIR: "/usr/lib/ssl"
    [064bit-prompt]> python3 -c "import _md5, hashlib as hl;print(_md5.md5(b\"A\").hexdigest(), hl.md5(b\"A\").hexdigest())"
    7fc56270e7a70fa81a5935b72eacbe29 7fc56270e7a70fa81a5935b72eacbe29
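
    The same check can be sketched as a small script. This is only an illustration, assuming CPython: describe_md5 and digests_agree are hypothetical helper names, and the test on the openssl_ prefix relies on the OpenSSL-backed constructor being named openssl_md5, which is an observed implementation detail rather than a documented guarantee.

```python
import hashlib

try:
    import _md5  # private CPython module; may be missing on some builds
except ImportError:
    _md5 = None


def describe_md5():
    """Return (constructor name, backend label) for hashlib.md5."""
    name = getattr(hashlib.md5, "__name__", "md5")
    # Assumption: the OpenSSL wrapper exposes constructors named openssl_<algo>.
    backend = "OpenSSL wrapper" if name.startswith("openssl_") else "builtin"
    return name, backend


def digests_agree(data):
    """Check that the builtin and the hashlib constructors give the same digest."""
    if _md5 is None:
        return True  # nothing to compare against on this build
    return _md5.md5(data).hexdigest() == hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    print(describe_md5())
    print(digests_agree(b"A"))
```

    Both constructors return objects with the same interface (update, digest, hexdigest), so the comparison needs no conversion.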

    According to [Python 3.Docs]: hashlib.algorithms_guaranteed:

    A set containing the names of the hash algorithms guaranteed to be supported by this module on all platforms. Note that ‘md5’ is in this list despite some upstream vendors offering an odd “FIPS compliant” Python build that excludes it.
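
    A short sketch of how one might check this on a given interpreter. blocked_guaranteed_algorithms is a hypothetical helper name; it assumes that a build which disables an algorithm (for example in FIPS mode) raises ValueError from hashlib.new, as in the traceback shown further down.

```python
import hashlib


def blocked_guaranteed_algorithms():
    """Return guaranteed algorithm names that cannot be constructed here."""
    blocked = []
    for name in sorted(hashlib.algorithms_guaranteed):
        try:
            hashlib.new(name)
        except ValueError:
            # Assumption: a restricted build rejects the name with ValueError.
            blocked.append(name)
    return blocked


print("md5 guaranteed:", "md5" in hashlib.algorithms_guaranteed)
print("blocked on this build:", blocked_guaranteed_algorithms())
```

    On an unrestricted build the second list is expected to be empty.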

    Below is an example of a custom Python 2.7 installation (one that I built quite a while ago; it's worth mentioning that it dynamically links to the OpenSSL .dlls):

    [cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q059955854]> sopr.bat
    ### Set shorter prompt to better fit when pasted in StackOverflow (or other) pages ###
    [prompt]> "F:\Install\pc064\HPE\OPSWpython\2.7.10__00\python.exe" -c "import sys, ssl;print(\"{0:}\n{1:}\".format(sys.version, ssl.OPENSSL_VERSION))"
    2.7.10 (default, Mar  8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)]
    OpenSSL 1.0.2j-fips  26 Sep 2016
    [prompt]> "F:\Install\pc064\HPE\OPSWpython\2.7.10__00\python.exe" -c "import hashlib as hl;print(hl.md5(\"A\").hexdigest())"
    [prompt]> "F:\Install\pc064\HPE\OPSWpython\2.7.10__00\python.exe" -c "import ssl;ssl.FIPS_mode_set(True);import hashlib as hl;print(hl.md5(\"A\").hexdigest())"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ValueError: error:060A80A3:digital envelope routines:FIPS_DIGESTINIT:disabled for fips

    As for the speed question I can only speculate:

    • The builtin implementation was written specifically for Python, so it is better tailored to Python than a generic version, and it can also reside in python*.so (or the python executable itself)

    • The OpenSSL implementation resides in libcrypto*.so and is accessed through the wrapper, which does the back and forth conversions between Python types (PyObject*) and the OpenSSL ones (EVP_MD_CTX*)

    Considering the above, it would make sense that the former is (slightly) faster, at least for small messages, where the overhead (function calls and other underlying Python operations) takes a significant percentage of the total time compared to the hashing itself. There are also other factors to be considered (e.g. whether the OpenSSL assembler speedups were used).

    Update #0

    Below are some benchmarks of my own.

    #!/usr/bin/env python
    import sys
    import timeit
    from hashlib import md5 as md5_openssl
    from _md5 import md5 as md5_builtin
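    # Implementations to compare (the tuple contents were lost in the listing)
    MD5S = (md5_openssl, md5_builtin)
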
    def main(*argv):
        base_text = b"A"
        number = 1000000
        print("timeit attempts number: {:d}".format(number))
        #x = []
        #y = {}
        for count in range(0, 16):
            factor = 2 ** count
            text = base_text * factor
            globals_dict = {"text": text}
            print("\nUsing a {:8d} (2 ** {:2d}) bytes message".format(len(text), count))
            for func in MD5S:
                globals_dict["md5"] = func
                t = timeit.timeit(stmt="md5(text)", globals=globals_dict, number=number)
                print("    {:12s} took: {:11.6f} seconds".format(func.__name__, t))
                #y.setdefault(func.__name__, []).append(t)
        #print(x, y)
    if __name__ == "__main__":
        print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                       64 if sys.maxsize > 0x100000000 else 32, sys.platform))
        rc = main(*sys.argv[1:])


    • Win 10 pc064 (running on a Dell Precision 5510 laptop):

      [prompt]> "e:\Work\Dev\VEnvs\py_pc064_03.07.06_test0\Scripts\python.exe" ./
      Python 3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)] 64bit on win32
      timeit attempts number: 1000000
      Using a        1 (2 **  0) bytes message
          openssl_md5  took:    0.449134 seconds
          md5          took:    0.120021 seconds
      Using a        2 (2 **  1) bytes message
          openssl_md5  took:    0.460399 seconds
          md5          took:    0.118555 seconds
      Using a        4 (2 **  2) bytes message
          openssl_md5  took:    0.451850 seconds
          md5          took:    0.121166 seconds
      Using a        8 (2 **  3) bytes message
          openssl_md5  took:    0.438398 seconds
          md5          took:    0.118127 seconds
      Using a       16 (2 **  4) bytes message
          openssl_md5  took:    0.454653 seconds
          md5          took:    0.122818 seconds
      Using a       32 (2 **  5) bytes message
          openssl_md5  took:    0.450776 seconds
          md5          took:    0.118594 seconds
      Using a       64 (2 **  6) bytes message
          openssl_md5  took:    0.555761 seconds
          md5          took:    0.278812 seconds
      Using a      128 (2 **  7) bytes message
          openssl_md5  took:    0.681296 seconds
          md5          took:    0.455921 seconds
      Using a      256 (2 **  8) bytes message
          openssl_md5  took:    0.895952 seconds
          md5          took:    0.807457 seconds
      Using a      512 (2 **  9) bytes message
          openssl_md5  took:    1.401584 seconds
          md5          took:    1.499279 seconds
      Using a     1024 (2 ** 10) bytes message
          openssl_md5  took:    2.360966 seconds
          md5          took:    2.878650 seconds
      Using a     2048 (2 ** 11) bytes message
          openssl_md5  took:    4.383245 seconds
          md5          took:    5.655477 seconds
      Using a     4096 (2 ** 12) bytes message
          openssl_md5  took:    8.264774 seconds
          md5          took:   10.920909 seconds
      Using a     8192 (2 ** 13) bytes message
          openssl_md5  took:   15.521947 seconds
          md5          took:   21.895179 seconds
      Using a    16384 (2 ** 14) bytes message
          openssl_md5  took:   29.947287 seconds
          md5          took:   43.198639 seconds
      Using a    32768 (2 ** 15) bytes message
          openssl_md5  took:   59.123447 seconds
          md5          took:   86.453821 seconds
    • Ubuntu 16 pc064 (VM running in VirtualBox on the above machine):

      [064bit-prompt]> python3 ./
      Python 3.5.2 (default, Oct  8 2019, 13:06:37) [GCC 5.4.0 20160609] 64bit on linux
      timeit attempts number: 1000000
      Using a        1 (2 **  0) bytes message
          openssl_md5  took:    0.246166 seconds
          md5          took:    0.130589 seconds
      Using a        2 (2 **  1) bytes message
          openssl_md5  took:    0.251019 seconds
          md5          took:    0.127750 seconds
      Using a        4 (2 **  2) bytes message
          openssl_md5  took:    0.257018 seconds
          md5          took:    0.123116 seconds
      Using a        8 (2 **  3) bytes message
          openssl_md5  took:    0.245399 seconds
          md5          took:    0.128267 seconds
      Using a       16 (2 **  4) bytes message
          openssl_md5  took:    0.251832 seconds
          md5          took:    0.136373 seconds
      Using a       32 (2 **  5) bytes message
          openssl_md5  took:    0.248410 seconds
          md5          took:    0.140708 seconds
      Using a       64 (2 **  6) bytes message
          openssl_md5  took:    0.361016 seconds
          md5          took:    0.267021 seconds
      Using a      128 (2 **  7) bytes message
          openssl_md5  took:    0.478735 seconds
          md5          took:    0.413986 seconds
      Using a      256 (2 **  8) bytes message
          openssl_md5  took:    0.707602 seconds
          md5          took:    0.695042 seconds
      Using a      512 (2 **  9) bytes message
          openssl_md5  took:    1.216832 seconds
          md5          took:    1.268570 seconds
      Using a     1024 (2 ** 10) bytes message
          openssl_md5  took:    2.122014 seconds
          md5          took:    2.429623 seconds
      Using a     2048 (2 ** 11) bytes message
          openssl_md5  took:    4.158188 seconds
          md5          took:    4.847686 seconds
      Using a     4096 (2 ** 12) bytes message
          openssl_md5  took:    7.839173 seconds
          md5          took:    9.242224 seconds
      Using a     8192 (2 ** 13) bytes message
          openssl_md5  took:   15.282232 seconds
          md5          took:   18.368874 seconds
      Using a    16384 (2 ** 14) bytes message
          openssl_md5  took:   30.681912 seconds
          md5          took:   36.755073 seconds
      Using a    32768 (2 ** 15) bytes message
          openssl_md5  took:   60.230543 seconds
          md5          took:   73.237356 seconds
    • Ubuntu 22 pc064 (dual-boot on the same machine):

      [064bit prompt]> python ./ 
      Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] 064bit on linux
      timeit attempts number: 1000000
      Using a        1 (2 **  0) bytes message
          openssl_md5  took:    0.258825 seconds
          md5          took:    0.092418 seconds
      Using a        2 (2 **  1) bytes message
          openssl_md5  took:    0.265123 seconds
          md5          took:    0.095969 seconds
      Using a        4 (2 **  2) bytes message
          openssl_md5  took:    0.273572 seconds
          md5          took:    0.098485 seconds
      Using a        8 (2 **  3) bytes message
          openssl_md5  took:    0.267524 seconds
          md5          took:    0.102606 seconds
      Using a       16 (2 **  4) bytes message
          openssl_md5  took:    0.295750 seconds
          md5          took:    0.102688 seconds
      Using a       32 (2 **  5) bytes message
          openssl_md5  took:    0.266704 seconds
          md5          took:    0.095375 seconds
      Using a       64 (2 **  6) bytes message
          openssl_md5  took:    0.350251 seconds
          md5          took:    0.209725 seconds
      Using a      128 (2 **  7) bytes message
          openssl_md5  took:    0.559193 seconds
          md5          took:    0.362671 seconds
      Using a      256 (2 **  8) bytes message
          openssl_md5  took:    0.685720 seconds
          md5          took:    0.589242 seconds
      Using a      512 (2 **  9) bytes message
          openssl_md5  took:    1.100991 seconds
          md5          took:    1.081601 seconds
      Using a     1024 (2 ** 10) bytes message
          openssl_md5  took:    2.069975 seconds
          md5          took:    2.176450 seconds
      Using a     2048 (2 ** 11) bytes message
          openssl_md5  took:    3.742486 seconds
          md5          took:    4.197531 seconds
      Using a     4096 (2 ** 12) bytes message
          openssl_md5  took:    7.186287 seconds
          md5          took:    8.270421 seconds
      Using a     8192 (2 ** 13) bytes message
          openssl_md5  took:   13.889762 seconds
          md5          took:   16.225811 seconds
      Using a    16384 (2 ** 14) bytes message
          openssl_md5  took:   27.422105 seconds
          md5          took:   32.898019 seconds
      Using a    32768 (2 ** 15) bytes message
          openssl_md5  took:   54.010482 seconds
          md5          took:   64.579159 seconds

    The results seem to be quite different from yours. In my case:

    • Starting somewhere in the [~512B .. ~1KiB] range of message sizes, the OpenSSL implementation seems to perform better than the builtin one

    • I know that there are too few results to claim a pattern, but both implementations seem to scale linearly (in terms of time) with message size, and the builtin slope seems to be a bit steeper, meaning it will perform worse in the long run
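
    The two observations above can be turned into a rough back-of-the-envelope estimate. The sketch below assumes a simple linear model (time per call = fixed cost + per-byte cost * size), which is my simplification and not something the benchmark proves. It takes the 1 byte timing as the fixed cost and the slope between the 16384 and 32768 byte timings as the per-byte cost. All numbers are copied from the outputs above; the names RESULTS, model and crossover_bytes are hypothetical.

```python
CALLS = 1000000  # timeit iterations behind every number below

# Seconds per CALLS calls for (1 B, 16384 B, 32768 B) messages.
RESULTS = {
    "Win 10 / Python 3.7.6": {
        "openssl": (0.449134, 29.947287, 59.123447),
        "builtin": (0.120021, 43.198639, 86.453821),
    },
    "Ubuntu 16 / Python 3.5.2": {
        "openssl": (0.246166, 30.681912, 60.230543),
        "builtin": (0.130589, 36.755073, 73.237356),
    },
    "Ubuntu 22 / Python 3.10.12": {
        "openssl": (0.258825, 27.422105, 54.010482),
        "builtin": (0.092418, 32.898019, 64.579159),
    },
}


def model(t_1b, t_16k, t_32k):
    """Return (fixed_ns, per_byte_ns) per call under the linear model."""
    fixed_ns = t_1b / CALLS * 1e9
    per_byte_ns = (t_32k - t_16k) / CALLS / 16384 * 1e9
    return fixed_ns, per_byte_ns


def crossover_bytes(openssl, builtin):
    """Message size at which both modelled lines intersect."""
    fixed_o, slope_o = model(*openssl)
    fixed_b, slope_b = model(*builtin)
    return (fixed_o - fixed_b) / (slope_b - slope_o)


for platform, timings in RESULTS.items():
    size = crossover_bytes(timings["openssl"], timings["builtin"])
    print("{:28s} crossover near {:.0f} bytes".format(platform, size))
```

    With these inputs the estimates land at a few hundred bytes on all three machines, which is in line with where the measured timings swap order. Treat them as an order of magnitude only, since MD5 works on 64 byte blocks and the real curve is stepped rather than a straight line.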

    In conclusion, if all your messages are small and the builtin implementation works best for you, then use it.
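
    One possible way to act on that, as a hedged sketch: pick the constructor by message size and fall back to hashlib when the private module is missing. md5_hexdigest and SMALL_MESSAGE_LIMIT are hypothetical names, and the 512 byte threshold is only a guess based on my numbers above, so measure on your own machine before relying on it. Keep in mind that _md5 is not a public API and may change or disappear.

```python
import hashlib

try:
    from _md5 import md5 as _builtin_md5  # private module, may be absent
except ImportError:
    _builtin_md5 = hashlib.md5  # fall back to the public constructor

# Hypothetical threshold taken from the benchmark above; tune per machine.
SMALL_MESSAGE_LIMIT = 512


def md5_hexdigest(data):
    """Hash data with whichever constructor is expected to be faster."""
    ctor = _builtin_md5 if len(data) < SMALL_MESSAGE_LIMIT else hashlib.md5
    return ctor(data).hexdigest()


print(md5_hexdigest(b"A"))
```

    Both branches produce identical digests, so callers never need to know which implementation was used.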

    Update #1

    Graphical representation (I had to reduce the timeit iterations number by an order of magnitude, as it would take much too long for large messages):


    and zooming in on the area where the 2 graphs intersect:
