
Using a dictionary comprehension with an included string split operation


Consider a tiny properties parser snippet:

testx="""var1 = foo
         var2 = bar"""

dd = { l.split('=')[0].strip():l.split('=')[1].strip() for l in testx.split('\n')} 
print(dd)
# {'var1': 'foo', 'var2': 'bar'}

That works, but it is ugly because split is invoked twice in l.split('=')[0].strip():l.split('=')[1].strip(). How can the dictionary comprehension be changed so that it splits only once and then builds the dict entries as:

l[0].strip():l[1].strip()

Does that refactoring require a nested for comprehension or a different way of constructing a single level comprehension?


Solution

  • If you are using Python >= 3.8, this is exactly the use case assignment expressions (the := operator) were added for:

    >>> {(parts:=l.split('='))[0].strip(): parts[1].strip() for l in testx.split("\n")}
    {'var1': 'foo', 'var2': 'bar'}
    

    Prior to this, you could do something like:

    >>> {key.strip():value.strip() for l in testx.split('\n') for key, value in [l.split("=")]}
    {'var1': 'foo', 'var2': 'bar'}
    

    Honestly, I find that version more readable than the := one.

    Still, both are pretty unreadable to me. At the end of the day, I don't think you can beat a plain loop:

    >>> result = {}
    >>> for l in testx.split("\n"):
    ...     key, value = l.split("=")
    ...     result[key.strip()] = value.strip()
    ...
    >>> result
    {'var1': 'foo', 'var2': 'bar'}
    
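    The loop form is also the easiest one to harden. As a rough sketch (parse_properties is a hypothetical helper name, and skipping lines without "=" is my assumption about the desired behavior, not something the question specifies), str.partition splits at the first "=" only, so values containing "=" survive and blank lines do not raise:

    ```python
    def parse_properties(text):
        """Parse 'key = value' lines into a dict (illustrative sketch)."""
        result = {}
        for line in text.splitlines():
            # partition splits once, at the first "=", and never raises
            key, sep, value = line.partition("=")
            if not sep:
                continue  # assumption: ignore blank lines and lines without "="
            result[key.strip()] = value.strip()
        return result


    print(parse_properties("var1 = foo\n\n  var2 = a=b"))
    ```

    The two-name unpacking in the loop above would raise ValueError on both the blank line and the a=b value.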

    EDIT

    Note that the for <target list> in [<expression>] idiom has actually been optimized in Python 3.9, as the release notes describe:

    https://docs.python.org/3/whatsnew/3.9.html#optimizations

    Optimized the idiom for assignment a temporary variable in comprehensions. Now for y in [expr] in comprehensions is as fast as a simple assignment y = expr. For example:

    sums = [s for s in [0] for x in data for s in [s + x]]

    Unlike the := operator this idiom does not leak a variable to the outer scope.

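    The scoping difference in that last sentence is easy to check for yourself. A small sketch, requiring Python 3.8+ for :=; the function names and TEXT are made up for illustration:

    ```python
    TEXT = "var1 = foo\nvar2 = bar"


    def with_walrus(text):
        d = {(parts := line.split("="))[0].strip(): parts[1].strip()
             for line in text.split("\n")}
        # := binds in the enclosing function scope, so parts is still visible here
        return d, parts


    def with_nested_for(text):
        d = {key.strip(): value.strip()
             for line in text.split("\n")
             for key, value in [line.split("=")]}
        # key and value are comprehension targets and stay inside the comprehension
        return d, "key" in locals()


    print(with_walrus(TEXT))
    print(with_nested_for(TEXT))
    ```
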
    Compare the bytecode in Python 3.8 vs Python 3.9; you'll notice there is no nested iteration (no second FOR_ITER) in the Python 3.9 version:

    Python 3.8:

    Python 3.8.1 (default, Jan  8 2020, 16:15:59)
    [Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import dis
    >>> dis.dis('{k:v for l in "a b|c d".split("|") for k,v in [l.split()]}')
      1           0 LOAD_CONST               0 (<code object <dictcomp> at 0x7fdbd6249d40, file "<dis>", line 1>)
                  2 LOAD_CONST               1 ('<dictcomp>')
                  4 MAKE_FUNCTION            0
                  6 LOAD_CONST               2 ('a b|c d')
                  8 LOAD_METHOD              0 (split)
                 10 LOAD_CONST               3 ('|')
                 12 CALL_METHOD              1
                 14 GET_ITER
                 16 CALL_FUNCTION            1
                 18 RETURN_VALUE
    
    Disassembly of <code object <dictcomp> at 0x7fdbd6249d40, file "<dis>", line 1>:
      1           0 BUILD_MAP                0
                  2 LOAD_FAST                0 (.0)
            >>    4 FOR_ITER                30 (to 36)
                  6 STORE_FAST               1 (l)
                  8 LOAD_FAST                1 (l)
                 10 LOAD_METHOD              0 (split)
                 12 CALL_METHOD              0
                 14 BUILD_TUPLE              1
                 16 GET_ITER
            >>   18 FOR_ITER                14 (to 34)
                 20 UNPACK_SEQUENCE          2
                 22 STORE_FAST               2 (k)
                 24 STORE_FAST               3 (v)
                 26 LOAD_FAST                2 (k)
                 28 LOAD_FAST                3 (v)
                 30 MAP_ADD                  3
                 32 JUMP_ABSOLUTE           18
            >>   34 JUMP_ABSOLUTE            4
            >>   36 RETURN_VALUE
    

    Versus Python 3.9:

    Python 3.9.0 | packaged by conda-forge | (default, Oct 14 2020, 22:56:29)
    [Clang 10.0.1 ] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import dis
    >>> dis.dis('{k:v for l in "a b|c d".split("|") for k,v in [l.split()]}')
      1           0 LOAD_CONST               0 (<code object <dictcomp> at 0x7fb3587d1870, file "<dis>", line 1>)
                  2 LOAD_CONST               1 ('<dictcomp>')
                  4 MAKE_FUNCTION            0
                  6 LOAD_CONST               2 ('a b|c d')
                  8 LOAD_METHOD              0 (split)
                 10 LOAD_CONST               3 ('|')
                 12 CALL_METHOD              1
                 14 GET_ITER
                 16 CALL_FUNCTION            1
                 18 RETURN_VALUE
    
    Disassembly of <code object <dictcomp> at 0x7fb3587d1870, file "<dis>", line 1>:
      1           0 BUILD_MAP                0
                  2 LOAD_FAST                0 (.0)
            >>    4 FOR_ITER                22 (to 28)
                  6 STORE_FAST               1 (l)
                  8 LOAD_FAST                1 (l)
                 10 LOAD_METHOD              0 (split)
                 12 CALL_METHOD              0
                 14 UNPACK_SEQUENCE          2
                 16 STORE_FAST               2 (k)
                 18 STORE_FAST               3 (v)
                 20 LOAD_FAST                2 (k)
                 22 LOAD_FAST                3 (v)
                 24 MAP_ADD                  2
                 26 JUMP_ABSOLUTE            4
            >>   28 RETURN_VALUE
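
If you do not have both interpreters at hand, or would rather not compare dumps by eye, counting FOR_ITER instructions gives the same answer. This is a sketch using only the standard dis module; count_for_iter is a name I made up. It walks nested code objects because, depending on the Python version, the comprehension may or may not be compiled into its own code object:

```python
import dis
import sys
import types


def count_for_iter(source):
    """Count FOR_ITER instructions in an expression, including nested code objects."""
    def walk(code):
        total = sum(1 for ins in dis.get_instructions(code)
                    if ins.opname == "FOR_ITER")
        for const in code.co_consts:
            if isinstance(const, types.CodeType):
                total += walk(const)
        return total

    return walk(compile(source, "<dis>", "eval"))


# The single-element idiom from above
single = count_for_iter(
    '{k: v for l in "a b|c d".split("|") for k, v in [l.split()]}')
# A two-element list is a genuine inner loop, for comparison
double = count_for_iter(
    '{k: v for l in "a b|c d".split("|") for k, v in [l.split(), l.split()]}')

print(sys.version_info[:2], single, double)
```

On 3.9 and later I would expect the first count to be 1 and the second 2; on 3.8 both should be 2, matching the dumps above.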