python python-3.x list-comprehension

Creating tmp variables inside a list comprehension


I have the following list comprehensions that I would like to optimize and turn into one single list comprehension, but I can't figure out how to handle the shifts:

lines = [line for line in lines if len(line) > 55]
shifts = [int(int(line.split()[1]) >= 1000000)+int(int(line.split()[4]) >= 100000) for line in lines]
xyz = [(float(line[30+s:38+s]),float(line[38+s:46+s]),float(line[46+s:54+s])) for s,line in zip(shifts,lines)]

I know it should look something like this combined:

xyz = [(float(line[30 + s:38 + s]), float(line[38 + s:46 + s]), float(line[46 + s:54 + s])) for line in lines if len(line) > 55]

but I still need to define the s variable somehow. I suspect the walrus operator could do it, but I'm not sure, since I don't want to test s as a condition, only assign it. For instance, the following doesn't work: s is used as a condition, and sometimes s == 0, so lines where the shift is zero get filtered out, which is not what I want:

xyz = [(float(line[30 + s:38 + s]), float(line[38 + s:46 + s]), float(line[46 + s:54 + s])) for line in lines if len(line) > 55 and (s := int(int(line.split()[1]) >= 1000000)+int(int(line.split()[4]) >= 100000))]
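
The pitfall can be reproduced on a toy example. In this sketch, shift is a hypothetical stand-in for the real shift computation and the input values are made up; the point is only that an assignment expression placed in the if clause is also evaluated for its truth value, so every item whose shift is 0 is dropped:

```python
def shift(n):
    # Hypothetical stand-in for the real shift computation: returns 0, 1 or 2.
    return int(n >= 10) + int(n >= 100)

values = [5, 50, 500]

# The assignment expression doubles as the filter condition,
# so items with s == 0 (falsy) are silently dropped.
filtered = [(n, s) for n in values if (s := shift(n))]

# What is actually wanted: every item kept, with its shift attached.
wanted = [(n, shift(n)) for n in values]

print(filtered)
print(wanted)
```

Here filtered loses the item 5, because its shift is 0, while wanted keeps all three items.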

I could of course just write out the definition of s in every position where s appears, but that seems ugly and inefficient. So is there a better way of doing this?

Edit: I need this code to be as fast as possible, which is why I have it in list comprehension form rather than loop form, and also why I want to combine the 3 list comprehensions into one.

Similar code in loop form looks like this:

X, Y, Z = [], [], []  # output coordinate lists
for line in lines:
    shift = 0
    dat = line.strip('\n')
    data = dat.split()
    if len(dat) > 55:
        # a wide field shifts the coordinate columns by one character
        if int(data[4]) >= 100000:
            shift += 1
        if int(data[1]) >= 1000000:
            shift += 1
        x = float(dat[30+shift:38+shift])
        y = float(dat[38+shift:46+shift])
        z = float(dat[46+shift:54+shift])
        X.append(x)
        Y.append(y)
        Z.append(z)

I realize that my example isn't the easiest to follow, but what I want is essentially the following in comprehension form:

for line in lines:
    s = g(line)
    result = (f1(line,s),f2(line,s),f3(line,s))

where g, f1, f2, f3 are some unimportant functions. The essential point is that s is a function of line, but because I need it several times in the output, I want to store it temporarily in a variable so that I don't have to compute it several times. However, I don't know how to do this inside a list comprehension.


Solution

  • Use the "idiom for assignment a temporary variable in comprehensions" (for y in [expr]), which CPython 3.9 even optimized to be as fast as a simple assignment. Here it is used for both split and s:

    xyz = [(float(line[30+s:38+s]),
            float(line[38+s:46+s]),
            float(line[46+s:54+s]))
           for line in lines
           if len(line) > 55
           for split in [line.split()]
           for s in [(int(split[1]) >= 1000000) +
                     (int(split[4]) >= 100000)]]
    

    And the whole thing spread over multiple short lines like this is pretty readable, in my opinion.

    Btw I got rid of the explicit bool-to-int conversions. When you add two bools, you get an int anyway (e.g., True + True is 2) and it's much faster (though might not matter much in your overall code).
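
A minimal, self-contained sketch of the same idiom applied to the generic form from the question. The functions g, f1, f2 and f3 and the sample lines are placeholders invented for illustration, not the real parsing code. The sketch checks that the single comprehension matches the plain loop, including the item where s is 0, and that adding two bools gives an int:

```python
def g(line):
    # Placeholder shift: adding two bools yields an int (0, 1 or 2).
    return (len(line) >= 3) + (len(line) >= 5)

def f1(line, s):
    return line[s:]

def f2(line, s):
    return len(line) + s

def f3(line, s):
    return line.upper() * s

lines = ["ab", "abcd", "abcdef"]

# Reference result from the plain loop.
expected = []
for line in lines:
    s = g(line)
    expected.append((f1(line, s), f2(line, s), f3(line, s)))

# Same result as one comprehension: "for s in [g(line)]" binds s once
# per line without filtering anything, so s == 0 is kept.
result = [(f1(line, s), f2(line, s), f3(line, s))
          for line in lines
          for s in [g(line)]]

print(result == expected)
print(type(True + True))
```

Because the inner for iterates over a one-element list, it runs exactly once per line and never acts as a condition, which is the difference from putting a walrus expression in the if clause.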