I have the following list comprehensions that I would like to optimize and turn into one single list comprehension, but I can't figure out how to handle the shifts:
lines = [line for line in lines if len(line) > 55]
shifts = [int(int(line.split()[1]) >= 1000000)+int(int(line.split()[4]) >= 100000) for line in lines]
xyz = [(float(line[30+s:38+s]),float(line[38+s:46+s]),float(line[46+s:54+s])) for s,line in zip(shifts,lines)]
I know it should look something like this combined:
xyz = [(float(line[30 + s:38 + s]), float(line[38 + s:46 + s]), float(line[46 + s:54 + s])) for line in lines if len(line) > 55]
but I still need to define the s variable somehow. I suspect I can use the walrus operator for it, but I'm not sure, since I don't want to test s as a condition, only assign it. For instance, the following doesn't work: s is used as a condition, and since s is sometimes 0, it filters out lines whose shift is zero, which is not what I want:
xyz = [(float(line[30 + s:38 + s]), float(line[38 + s:46 + s]), float(line[46 + s:54 + s])) for line in lines if len(line) > 55 and (s := int(int(line.split()[1]) >= 1000000)+int(int(line.split()[4]) >= 100000))]
I could of course write out the full expression for s at every place where s is used, but that seems ugly and inefficient, since the shift would then be computed three times per line. So is there a better way of doing this?
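For illustration, here is a minimal sketch of why the walrus version drops rows, and one way to make it only assign: compare the assigned value with 0, which is always true because the shift is never negative. It assumes Python 3.8+ and uses two hypothetical helpers that are not in the question: `make_line` builds fixed-width sample records (fields 1 and 4 of `line.split()` are integers, and the coordinates start at column 30 + shift), since the real file format isn't shown, and `shift_of` wraps the shift rule.

```python
# Hypothetical fixed-width sample records (an assumption for illustration):
# fields 1 and 4 of line.split() are integers, and the three 8-character
# coordinate columns start at column 30 + shift.
def make_line(serial, resseq, xyz):
    shift = (serial >= 1000000) + (resseq >= 100000)
    head = f"ATOM {serial} N ALA {resseq}".ljust(30 + shift)
    return head + "".join(f"{c:8.3f}" for c in xyz) + "  1.00  0.00"

lines = [
    make_line(1, 1, (1.5, -2.25, 3.0)),            # shift 0
    make_line(1000000, 1, (4.0, 5.5, -6.75)),      # shift 1
    make_line(1000001, 100000, (7.25, 8.0, 9.5)),  # shift 2
    "TER",                                          # too short, filtered out
]

def shift_of(line):
    # Hypothetical helper with the same rule as in the question (0, 1 or 2).
    split = line.split()
    return (int(split[1]) >= 1000000) + (int(split[4]) >= 100000)

# Version from the question: s == 0 is falsy, so the shift-0 line is dropped.
broken = [(float(line[30 + s:38 + s]), float(line[38 + s:46 + s]), float(line[46 + s:54 + s]))
          for line in lines
          if len(line) > 55 and (s := shift_of(line))]

# Comparing the assigned value with 0 makes the test always true, so the
# walrus only assigns s and every long line is kept.
fixed = [(float(line[30 + s:38 + s]), float(line[38 + s:46 + s]), float(line[46 + s:54 + s]))
         for line in lines
         if len(line) > 55 and (s := shift_of(line)) >= 0]

print(broken)
print(fixed)
```

Note that, per PEP 572, a walrus target inside a comprehension binds `s` in the enclosing scope, so `s` is still defined after the comprehension finishes.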
Edit: I need this code to be as fast as possible, which is why I have it in list comprehension form rather than loop form, and also why I want to combine the 3 list comprehensions into one.
Similar code in loop form looks like this:
X, Y, Z = [], [], []
for line in lines:
    shift = 0
    dat = line.strip('\n')
    data = dat.split()
    if len(dat) > 55:
        if int(data[4]) >= 100000:
            shift += 1
        if int(data[1]) >= 1000000:
            shift += 1
        x = float(dat[30+shift:38+shift])
        y = float(dat[38+shift:46+shift])
        z = float(dat[46+shift:54+shift])
        X.append(x)
        Y.append(y)
        Z.append(z)
    else:
        continue
I realize my example isn't the easiest to follow, but what I want is essentially the following, in comprehension form:
for line in lines:
    s = g(line)
    result = (f1(line, s), f2(line, s), f3(line, s))
where g, f1, f2 and f3 are some unimportant functions. The essential point is that s is a function of line, and because I need it several times in the output, I want to store it temporarily in a variable so that I don't have to compute it several times. However, I don't know how to do this inside a list comprehension.
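As a sketch of the generic shape, one common trick is to iterate over a one-element list, which simply binds `s` once per line. In the block below, `g`, `f1`, `f2` and `f3` are hypothetical stand-ins, and `g` records each call so you can check that it runs once per line. A walrus variant (Python 3.8+) is included for comparison.

```python
call_log = []

# Hypothetical stand-ins for g, f1, f2 and f3 from the question.
def g(line):
    call_log.append(line)   # record every call to g
    return len(line) % 3

def f1(line, s):
    return line[s:]

def f2(line, s):
    return line.upper()[s:]

def f3(line, s):
    return s * 10

lines = ["alpha", "beta", "gamma"]

# `for s in [g(line)]` loops over a one-element list, i.e. it just binds s.
result = [(f1(line, s), f2(line, s), f3(line, s))
          for line in lines
          for s in [g(line)]]
calls_idiom = len(call_log)

# Walrus variant: tuple elements are evaluated left to right, so s is
# assigned in the first element before the second and third use it.
result_walrus = [(f1(line, (s := g(line))), f2(line, s), f3(line, s))
                 for line in lines]

print(result)
print(calls_idiom, len(call_log))
```

Both versions call `g` exactly once per line. The one-element-list form also works on Python versions older than 3.8.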
Use the "idiom for assignment a temporary variable in comprehensions", which CPython 3.9 even optimized to be a simple assignment, for both split and s:
xyz = [(float(line[30+s:38+s]),
        float(line[38+s:46+s]),
        float(line[46+s:54+s]))
       for line in lines
       if len(line) > 55
       for split in [line.split()]
       for s in [(int(split[1]) >= 1000000) +
                 (int(split[4]) >= 100000)]]
And the whole thing spread over multiple short lines like this is pretty readable, in my opinion.
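To check this comprehension against the loop from the question and time the two, here is a small harness. It uses a hypothetical `make_line` helper to build fixed-width sample records, because the question doesn't show the real file format. It wraps both versions in functions, asserts that they produce the same coordinates, and prints timings. The numbers depend on your machine, Python version and data, so treat them as indicative only.

```python
import timeit

# Hypothetical sample record generator (an assumption): fields 1 and 4 of
# line.split() are integers and the coordinate columns start at 30 + shift.
def make_line(serial, resseq, xyz):
    shift = (serial >= 1000000) + (resseq >= 100000)
    head = f"ATOM {serial} N ALA {resseq}".ljust(30 + shift)
    return head + "".join(f"{c:8.3f}" for c in xyz) + "  1.00  0.00\n"

def comprehension_version(lines):
    # The single comprehension above; each `for name in [expr]` clause
    # just binds name to expr once per line.
    return [(float(line[30+s:38+s]),
             float(line[38+s:46+s]),
             float(line[46+s:54+s]))
            for line in lines
            if len(line) > 55
            for split in [line.split()]
            for s in [(int(split[1]) >= 1000000) +
                      (int(split[4]) >= 100000)]]

def loop_version(lines):
    # The loop from the question, returning the three coordinate lists.
    X, Y, Z = [], [], []
    for line in lines:
        shift = 0
        dat = line.strip('\n')
        data = dat.split()
        if len(dat) > 55:
            if int(data[4]) >= 100000:
                shift += 1
            if int(data[1]) >= 1000000:
                shift += 1
            X.append(float(dat[30+shift:38+shift]))
            Y.append(float(dat[38+shift:46+shift]))
            Z.append(float(dat[46+shift:54+shift]))
    return X, Y, Z

lines = [
    make_line(1, 1, (1.5, -2.25, 3.0)),            # shift 0
    make_line(1000000, 1, (4.0, 5.5, -6.75)),      # shift 1
    make_line(1000001, 100000, (7.25, 8.0, 9.5)),  # shift 2
    "TER\n",                                        # too short, skipped
] * 1000

xyz = comprehension_version(lines)
X, Y, Z = loop_version(lines)
assert xyz == list(zip(X, Y, Z))
print(xyz[:3])

for fn in (comprehension_version, loop_version):
    print(fn.__name__, timeit.timeit(lambda: fn(lines), number=20))
```

One subtle difference between the two versions is how the length check treats the newline. The loop strips the trailing `\n` before checking `len(dat) > 55`, whereas the comprehension checks the raw `line`. As a result, a line of exactly 55 visible characters plus `\n` is kept by the comprehension but skipped by the loop. If that matters, filter on `len(line.rstrip('\n')) > 55` instead.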
Btw, I got rid of the explicit bool-to-int conversions. When you add two bools, you get an int anyway (e.g., True + True is 2), and it's much faster (though that might not matter much in your overall code).
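A quick way to see both points: `bool` is a subclass of `int`, so the sum of two comparisons is a plain `int`. The timing part is a rough `timeit` sketch with arbitrary sample values, and the actual speedup depends on the interpreter and machine.

```python
import timeit

a, b = 1500000, 99999                    # arbitrary sample values
shift = (a >= 1000000) + (b >= 100000)   # True + False
print(shift, type(shift).__name__)
print(True + True, isinstance(True, int))

# Rough timing of the explicit int() conversions vs. plain bool addition.
setup = "a, b = 1500000, 99999"
t_explicit = timeit.timeit("int(a >= 1000000) + int(b >= 100000)",
                           setup=setup, number=1_000_000)
t_implicit = timeit.timeit("(a >= 1000000) + (b >= 100000)",
                           setup=setup, number=1_000_000)
print(f"explicit int(): {t_explicit:.3f}s, bool addition: {t_implicit:.3f}s")
```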