python code-duplication control-flow

Avoiding code duplication in Python code


Consider the following Python snippet:

af = open("a", "r")
bf = open("b", "w")

for i, line in enumerate(af):
    if i < K:
        bf.write(line)

Now, suppose I want to handle the case where K is None, so that writing continues to the end of the file. I'm currently doing:

if K is None:
    for i, line in enumerate(af):
        bf.write(line)
else:
    for i, line in enumerate(af):
        # Check before writing so exactly K lines are copied, as in the first snippet
        if i == K:
            break
        bf.write(line)

This clearly isn't the best way to handle this, as I'm duplicating the code. Is there some more integrated way I can handle this? The natural thing would be to have the if/break code present only if K is not None, but this involves writing syntax on the fly a la Lisp macros, which Python can't really do. Just to be clear, I'm not concerned about the particular case (which I chose partly for its simplicity), so much as learning about general techniques I may not be familiar with.

UPDATE: After reading answers people have posted, and doing more experimentation, here are some more comments.

As I said above, I was looking for techniques that generalize, and I think @Paul's answer, namely using takewhile from itertools, fits that best. As a bonus, it is also much faster than the naive method I listed above; I'm not sure why. I'm not really familiar with itertools, though I've looked at it a few times. From my perspective this is a case of functional programming For The Win! (Amusingly, the author of itertools once asked for feedback about dropping takewhile. See the thread beginning http://mail.python.org/pipermail/python-list/2007-December/522529.html.)

I simplified my situation above; the actual situation is a bit messier, since I'm writing to two different files in the loop. So the code looks more like:

for i, line in enumerate(af):
    if i < K:
        bf.write(line)
        cf.write(line.split(',')[0].strip('"')+'\n')

Given my posted example, @Jeff reasonably suggested that when K is None, I just copy the file. Since in practice I am looping anyway, doing so is not such a clear choice. However, takewhile generalizes painlessly to this case. I also had another use case I did not mention here, and was able to use takewhile there too, which was nice. The second example looks like this (verbatim):

i=0
for line in takewhile(illuminacond, af):
    line_split=line.split(',')
    pid=line_split[1][0:3]
    out = line_split[1] + ',' + line_split[2] + ',' + line_split[3][1] + line_split[3][3] + ',' \
                        + line_split[15] + ',' + line_split[9] + ',' + line_split[10]
    if pid!='cnv' and pid!='hCV' and pid!='cnv':
        i = i+1
        of.write(out.strip('"')+'\n')
        tf.write(line)

Here I was able to use the condition:

if K is None:
    illuminacond = lambda x: x.split(',')[0] != '[Controls]'
else:
    illuminacond = lambda x: x.split(',')[0] != '[Controls]' and i < K

per @Paul's original example. However, I'm not completely happy about the fact that I'm getting i from the outer scope, though the code works. Is there a better way of doing this? Or maybe it should be a separate question. Anyway, thanks to everyone who answered my question. Honorable mention to @Jeff, who made some nice suggestions.
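One possible way to avoid reading i from the enclosing scope, offered only as a sketch and not taken from any of the answers: let takewhile handle the '[Controls]' marker, filter the lines separately, and cap the number of accepted lines with itertools.islice, which treats a stop value of None as "no limit". The names wanted_lines and limit are hypothetical, and the sample lines are made up for illustration.

```python
from itertools import islice, takewhile


def wanted_lines(lines, limit=None):
    """Return accepted lines before the '[Controls]' marker, at most `limit` of them.

    `wanted_lines` and `limit` are hypothetical names used for this sketch.
    """
    # Stop at the first line whose first field is '[Controls]'.
    body = takewhile(lambda ln: ln.split(',')[0] != '[Controls]', lines)
    # Keep only lines whose ID prefix is not excluded (same test as the loop above).
    kept = (ln for ln in body if ln.split(',')[1][0:3] not in ('cnv', 'hCV'))
    # islice(..., None) imposes no limit, so limit=None needs no special case.
    return islice(kept, limit)


# Made-up sample data standing in for the real input file.
sample = [
    "x,rs1,a\n",
    "x,cnv9,b\n",
    "x,rs2,c\n",
    "x,rs3,d\n",
    "[Controls],q,r\n",
    "x,rs4,e\n",
]
first_two = list(wanted_lines(sample, 2))
```

With this shape the loop body no longer needs the counter or the pid test; it only formats and writes each line it receives.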


Solution

  • itertools.takewhile will apply your condition, and then break out of the loop the first time the condition fails.

    from itertools import takewhile
    
    if K is None:
        condition = lambda x: True
    else:
        condition = lambda x: x[0] < K
    
    for i,line in takewhile(condition, enumerate(af)):
        bf.write(line)
    

    If K is None, you don't want takewhile to ever stop, so the condition function should always return True. If K is a number, takewhile stops as soon as the first element of the (index, line) tuple passed to the condition is >= K.
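For the simple "first K lines" case, a related option is itertools.islice, which is not what the answer above uses. Its stop argument may be None, meaning "run until the iterator is exhausted", so the K is None case needs no separate branch. The sketch below uses in-memory files from io.StringIO so it can run on its own; copy_head and its parameter names are hypothetical.

```python
import io
from itertools import islice


def copy_head(src, dst, k=None):
    """Copy the first k lines of src to dst; copy everything when k is None."""
    # islice(iterable, None) yields every item, so None needs no special handling.
    for line in islice(src, k):
        dst.write(line)


# In-memory stand-ins for the files "a" and "b" from the question.
src = io.StringIO("one\ntwo\nthree\n")
dst = io.StringIO()
copy_head(src, dst, 2)
```

Unlike takewhile, islice only counts items, so it fits when the stopping rule is purely positional; a content-based rule still needs takewhile or an explicit test.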