Generators are elegant and memory-efficient, so I'd like to convert ordinary functions that produce sequences into generators. But I haven't found an easy conversion method. By easy I mean that ideally it could be done automatically by some tool, perhaps as simply as a string or regex replacement (`return` to `yield`, for example). If the conversion cannot always be that easy, refactoring may introduce bugs. Of course unit tests can catch some bugs, but if the ordinary function is already fully tested, I doubt it should be necessary to write extra tests for the generator.
Let me give an example. The ordinary function `normalize` accepts a `src` parameter: if `src` is a `list`, it returns that list; otherwise it returns a list with `src` as its single item. For simplicity, suppose the caller always provides `src` as either a `list` or a simple scalar.
f = normalize  # replace with other conversions
assert list(f(2)) == [2]
assert list(f([1, 2])) == [1, 2]

def normalize(src):
    if isinstance(src, list):
        return src
    return [src]
Now I convert it to a generator, adding an `else` block, and the test passes:
def normalize_generator(src):
    if isinstance(src, list):
        for x in src:
            yield x
    else:
        yield src
What I want is to keep the code structure of `normalize` and only replace some keywords, as in the following. Of course, the test fails:
def normalize_generator_no_else(src):
    if isinstance(src, list):
        for x in src:
            yield x
    yield src
Test result: fail. The left side is [1, 2, [1, 2]]; the right side is [1, 2].
I basically understand this behaviour: after a `yield`, execution continues with the following lines. I've searched and found a similar question, "Does python yield imply continue?", but I haven't found a solution to my question.
The conversion you want, one that "maintains" the code structure, fails because `yield` is not equivalent to `return`: `yield` hands a value to the caller and then resumes on the next line, so it cannot by itself act as an early `return` from inside a block. To get that behaviour you need to insert an additional `return` after the `yield`, so the corrected generator looks like this:
def normalize_generator_no_else(src):
    if isinstance(src, list):
        for x in src:
            yield x
        return  # ensure function returns early to not execute the rest of it
    yield src
This keeps the early return inside the `if` block of the original function (again by using `return` inside the `if` block, though here it should return nothing, i.e. `None`, because a value passed to `return` in a generator is not yielded to the consumer).
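As a side note, on Python 3.3 and later the inner loop can also be written with `yield from`, which keeps the generator a little closer to the shape of the original function. This is only a minimal sketch of that variant; the name `normalize_generator_yield_from` is made up for illustration and is not part of the question.

```python
def normalize_generator_yield_from(src):
    if isinstance(src, list):
        # Delegate to the list's iterator; items come out one at a time.
        yield from src
        # Bare return ends the generator, mirroring the early return in normalize.
        return
    yield src


print(list(normalize_generator_yield_from(2)))       # [2]
print(list(normalize_generator_yield_from([1, 2])))  # [1, 2]
```

The bare `return` is still required here: `yield from` resumes on the next line once the list is exhausted, just as a plain `yield` does.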
Now a word of warning: a general conversion that works in all cases and actually achieves your goal of being "memory-efficient" may be difficult, because a function can build and return a list in arbitrary ways, whereas the effect of a `yield` is immediate. A function may have multiple `return` statements in various `if` blocks, returning arbitrary local list variables that are modified in various places. Simply yielding the items from such a list will not give you the "elegant and memory-efficient" trait you want.
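To make the warning concrete, here is a small sketch under stated assumptions: `doubled` is a hypothetical list-returning function with two `return` statements, and `doubled_mechanical` is what a keyword-level conversion might produce. The `log` parameter exists only to record when the work happens.

```python
def doubled(values, log):
    # Hypothetical list-returning function with two return points.
    items = []
    for v in values:
        log.append(f"compute {v}")
        items.append(v * 2)
    if len(items) > 3:
        return items[:3]
    return items


def doubled_mechanical(values, log):
    # Keyword-level conversion: every `return xs` became `yield from xs; return`.
    items = []
    for v in values:
        log.append(f"compute {v}")
        items.append(v * 2)
    if len(items) > 3:
        yield from items[:3]
        return
    yield from items


log = []
gen = doubled_mechanical([1, 2, 3, 4, 5], log)
first = next(gen)
# The whole list was built before the first item was handed out.
print(first, len(log))  # 2 5
```

The converted function is a generator and passes the same tests, but it still computes and stores every item before yielding the first one, so nothing is gained in memory.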
Generally, the function needs a proper rewrite so that it yields the appropriate values at the appropriate level, and in some cases this may not be possible, for example if the function removes some values from the list it is about to return. The function corrected above (like the original) simply keeps a reference to the full list of values in memory anyway, which defeats the goal of being "memory-efficient"; if it is meant as an intermediate step for processing an existing list of values, it is no worse or better than the original. For functions that produce new values, though, there is no general solution: the rewrite must be tailored to the problem the function solves.
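For a function that does produce new values, a hand-written rewrite might look like the following sketch. Both `squares_list` and `squares_gen` are hypothetical examples, chosen because each output depends only on the current input, which is the easy case.

```python
import itertools


def squares_list(values):
    # Hypothetical original: builds the whole result before returning it.
    result = []
    for v in values:
        result.append(v * v)
    return result


def squares_gen(values):
    # Hand-written rewrite: yield each value at the point it is produced.
    for v in values:
        yield v * v


# The generator never holds more than one result, so it even works on an
# unbounded input, which the list version could not handle.
first_three = list(itertools.islice(squares_gen(itertools.count(1)), 3))
print(first_three)  # [1, 4, 9]
```

Here the `yield` sits where the `append` used to be, not where the `return` was, which is why a plain `return`-to-`yield` substitution does not get you there.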