I'm reformatting a bunch of data processing code. The original code first declares several functions that have a topological dependency (that is, some functions rely on other functions' results), then calls them sequentially in a correct topological order:
    def func_1(df):
        return df.apply(...)

    def func_2(df):
        return pd.concat([df, ...])

    def func_3(df_1, df_2):
        return pd.merge(df_1, df_2)

    if __name__ == "__main__":
        df = ...
        df_1 = func_1(df)
        df_2 = func_2(df)
        result = func_3(df_1, df_2)  # func_3 relies on the results of func_1 and func_2
The problem is that I can't retrieve intermediate data. Say I just want to apply func_1 and func_2: I then need to change some code. And it gets complicated when the topological dependency gets complicated.
So I want to change it into something like a makefile's recursive recipes:
    def recipe_1(df):
        return df.apply(...)

    def recipe_2(df):
        return pd.concat([df, ...])

    def recipe_3(df):
        df_1 = recipe_1(df)
        df_2 = recipe_2(df)
        #some process here.
        return

    if __name__ == '__main__':
        df = ...
        recipe_3(df)  # Just call the intermediate node I need.
The problem with this approach is that I need to collect a lot of variables from recipe_1 and recipe_2 in recipe_3, so I think it would be nice if I could retrieve the variables from locals(), which would leave the other code in "#some process here." unchanged.
Now I'm thinking of something like this, but it looks ugly:
    def func_to_be_reconstructed():
        a = 3
        return locals()

    local_variables = func_to_be_reconstructed()
    for key in local_variables.keys():
        exec(str(key) + '= local_variables[\'' + str(key) + '\']')
Is there a better solution?
globals() and locals() are just dicts... So, instead of using exec in such a fishy way, just update the dict:
    def func_to_be_reconstructed():
        a = 3
        return locals()

    globals().update(func_to_be_reconstructed())
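If writing every returned name into globals() feels too broad, one possible alternative is to wrap the returned dict in types.SimpleNamespace from the standard library, so the values are reachable as attributes without touching the module namespace. This is only a sketch of that idea, not something from the code above: the extra variable b and the name ns are made up for illustration.

```python
from types import SimpleNamespace


def func_to_be_reconstructed():
    a = 3
    b = a * 2  # hypothetical second variable, added only for illustration
    # locals() returns a dict of the names defined in this function so far.
    return locals()


# Wrap the returned dict so each value is available as an attribute,
# instead of copying every name into the module's globals.
ns = SimpleNamespace(**func_to_be_reconstructed())
print(ns.a, ns.b)
```

The trade-off is that the later code has to refer to ns.a rather than a, so it is not a drop-in replacement when that code must stay unchanged, but name clashes with existing globals are avoided.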