python pandas pep8

Is overwriting variable names for lengthy operations bad style?


I quite often find myself taking several steps to get from my input data to the output I want, e.g. in functions or loops. To avoid making my lines very long, I sometimes overwrite the variable name I am using in these operations.

One example would be:

df_2 = df_1.loc[(df_1['id'] == val)]
df_2 = df_2[['c1','c2']]
df_2 = df_2.merge(df3, left_on='c1', right_on='c1')

The only alternative I can come up with is:

df_2 = df_1.loc[(df_1['id'] == val)][['c1','c2']]\
    .merge(df3, left_on='c1', right_on='c1')

But neither of these options feels really clean. How should these situations be handled?


Solution

  • You can refer to this article which discusses exactly your question.

    The pandas core team now encourages the use of "method chaining". This is a style of programming in which you chain together multiple method calls into a single statement. This allows you to pass intermediate results from one method to the next rather than storing the intermediate results using variables.

    In addition to tidying up the chained code with brackets and indentation, as in @perl's answer, you might also find methods like .query() and .assign() useful when coding in a "method chaining" style.

    Of course, method chaining has some drawbacks, especially when overused:

    "One drawback to excessively long chains is that debugging can be harder. If something looks wrong at the end, you don't have intermediate values to inspect."
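
As a rough sketch of what this could look like for the example in the question: the chain below wraps the steps in brackets, uses .query() for the filter and .assign() for a derived column, and adds a .pipe() checkpoint so an intermediate result can still be inspected. The sample data, the function name select_and_merge, the helper show_shape, the 'label' column, and the derived column c2_doubled are all hypothetical and only there to make the snippet runnable.

```python
import pandas as pd


def show_shape(df, step):
    """Hypothetical debugging helper: print the shape, return the frame unchanged."""
    print(f"{step}: {df.shape}")
    return df


def select_and_merge(df_1, df3, val):
    """Filter df_1 by id, keep c1/c2, and merge with df3 in one chained statement."""
    return (
        df_1
        .query("id == @val")               # same filter as df_1.loc[df_1['id'] == val]
        .loc[:, ["c1", "c2"]]              # keep only the needed columns
        .pipe(show_shape, "after filter")  # optional checkpoint for debugging
        .merge(df3, on="c1")               # on= is equivalent here to left_on/right_on='c1'
        .assign(c2_doubled=lambda d: d["c2"] * 2)  # illustrative derived column
    )


# Made-up sample data standing in for the question's df_1 and df3.
df_1 = pd.DataFrame({
    "id": [1, 1, 2],
    "c1": ["a", "b", "a"],
    "c2": [10, 20, 30],
})
df3 = pd.DataFrame({"c1": ["a", "b"], "label": ["alpha", "beta"]})

df_2 = select_and_merge(df_1, df3, val=1)
```

Because every step sits on its own line, a single step can be commented out or a .pipe() checkpoint moved around while debugging, which softens the drawback quoted above without going back to overwriting df_2 repeatedly.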