Search code examples
pythonpandasdataframedrop

Python drop columns in string range


I want to drop all columns whose name starts by 'var' and whose content is 'None'. Sample of my dataframe:

id var1 var2 newvar1 var3 var4 newvar2
1  x    y    dt      None f    None

Dataframe that I want:

id var1 var2 newvar1 var4 newvar2
1  x    y    dt      f    None

I want to do this for several files and I do not know how many 'var' I have in all of them. My dataframe has only one row. Here is the code that I tried:

for i in range(1,300):
    df.drop(df.loc[df['var'+str(i)] == 'None' ].index, inplace=True)

Error obtained:

KeyError: 'var208'

I also tried:

df.drop(df.loc[df['var'+str(i) for i in range(1,300)] == 'None'].index, inplace=True)
SyntaxError: invalid syntax

Could anyone help me improve my code?


Solution

  • Your error occurs because you have no column with that name. You can use df.columns to get a list of available columns, check if the name .startswith("var") and use df[col].isnull().all() to check if all values are None.

    import pandas as pd
    
    df = pd.DataFrame(columns=["id", "var1", "var2", "newvar1", "var3", "var4", "newvar2"],
                      data=[[1, "x", "y", "dt", None, "f", None]])
    
    
    df.drop([col for col in df.columns if col.startswith("var") and df[col].isnull().all()], axis=1, inplace=True)