Figuring out if all data in a pandas DataFrame row is the same except for a particular column


With help from SO, I was able to get this working:

# Input string changes but the one I need is always passed in
input_string = "random string"

df.loc[input_string, df.drop(input_string).eq(111).all()] = 111

The above code essentially takes a column and checks whether every cell in that column, except the one specified by input_string, is 111. If so, it sets that cell to 111 as well.

How do I do this for a row instead?


Solution

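  • I think your logic is backwards. Your current code drops a row and then checks, column by column, whether the remaining rows all match your condition, so it is already filling in a row. To fill in a column instead, drop that column (axis=1) and check across each row. Both cases are shown below: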

    import pandas as pd

    # Column check: for each row, if every other column is 111, set input_col to 111
    df = pd.DataFrame({'a': [111]*10, 'b': [111]*10})
    df_col = df.copy()
    input_col = 'a'
    df_col[input_col] = [222]*len(df.index)  # distort the input column; the next step should restore it
    df_col.loc[df_col.drop(input_col, axis=1).eq(111).all(axis=1), input_col] = 111

    # Row check: for each column, if every other row is 111, set input_row to 111
    df_row = df.copy()
    input_row = 2
    df_row.iloc[input_row] = [222]*len(df.columns)  # distort the input row; the next step should restore it
    df_row.loc[input_row, df_row.drop(input_row).eq(111).all()] = 111
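
If you need this in more than one place, the two checks can be wrapped in small helpers. This is only a sketch: the function names `fill_column_where_rest_match` and `fill_row_where_rest_match` and the sample data are made up for illustration, and it assumes the row and column labels are unique. It uses a mixed frame so you can see that only the cells whose "rest" all equal 111 get filled.

```python
import pandas as pd


def fill_column_where_rest_match(df, col, value):
    """For each row, set df[col] to value if every other column equals value."""
    out = df.copy()
    # Boolean Series indexed by row label: True where all remaining columns match
    rest_match = out.drop(columns=col).eq(value).all(axis=1)
    out.loc[rest_match, col] = value
    return out


def fill_row_where_rest_match(df, row, value):
    """For each column, set the cell in `row` to value if every other row equals value."""
    out = df.copy()
    # Boolean Series indexed by column label: True where all remaining rows match
    rest_match = out.drop(index=row).eq(value).all(axis=0)
    out.loc[row, rest_match] = value
    return out


# Hypothetical sample data, not from the question
df = pd.DataFrame(
    {"a": [222, 222, 222], "b": [111, 111, 5], "c": [111, 222, 111]},
    index=["x", "y", "z"],
)

by_col = fill_column_where_rest_match(df, "a", 111)  # only row "x" has b == c == 111
by_row = fill_row_where_rest_match(df, "z", 111)     # only column "b" is 111 in rows "x" and "y"
```

Both helpers work on a copy, so the original `df` is left untouched.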