Find column and row of partial string match in entire pandas dataframe


I'm trying to locate the row and the column in a pandas dataframe that partially match a given string. So far, my approach is based on an iteration over all the columns (and the rows in every column) to return boolean "True" values:

rowindex = []
columnindex = []

i = 0

for i in range (0, len(df.columns)):
    ask = df.iloc[:, i].str.contains('string')
    
    for j in range (0, len(ask)):
        ask2 = np.equal(ask, True) 
        if ask2 == True:
            columnindex.append(i)
            rowindex.append(j)
            
        j + 1
            
    i + 1

The problem is that I always get this error message for the "if ask2 == True:" statement:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Thank you for your help on this!


Solution

  • The error occurs because np.equal(ask, True) returns a whole boolean Series rather than a single value, so "if ask2 == True:" asks for the truth value of many elements at once, which pandas refuses to guess. Instead of looping over every column and row, build a boolean mask with pandas' vectorized string methods and read the matching positions from that mask. This is shorter and usually faster than an element-by-element loop. For example:

    import pandas as pd

    data = {
        'Name': ['John Doe', 'Jane Smith', 'Mike Johnson'],
        'Age': [25, 30, 35],
        'City': ['New York', 'London', 'Paris']
    }

    df = pd.DataFrame(data)
    partial_string = 'Jo'

    # Build a boolean mask: True where the cell, converted to text, contains
    # the partial string. Note that str.contains treats the pattern as a
    # regular expression by default.
    matches = df.apply(lambda col: col.astype(str).str.contains(partial_string, case=False))

    # Get the positional row and column indices of the True cells
    rows, cols = matches.to_numpy().nonzero()

    for row, col in zip(rows, cols):
        # row is a position (0-based), not necessarily the index label
        print(f"Match found at Row: {row}, Column: {df.columns[col]}")
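
If the DataFrame has a non-default index, or the search string may contain regex characters such as "." or "(", a variation along the following lines might be useful. This is only a sketch: the helper name find_partial_matches and the "a", "b", "c" index labels are made up for illustration. It assumes a literal substring search is wanted (regex=False) and returns index labels rather than positions.

```python
import numpy as np
import pandas as pd


def find_partial_matches(df, needle):
    """Return (row label, column name) pairs for cells containing needle.

    Hypothetical helper, not part of pandas.
    """
    # Compare every cell as text; regex=False treats needle as a literal
    # substring, and case=False ignores upper/lower case.
    mask = df.apply(
        lambda col: col.astype(str).str.contains(needle, case=False, regex=False)
    )
    # np.nonzero returns the positions of the True cells in row-major order.
    rows, cols = np.nonzero(mask.to_numpy())
    # Translate positions into index labels and column names.
    return [(df.index[r], df.columns[c]) for r, c in zip(rows, cols)]


df = pd.DataFrame(
    {
        "Name": ["John Doe", "Jane Smith", "Mike Johnson"],
        "Age": [25, 30, 35],
        "City": ["New York", "London", "Paris"],
    },
    index=["a", "b", "c"],  # illustrative labels, not from the question
)

matches = find_partial_matches(df, "Jo")
print(matches)
```

Returning labels means each pair can be passed straight to df.loc[row, column] to read the matching cell.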