Search code examples
pythonpandasiteratorcomparison

Pandas count matches across multiple columns


I have a dataframe with columns from A - Z. The values are 0,1 or NA. I need to iteratively compare columns A and N, A and O and so on until Z, and then cycle back to start the comparison with B and N, B and O, and then from C again. I only need a count of rows where 1 occurs in both columns being compared. How do I accomplish this?


Solution

  • Set operations are made easier with SQL, so the example below uses pandasql to do the comparison you asked for:

    import pandas as pd
    import pandasql as ps
    import string
    
    # Create a string consisting of the letters in the English alphabet in alphabetical order
    alphabet_string = string.ascii_uppercase
    
    #print(alphabet_string)
    
    
    # Create a list of data
    data = []
    
    
    # To approximate your data, use the value 0, 1, and None (~null) for each column
    data.append([0] * len(alphabet_string))
    data.append([1] * len(alphabet_string))
    data.append([None] * len(alphabet_string))
    
    
    # Create the pandas DataFrame  
    df = pd.DataFrame(data, columns = [letter for letter in alphabet_string]) 
    
    
    # Create a list of the letters from A to N
    a_to_n = [letter for letter in alphabet_string if letter < "O"]
    
    print(a_to_n)
    
    # And N to O
    n_to_o = [letter for letter in alphabet_string if letter > "M"]
    
    print(n_to_o)
    
    # Then perform the comparison in a nested loop over the two lists
    for ll in a_to_n:
        for rl in n_to_o:
            cnt = ps.sqldf(f"select count(*) cnt from df where {ll} = 1 and {rl} = 1")["cnt"].iloc[0]
            print(f"Comparing {ll} to {rl}, there were {cnt} rows where the values matched.")
    
    

    The end of which prints as follows:

    Comparing N to U, there were 1 rows where the values matched.
    Comparing N to V, there were 1 rows where the values matched.
    Comparing N to W, there were 1 rows where the values matched.
    Comparing N to X, there were 1 rows where the values matched.
    Comparing N to Y, there were 1 rows where the values matched.
    Comparing N to Z, there were 1 rows where the values matched.