Search code examples
pythonpandasone-to-one

Easy Way to See if Two Columns are One-to-One in Pandas


Working with data in Python 3+ with pandas. It seems like there should be an easy way to check if two columns have a one-to-one relationship (regardless of column type), but I'm struggling to think of the best way to do this.

Example of expected output:

A    B     C
0    'a'   'apple'
1    'b'   'banana'
2    'c'   'apple'

A & B are one-to-one? TRUE

A & C are one-to-one? FALSE

B & C are one-to-one? FALSE


Solution

  • Well, you can create your own function to check it:

    def isOneToOne(df, col1, col2):
        first = df.groupby(col1)[col2].count().max()
        second = df.groupby(col2)[col1].count().max()
        return first + second == 2
    
    isOneToOne(df, 'A', 'B')
    #True
    isOneToOne(df, 'A', 'C')
    #False
    isOneToOne(df, 'B', 'C')
    #False
    

    In case you data is more like this:

    df = pd.DataFrame({'A': [0, 1, 2, 0],
                       'C': ["'apple'", "'banana'", "'apple'", "'apple'"],
                       'B': ["'a'", "'b'", "'c'", "'a'"]})
    df
    #   A    B         C
    #0  0  'a'   'apple'
    #1  1  'b'  'banana'
    #2  2  'c'   'apple'
    #3  0  'a'   'apple'
    

    Then you can use:

    def isOneToOne(df, col1, col2):
        first = df.drop_duplicates([col1, col2]).groupby(col1)[col2].count().max()
        second = df.drop_duplicates([col1, col2]).groupby(col2)[col1].count().max()
        return first + second == 2