Working with data in Python 3+ with pandas. It seems like there should be an easy way to check if two columns have a one-to-one relationship (regardless of column type), but I'm struggling to think of the best way to do this.
Example of expected output:
A B C
0 'a' 'apple'
1 'b' 'banana'
2 'c' 'apple'
A & B are one-to-one? TRUE
A & C are one-to-one? FALSE
B & C are one-to-one? FALSE
Well, you can create your own function to check it:
def isOneToOne(df, col1, col2):
first = df.groupby(col1)[col2].count().max()
second = df.groupby(col2)[col1].count().max()
return first + second == 2
isOneToOne(df, 'A', 'B')
#True
isOneToOne(df, 'A', 'C')
#False
isOneToOne(df, 'B', 'C')
#False
In case you data is more like this:
df = pd.DataFrame({'A': [0, 1, 2, 0],
'C': ["'apple'", "'banana'", "'apple'", "'apple'"],
'B': ["'a'", "'b'", "'c'", "'a'"]})
df
# A B C
#0 0 'a' 'apple'
#1 1 'b' 'banana'
#2 2 'c' 'apple'
#3 0 'a' 'apple'
Then you can use:
def isOneToOne(df, col1, col2):
first = df.drop_duplicates([col1, col2]).groupby(col1)[col2].count().max()
second = df.drop_duplicates([col1, col2]).groupby(col2)[col1].count().max()
return first + second == 2