I'm looking for a cleaner solution to this problem than the one I have come up with:
Based on the string value of a row in column 'C', look that value up in the string values of column 'A', and if found return the string value of column 'B' for that row.
I have a df that looks like this (notice loc[1]['C']):
A B C
0 zeroA zeroB zeroC
1 oneA oneB zeroA
2 twoA twoB twoC
What I need: I need to add another column called 'BB' that is the result of looking up each string value from column 'C' in column 'A' and if a match is found return the string value of column 'B' for that row to my new column 'BB' (as seen at index[1] below). If not, return that row's column 'B' to column 'BB' (as seen at index[0,2] below).
A B C BB
0 zeroA zeroB zeroC zeroB
1 oneA oneB zeroA zeroB
2 twoA twoB twoC twoB
I have come up with a way of doing this by:
I'm sure there is a much cleaner way of doing this. Any suggestion would be appreciated.
What I came up with:
>>> data = {'A': ['zeroA', 'oneA', 'twoA'], 'B': ['zeroB', 'oneB', 'twoB'], 'C': ['zeroC', 'zeroA', 'twoC']}
>>> df = pd.DataFrame(data)
>>> df
A B C
0 zeroA zeroB zeroC
1 oneA oneB zeroA
2 twoA twoB twoC
# Step 1. create a tmp_df containing columns ['A','B'] with 'A' renamed to 'C' and 'B' renamed to 'BB'
>>> tmp_df = df[['A', 'B']].rename({'A':'C', 'B':'BB'}, axis=1)
>>> tmp_df
C BB
0 zeroA zeroB
1 oneA oneB
2 twoA twoB
# Step 2. Left merge tmp_df into df keyed on 'C' columns.
>>> df = pd.merge(df, tmp_df, how='left', on='C')
>>> df
A B C BB
0 zeroA zeroB zeroC NaN
1 oneA oneB zeroA zeroB
2 twoA twoB twoC NaN
# Step 3. Fill in the null rows of column 'BB' with it's row's column 'B' value.
>>> df.loc[df['BB'].isnull(), ['BB']] = df['B']
>>> df
A B C BB
0 zeroA zeroB zeroC zeroB
1 oneA oneB zeroA zeroB
2 twoA twoB twoC twoB
Any suggestion on improving/eliminating the temp_df would be appreciated. Thank You
I prefer to apply a function in these scenarios:
def find_value(row: pd.Series) -> pd.Series:
val = df.query("@df['A'] == @row['C']")
return row["B"] if val.empty else val["B"].squeeze()
df["BB"] = df.apply(lambda x: find_value(x), axis=1)