Search code examples
pythonpandaslistdataframenested-loops

Replacing nested loops over a dataframe with faster/more efficient alternatives


I want to eliminate the nested loop in my code and I can't seem to figure out the best way to do it. I have explained what I am trying to do below:

I have a dataframe df.

data = [['1A', 'apple', '35-44', 'male', ['apple', 'strawberry', 'pineapple']], ['1B', 'banana', '15-24', 'female', ['apple', 'banana', 'durian']], \
        ['1C', 'cranberry', '35-44', 'male', ['cranberry', 'apple', 'durian']], ['1D','durian', '15-24', 'female', ['durian', 'kiwi', 'banana']], \
        ['1E', 'elderberry', '35-44', 'male', ['elderberry', 'apple', 'papaya']]]
df = pd.DataFrame(data, columns= ['ID','fav_fruit','age_group', 'gender', 'top3_fruits'])
    ID  fav_fruit   age_group   gender  top3_fruits
0   1A  apple        35-44       male   [apple, strawberry, pineapple]
1   1B  banana       15-24      female  [apple, banana, durian]
2   1C  cranberry    35-44       male   [cranberry, apple, durian]
3   1D  durian       15-24      female  [durian, kiwi, banana]
4   1E  elderberry   35-44      male    [elderberry, apple, papaya]

Now, in this dataframe I want to check and compare each row with all the other rows for certain conditions.

  1. I want to check if the age_group and gender are equal
  2. I want to check if fav_fruit is in top3_fruits.

If the conditions are met, then I want to append the 'ID' and 'top3_fruits' of the matched rows as separate columns at the end of the dataframe df.

This is the code I have written to do this with a nested for-loop.

df_copy = df.copy()
sample_df = pd.DataFrame()
matching_id = []
fruits_to_recommend = []
for i in range(len(df)):
    for j in range(len(df)):
        if (i!=j) and (df.iloc[i]['fav_fruit'] in df_copy.iloc[j]['top3_fruits']) and \
        (df.iloc[i]['gender'] == df_copy.iloc[j]['gender']) and\
        (df.iloc[i]['age_group'] == df_copy.iloc[j]['age_group']):
            sample_df = sample_df.append(df_copy.iloc[[i]])
            matching_id.append(df_copy.iloc[j]['ID'])
            fruits_to_recommend.append(df_copy.iloc[j]['top3_fruits'])
            sample_df['matching_id'] = matching_id
            sample_df['fruits_to_recommend'] = fruits_to_recommend

The results I am looking for looks like this. Result: enter image description here

I am looking for more feasible/faster options.


Solution

  • my approach would be using .explode() method and pandas.merge() function.

    >>> df_explode = df.copy()
    >>> # copy column
    >>> df_explode['fruits_to_recommend'] = df['top3_fruits']
    >>> # explode list and rename column
    >>> df_explode = df_explode.explode('top3_fruits').rename(columns={'ID':'matching_id'}) 
    >>> print(df_explode)
        matching_id fav_fruit   age_group   gender  top3_fruits fruits_to_recommend
    0   1A          apple       35-44       male    apple       ['apple', 'strawberry', 'pineapple']
    0   1A          apple       35-44       male    strawberry  ['apple', 'strawberry', 'pineapple']
    0   1A          apple       35-44       male    pineapple   ['apple', 'strawberry', 'pineapple']
    1   1B          banana      15-24       female  apple       ['apple', 'banana', 'durian']
    1   1B          banana      15-24       female  banana      ['apple', 'banana', 'durian']
    1   1B          banana      15-24       female  durian      ['apple', 'banana', 'durian']
    2   1C          cranberry   35-44       male    cranberry   ['cranberry', 'apple', 'durian']
    2   1C          cranberry   35-44       male    apple       ['cranberry', 'apple', 'durian']
    2   1C          cranberry   35-44       male    durian      ['cranberry', 'apple', 'durian']
    3   1D          durian      15-24       female  durian      ['durian', 'kiwi', 'banana']
    3   1D          durian      15-24       female  kiwi        ['durian', 'kiwi', 'banana']
    3   1D          durian      15-24       female  banana      ['durian', 'kiwi', 'banana']
    4   1E          elderberry  35-44       male    elderberry  ['elderberry', 'apple', 'papaya']
    4   1E          elderberry  35-44       male    apple       ['elderberry', 'apple', 'papaya']
    4   1E          elderberry  35-44       male    papaya      ['elderberry', 'apple', 'papaya']
    
    >>> # merging
    >>> df_merged = pd.merge(df, df_explode, how='left', left_on = ['age_group', 'gender', 'fav_fruit'], right_on = ['age_group', 'gender', 'top3_fruits'], suffixes=('','_'))
    >>> # select columns and filter matching_id's which are equal to ID
    >>> df_merged = df_merged.loc[df_merged['ID']!=df_merged['matching_id'], list(df.columns) + ['matching_id', 'fruits_to_recommend']]
    >>> print(df_merged)
        ID  fav_fruit   age_group   gender  top3_fruits                             matching_id fruits_to_recommend
    1   1A  apple       35-44       male    ['apple', 'strawberry', 'pineapple']    1C          ['cranberry', 'apple', 'durian']
    2   1A  apple       35-44       male    ['apple', 'strawberry', 'pineapple']    1E          ['elderberry', 'apple', 'papaya']
    4   1B  banana      15-24       female  ['apple', 'banana', 'durian']           1D          ['durian', 'kiwi', 'banana']
    6   1D  durian      15-24       female  ['durian', 'kiwi', 'banana']            1B          ['apple', 'banana', 'durian']