Pairing with itertools combinations - long code execution on large volume of data

I am working on a task to get all possible combinations (pairs) of ID's (companies) which paritipated in one bid and create a new data frame with ID_1, ID_2, matching parameter (tender ID).

I have prepared the following two functions which provide me with the required result however the execution time exceeds several hours when applying to df with more than 500k lines.

import pandas as pd
from itertools import combinations

def get_pairs(dataframe, items):
   return (
       (c1, c2, item)
       for item in items
       for c1, c2 in combinations(dataframe[dataframe['tender_product'] == item]['ID_name'], 2)
   )

def get_all_pairs(df):
        pairs = get_pairs(df,df['tender_product'])
        df= pd.DataFrame(pairs, columns=['ID_name1', 'ID_name2','tender_product'])
        df=df.query('ID_name1 != ID_name2')
        df=df.drop_duplicates()
        df['pair'] = df['ID_name1'].astype(str)+'_'+df['ID_name2'].astype(str)
        df['reversed_pair'] =df['ID_name1'].astype(str)+'_'+df['ID_name2'].astype(str)
        return df

What are the options to optimize the code to make it work faster with the same result?

The sample of intial data:

ID	tender_product
1	tender_1
2	tender_1
3	tender_2
4	tender_2

The sample of current output:

ID1	ID2	tender_product	pair	reversed_pair
1	2	tender_1	1_2	2_1
3	4	tender_2	3_4	4_3

Solution

I'm not sure on how efficient this would be (I don't have the 500K lines to test! :) ) But this should do what you want?

import pandas as pd
data = {'ID': [1,2,3,4,5,6,7,8],
        'tender_product': ['tender_1', 'tender_1', 'tender_2', 'tender_2', 'tender_3', 'tender_3', 'tender_4', 'tender_4']}
df = pd.DataFrame(data)

new_df = df.groupby('tender_product')['ID'].apply(list).reset_index() #Group by tender product and send matching IDs to a list stored in the result column
new_df.rename(columns={'ID': 'ID1'}, inplace=True) #Rename column to ID1
new_df['ID2'] = new_df['ID1'].apply(lambda x: x[1]) #Extract value from list at position 1 (second value in list)
new_df['ID1'] = new_df['ID1'].apply(lambda x: x[0]) #Extract value fromlist at position 0 (first value in list)

#Note: The order is important here. If we replace the ID1 value first, the list held in that cell will cease to exist because we've replaced it

Finally, we create the combination of IDs in two new columns

new_df['pair'] = new_df.apply(lambda row: f"{row['ID1']}_{row['ID2']}", axis=1)
new_df['reversed_pair'] = new_df.apply(lambda row: f"{row['ID2']}_{row['ID1']}", axis=1)

Tested in Jupyter: