python pandas merge cartesian-product cross-join

Cartesian product in pandas


I have two pandas dataframes:

from pandas import DataFrame
df1 = DataFrame({'col1':[1,2],'col2':[3,4]})
df2 = DataFrame({'col3':[5,6]})     

What is the best practice for getting their Cartesian product (without, of course, writing it out explicitly as I do below)?

# Cartesian product of df1 and df2, written out by hand
df_cartesian = DataFrame({'col1': [1, 2, 1, 2], 'col2': [3, 4, 3, 4], 'col3': [5, 5, 6, 6]})

Solution

  • In recent versions of pandas (>= 1.2), this is built into merge, so you can do:

    from pandas import DataFrame
    df1 = DataFrame({'col1':[1,2],'col2':[3,4]})
    df2 = DataFrame({'col3':[5,6]})    
    
    df1.merge(df2, how='cross')
    

    This is equivalent to the pandas < 1.2 approach below, but it is easier to read.
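As a quick sanity check of the cross merge, the sketch below compares its result with the hand-written frame from the question. It assumes pandas >= 1.2; the names `result`, `expected`, `n_rows` and `same_rows` are only illustrative. Because the merge orders rows by the left frame first, both frames are sorted before the comparison.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col3': [5, 6]})

# Requires pandas >= 1.2; older versions do not accept how='cross'.
result = df1.merge(df2, how='cross')

# A Cartesian product has len(df1) * len(df2) rows.
n_rows = len(result)

# The hand-written frame from the question, for comparison.
expected = pd.DataFrame({'col1': [1, 2, 1, 2],
                         'col2': [3, 4, 3, 4],
                         'col3': [5, 5, 6, 6]})

# The two frames list the same rows in a different order,
# so sort both and reset the index before comparing.
cols = ['col1', 'col2', 'col3']
same_rows = (result.sort_values(cols).reset_index(drop=True)
             .equals(expected.sort_values(cols).reset_index(drop=True)))
```

Here `n_rows` should equal `len(df1) * len(df2)`, and `same_rows` should be true if the two frames contain the same rows.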


    For pandas < 1.2:

    If both frames share a key column that holds the same value in every row, merging on that key produces the Cartesian product (as a join would in SQL).

    from pandas import DataFrame, merge
    df1 = DataFrame({'key': [1, 1], 'col1': [1, 2], 'col2': [3, 4]})
    df2 = DataFrame({'key': [1, 1], 'col3': [5, 6]})

    # Merge on the constant key, then keep only the data columns
    merge(df1, df2, on='key')[['col1', 'col2', 'col3']]
    

    Output:

       col1  col2  col3
    0     1     3     5
    1     1     3     6
    2     2     4     5
    3     2     4     6
    

    See here for the documentation: http://pandas.pydata.org/pandas-docs/stable/merging.html
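If the frames do not already carry such a key, one way to apply the pre-1.2 trick is to add a temporary constant column, merge on it, and drop it afterwards. The following is a minimal sketch under that assumption; `cross_join` and `_cross_key` are hypothetical names chosen for illustration, not part of pandas.

```python
import pandas as pd


def cross_join(left, right, key='_cross_key'):
    """Cartesian product of two frames via a temporary constant key.

    `cross_join` and `_cross_key` are illustrative names, not pandas API.
    """
    # Refuse to overwrite a real column that happens to share the key name.
    if key in left.columns or key in right.columns:
        raise ValueError(f"temporary key {key!r} clashes with an existing column")
    # assign() returns copies, so the input frames are left untouched.
    merged = pd.merge(left.assign(**{key: 1}), right.assign(**{key: 1}), on=key)
    # Remove the helper column so only the original columns remain.
    return merged.drop(columns=key)


df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col3': [5, 6]})
product = cross_join(df1, df2)
```

For the example frames, `product` should hold the same four rows as the output shown above. On pandas >= 1.2, `df1.merge(df2, how='cross')` does the same job without the helper column.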