Search code examples
sqlmysqlpostgresqlrank

How to get iterate between all names and rank permutations in SQL


I have a table that contains a customer_id, the product they purchased, the timestamp of the purchase, and the rank of the purchase order. I want to know if there is a way in SQL to get overall percentages for the different product combination ordering. Products can be purchased multiple times. Here are some example questions:

What percent of the time is Product B purchased after Product A? What percent of the time is Product C purchased after Product A? What percent of the time is Product C purchased after Product B?

I know I could write a bunch of case statements for the different permutations using PRODUCT and RANK columns but was curious if there was a more iterative type way in SQL. Also, I know it would be easier in a scripting language like Python but was curious about SQL.

CUST_ID PRODUCT     DATE        RANK
1000000 Product A   2019-07-30  1
1000000 Product B   2019-11-27  2
1000000 Product B   2020-08-15  3
1000000 Product C   2020-11-07  4
1000001 Product A   2019-05-21  1
1000001 Product B   2019-08-26  2
1000001 Product C   2020-06-26  3
1000002 Product A   2020-06-28  1
1000002 Product A   2020-10-30  2
1000002 Product B   2020-11-23  3
1000003 Product C   2021-07-03  1
1000004 Product C   2021-07-04  2

Solution

  • You can use SQL self-joins to compute these percentages. Here's a query to answer your example questions:

    WITH combinations AS (
    SELECT 
        t1.product AS product1, 
        t2.product AS product2, 
        COUNT(*) AS num_combinations
    FROM 
        your_table t1 
    INNER JOIN 
        your_table t2 ON t1.cust_id = t2.cust_id AND t1.rank + 1 = t2.rank
    GROUP BY 
        t1.product, 
        t2.product
    ),
    total_combinations AS (
    SELECT 
        SUM(num_combinations) AS total
    FROM 
        combinations
    )
    SELECT 
        product1, 
        product2, 
        num_combinations, 
        num_combinations * 100.0 / total AS percentage
    FROM 
        combinations, 
        total_combinations;
    

    Replace your_table with the actual table name in your database. This query first computes the count of each product combination (e.g., Product A followed by Product B, Product A followed by Product C, etc.) and then calculates the overall percentage for each combination. The percentages should answer the questions you provided, such as "What percent of the time is Product B purchased after Product A?".