I have a table that contains a customer_id, the product they purchased, the timestamp of the purchase, and the rank of the purchase order. I want to know if there is a way in SQL to get overall percentages for the different product combination ordering. Products can be purchased multiple times. Here are some example questions:
What percent of the time is Product B purchased after Product A? What percent of the time is Product C purchased after Product A? What percent of the time is Product C purchased after Product B?
I know I could write a bunch of case statements for the different permutations using PRODUCT
and RANK
columns but was curious if there was a more iterative type way in SQL. Also, I know it would be easier in a scripting language like Python but was curious about SQL.
CUST_ID PRODUCT DATE RANK
1000000 Product A 2019-07-30 1
1000000 Product B 2019-11-27 2
1000000 Product B 2020-08-15 3
1000000 Product C 2020-11-07 4
1000001 Product A 2019-05-21 1
1000001 Product B 2019-08-26 2
1000001 Product C 2020-06-26 3
1000002 Product A 2020-06-28 1
1000002 Product A 2020-10-30 2
1000002 Product B 2020-11-23 3
1000003 Product C 2021-07-03 1
1000004 Product C 2021-07-04 2
You can use SQL self-joins to compute these percentages. Here's a query to answer your example questions:
WITH combinations AS (
SELECT
t1.product AS product1,
t2.product AS product2,
COUNT(*) AS num_combinations
FROM
your_table t1
INNER JOIN
your_table t2 ON t1.cust_id = t2.cust_id AND t1.rank + 1 = t2.rank
GROUP BY
t1.product,
t2.product
),
total_combinations AS (
SELECT
SUM(num_combinations) AS total
FROM
combinations
)
SELECT
product1,
product2,
num_combinations,
num_combinations * 100.0 / total AS percentage
FROM
combinations,
total_combinations;
Replace your_table with the actual table name in your database. This query first computes the count of each product combination (e.g., Product A followed by Product B, Product A followed by Product C, etc.) and then calculates the overall percentage for each combination. The percentages should answer the questions you provided, such as "What percent of the time is Product B purchased after Product A?".