Search code examples
sqlmarket-basket-analysis

Doing a market basket analysis on the order details


I have a table that looks (abbreviated) like:

| order_id  | item_id   | amount    | qty   | date          |
|---------- |---------  |--------   |-----  |------------   |
| 1         | 1         | 10        | 1     | 10-10-2014    |
| 1         | 2         | 20        | 2     | 10-10-2014    |
| 2         | 1         | 10        | 1     | 10-12-2014    |
| 2         | 2         | 20        | 1     | 10-12-2014    |
| 2         | 3         | 45        | 1     | 10-12-2014    |
| 3         | 1         | 10        | 1     | 9-9-2014      |
| 3         | 3         | 45        | 1     | 9-9-2014      |
| 4         | 2         | 20        | 1     | 11-11-2014    |

I would like to run a query that would calculate the list of items that most frequently occur together.

In this case the result would be:

|items|frequency|
|-----|---------|
|1,2, |2        |
|1,3  |1        |
|2,3  |1        |
|2    |1        |

Ideally, first presenting orders with more than one items, then presenting the most frequently ordered single items.

Could anyone please provide an example for how to structure this SQL?


Solution

  • This query generate all of the requested output, in the cases where 2 items occur together. It doesn't include the last item of the requested output since a single value (2) technically doesn't occur together with anything... although you could easily add a UNION query to include values that happen alone.

    This is written for PostgreSQL 9.3

     create table orders(
            order_id int, 
            item_id int, 
            amount int, 
            qty int, 
            date timestamp
    
    
    );
    
    INSERT INTO ORDERS VALUES(1,1,10,1,'10-10-2014');
    INSERT INTO ORDERS VALUES(1,2,20,1,'10-10-2014');
    INSERT INTO ORDERS VALUES(2,1,10,1,'10-12-2014');
    INSERT INTO ORDERS VALUES(2,2,20,1,'10-12-2014');
    INSERT INTO ORDERS VALUES(2,3,45,1,'10-12-2014');
    INSERT INTO ORDERS VALUES(3,1,10,1,'9-9-2014');
    INSERT INTO ORDERS VALUES(3,3,45,1,'9-9-2014');
    INSERT INTO ORDERS VALUES(4,2,10,1,'11-11-2014');
    
    with order_pairs as (
        select (pg1.item_id, pg2.item_id) as items, pg1.date
        from 
        (select distinct item_id, date
        from orders) as pg1
        join
        (select distinct item_id, date
        from orders) as pg2
        ON 
        (
        pg1.date = pg2.date AND
        pg1.item_id != pg2.item_id AND
        pg1.item_id < pg2.item_id
    
        )
        )
    
        SELECT items, count(*) as frequency
        FROM order_pairs
        GROUP by items
        ORDER by items;
    

    output

     items | frequency 
    -------+-----------
     (1,2) |         2
     (1,3) |         2
     (2,3) |         1
    (3 rows)