Search code examples
mysqldatediff

Calculate the average date difference


This is the essential setup of the table (only the DDL for relevant columns is present). MySQL version 8.0.15

The intent is to show an average of date difference interval between orders.

    CREATE TABLE final (
    prim_id INT(11) NOT NULL AUTO_INCREMENT,
    order_ID INT(11) NOT NULL,
    cust_ID VARCHAR(45) NOT NULL,
    created_at DATETIME NOT NULL,
    item_name VARCHAR(255) NOT NULL,
    cust_name VARCHAR(255) NOT NULL,
    PRIMARY KEY (prim_id),
    COLLATE='latin1_swedish_ci'
    ENGINE=InnoDB
    AUTO_INCREMENT=145699

Additional information:

cust ID -> cust_name (one-to-many)
cust_ID -> order_ID (one-to-many)
order ID -> item_name (one-to-many)
order ID -> created_at (one-to-one)
prim_id -> *everything* (one-to-many)

I've thought of using min(created_at) and max(created_at) but that will exclude all the orders between oldest and newest. I need a more refined solution.

The end result should be like this:

Information about average time intervals between all orders, (not just min and max because there are quite often times, more than two) measured in days, next to a column showing the client's name (cust_name).


Solution

  • If I get this right you might use a subquery getting the date of the previous order. Use datediff() to get the difference between the dates and avg() to get the average of that differences.

    SELECT f1.cust_id,
           avg(datediff(f1.created_at,
                        (SELECT f2.created_at
                                FROM final f2
                                WHERE f2.cust_id = f1.cust_id
                                      AND (f2.created_at < f1.created_at
                                            OR f2.created_at = f1.created_at
                                               AND f2.order_id < f1.order_id)
                                ORDER BY f2.created_at DESC,
                                         f2.order_id DESC
                                LIMIT 1)))
           FROM final f1
           GROUP BY f1.cust_id;
    

    Edit:

    If there can be more rows for one order ID, as KIKO Software mentioned we need to do the SELECT from the distinct set of orders like:

    SELECT f1.cust_id,
           avg(datediff(f1.created_at,
                        (SELECT f2.created_at
                                FROM (SELECT DISTINCT f3.cust_id,
                                                      f3.created_at,
                                                      f3.order_id
                                             FROM final f3) f2
                                WHERE f2.cust_id = f1.cust_id
                                      AND (f2.created_at < f1.created_at
                                            OR f2.created_at = f1.created_at
                                               AND f2.order_id < f1.order_id)
                                ORDER BY f2.created_at DESC,
                                         f2.order_id DESC
                                LIMIT 1)))
           FROM (SELECT DISTINCT f3.cust_id,
                                 f3.created_at,
                                 f3.order_id
                        FROM final f3) f1
           GROUP BY f1.cust_id;
    

    This may fail if there can be two rows for an order with different customer IDs or different creation time stamps. But in that case the data is just complete garbage and needs to be corrected before anything else.


    2nd Edit:

    Or alternatively getting the maximum creation timestamp per order if these can differ:

    SELECT f1.cust_id,
           avg(datediff(f1.created_at,
                        (SELECT f2.created_at
                                FROM (SELECT max(f3.cust_id) cust_id,
                                             max(f3.created_at) created_at,
                                             f3.order_id
                                             FROM final f3
                                             GROUP BY f3.order_id) f2
                                WHERE f2.cust_id = f1.cust_id
                                      AND (f2.created_at < f1.created_at
                                            OR f2.created_at = f1.created_at
                                               AND f2.order_id < f1.order_id)
                                ORDER BY f2.created_at DESC,
                                         f2.order_id DESC
                                LIMIT 1)))
           FROM (SELECT max(f3.cust_id) cust_id,
                        max(f3.created_at) created_at,
                        f3.order_id
                        FROM final f3
                        GROUP BY f3.order_id) f1
           GROUP BY f1.cust_id;