Tags: mysql, sql, datetime, mariadb, window-functions

Calculating average time between dates in SQL


Using MySQL, I'm trying to figure out how to answer the question: What is the average number of months between users creating their Nth project?

Expected result:

| project count | Average # months | Meaning                                                                              |
|---------------|------------------|--------------------------------------------------------------------------------------|
| 1             | 0                | On average, it took 0 months to create the first project (nothing to compare to)     |
| 2             | 12               | On average, it takes a user 12 months to create their second project                 |
| 3             | 3                | On average, it takes a user 3 months to create their third project                   |

My MySQL table represents projects created by users. The table can be summarized as:

| user_id | project created at |
|---------|--------------------|
| 1       | Jan 1, 2020 1:00 pm|
| 1       | Feb 2, 2020 3:45 am|
| 1       | Nov 6, 2020 0:01 am|
| 1       | Mar 4, 2021 5:01 pm|
|------------------------------|
| 2       | Another timestamp  |
| 2       | Another timestamp  |
| 2       | Another timestamp  |
| 2       | Another timestamp  |
| 2       | Another timestamp  |
| 2       | Another timestamp  |
|------------------------------|
| ...     | Another timestamp  |
| ...     | Another timestamp  |

Some users will have one project while some may have hundreds.

Edit: Current Implementation

with
    paid_self_serve_projects_presentation as (
        select
            `Paid Projects`.owner_email as `Owner Email`,
            row_number() over (partition by `Paid Projects`.owner_uuid order by created_at) as `Project Count`,
            day(`Paid Projects`.created_at) as `Created Day`,
            month(`Paid Projects`.created_at) as `Created Month`,
            year(`Paid Projects`.created_at) as `Created Year`,
            `Paid Projects`.created_at as `Created`
        from self_service_paid_projects as `Paid Projects`
        order by `Paid Projects`.owner_uuid, `Paid Projects`.created_at
    )

select `Projects`.* from paid_self_serve_projects_presentation as `Projects`

Solution

  • You can use window functions. I am thinking row_number() to enumerate the projects of each user ordered by creation date, and lag() to get the date when the previous project was created:

    select rn, avg(datediff(created_at, lag_created_at)) avg_diff_days
    from (
        select t.*,
            row_number() over(partition by user_id order by created_at) rn,
            lag(created_at, 1, created_at) over(partition by user_id order by created_at) lag_created_at
        from mytable t
    ) t
    group by rn
    

    This gives you the average difference in days, which is somewhat more accurate than months. If you really want months, use timestampdiff(month, lag_created_at, created_at) instead of datediff(), but be aware that it returns an integer number of whole months, so some precision is lost.
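
As a rough sketch of the month-based variant described above, the query below swaps datediff() for timestampdiff(). It assumes MySQL 8.0 or later (for window functions) and reuses the answer's placeholder table mytable with columns user_id and created_at. The aliases project_count, avg_diff_months, approx_avg_months, and numbered are illustrative names, and the divisor 30.44 is an assumed average month length (roughly 365.25 / 12), not something taken from the question.

```sql
-- Sketch only: average gap between consecutive projects, per project number.
-- Assumes MySQL 8.0+ and the placeholder table mytable(user_id, created_at).
select
    rn as project_count,
    -- Whole months only: timestampdiff() truncates partial months.
    avg(timestampdiff(month, lag_created_at, created_at)) as avg_diff_months,
    -- Fractional approximation: average days divided by an assumed
    -- average month length of 30.44 days.
    avg(datediff(created_at, lag_created_at)) / 30.44 as approx_avg_months
from (
    select
        t.user_id,
        t.created_at,
        -- Number each user's projects in creation order.
        row_number() over (partition by user_id order by created_at) as rn,
        -- Previous project's timestamp; defaults to the row's own timestamp
        -- for the first project, so that gap comes out as 0.
        lag(created_at, 1, created_at) over (partition by user_id order by created_at) as lag_created_at
    from mytable t
) numbered
group by rn
order by rn;
```

The first column keeps the integer behavior the answer warns about, while the second trades calendar exactness for a fractional value. Which one fits depends on whether "months" needs to mean completed calendar months or just an approximate duration.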