Search code examples
sqlsoql

SQL: The second oldest date


Imagine you've got a table similar to this:

 |email          |   purchase_date      |
 |:--------------|:---------------------|
 |stan@gmail.com |  Jun 30 2020 12:00AM |  
 |stan@gmail.com |  Aug 05 2020 5:00PM  |  
 |stan@gmail.com |  Mar 22 2018 3:00AM  |  
 |eric@yahoo.com |  Aug 05 2020 5:00PM  |  
 |eric@yahoo.com |  Mar 22 2018 3:00PM  |  
 |kyle@gmail.com |  Mar 22 2018 3:00PM  |  
 |kyle@gmail.com |  Jun 30 2020 12:00AM |  
 |kyle@gmail.com |  Aug 05 2020 5:00PM  |  
 |kenny@gmail.com|  Aug 05 2020 5:00PM  |

Totally random. The actual database I work with is actually more complex with much more columns.

Both the columns are STRING type. Which is not convenient. The purchase date should be DATE type. Kenny made only one purchase, so there shouldn't be any row for him in the result table. Also notice that a there's a lot of identical dates.

I'd like to select the email and the 2nd oldest purchase date (named as 'second_purchase') for each email address, so that the result looks like this:

|email          | second_purchase      |
|:--------------|:-------------------- |
|stan@gmail.com | Jun 30 2020 12:00AM  | 
|eric@yahoo.com | Aug 05 2021 5:00PM   | 
|kyle@gmail.com | Jun 30 2020 12:00AM  | 

I can't seem to get the logic or syntax right. I don't want to put all my codes in here, because I've tried many variations of my idea... It didn't seem to work somehow. But I'd love to see an example code from someone skilled in SQL. My idea is maybe not that great..:-)

This version is actually SOQL (Salesforce Object Query Language). That could be important.

Sorry for not styling the table properly, I didn't seem to work either, even when I used the recommended styling. I wasn't able to post. That was actually quite frustrating.

Anyway, thank you for any help!


Solution

  • You could try the following sql which uses a dense_rank over each user's email and orders by a casted purchase_date

    Query #1

    WITH date_converted_table AS (
        SELECT
            email,
            purchase_date,
            DENSE_RANK() OVER (
              PARTITION BY email
              ORDER BY CAST(purchase_date as timestamp) ASC
            ) dr
        FROM
            mytable
    )
    SELECT
        email,
        purchase_date as second_purchase
    FROM 
        date_converted_table
    WHERE dr=2;
    
    email second_purchase
    eric@yahoo.com Aug 05 2020 5:00PM
    kyle@gmail.com Jun 30 2020 12:00AM
    stan@gmail.com Jun 30 2020 12:00AM

    Query #2

    SELECT
        email,
        purchase_date as second_purchase
    FROM (
        SELECT
            email,
            purchase_date,
            DENSE_RANK() OVER (
              PARTITION BY email
              ORDER BY CAST(purchase_date as timestamp) ASC
            ) dr
        FROM
            mytable
    ) tb
    WHERE dr=2;
    
    email second_purchase
    eric@yahoo.com Aug 05 2020 5:00PM
    kyle@gmail.com Jun 30 2020 12:00AM
    stan@gmail.com Jun 30 2020 12:00AM

    View on DB Fiddle

    Update 1

    As it pertains to follow up question in comment:

    Is it possible to upgrade the result so that there are first_purchase dates (where dr=1) adn second_purchase dates (where dr=2) in separate columns?

    A case expression and aggregation may assist you as shown below. The having clause ensures that there is both a first and second purchase date.

    SELECT
        email,
        MAX(CASE
            WHEN dr=1 THEN purchase_date
        END) as first_purchase,
        MAX(CASE
            WHEN dr=2 THEN purchase_date
        END) as second_purchase
    FROM (
        SELECT
            email,
            purchase_date,
            DENSE_RANK() OVER (
              PARTITION BY email
              ORDER BY CAST(purchase_date as timestamp) ASC
            ) dr
        FROM
            mytable
    ) tb
    GROUP BY email
    HAVING
        SUM(
            CASE WHEN dr=1 THEN 1 ELSE 0 END  
        ) > 0 AND
         SUM(
            CASE WHEN dr=2 THEN 1 ELSE 0 END  
        ) > 0;
    
    email first_purchase second_purchase
    eric@yahoo.com Mar 22 2018 3:00PM Aug 05 2020 5:00PM
    kyle@gmail.com Mar 22 2018 3:00PM Jun 30 2020 12:00AM
    stan@gmail.com Mar 22 2018 3:00AM Jun 30 2020 12:00AM

    View on DB Fiddle

    Let me know if this works for you.