SQL: The second oldest date

Imagine you've got a table similar to this:

 |email          |   purchase_date      |
 |:--------------|:---------------------|
 |stan@gmail.com |  Jun 30 2020 12:00AM |  
 |stan@gmail.com |  Aug 05 2020 5:00PM  |  
 |stan@gmail.com |  Mar 22 2018 3:00AM  |  
 |eric@yahoo.com |  Aug 05 2020 5:00PM  |  
 |eric@yahoo.com |  Mar 22 2018 3:00PM  |  
 |kyle@gmail.com |  Mar 22 2018 3:00PM  |  
 |kyle@gmail.com |  Jun 30 2020 12:00AM |  
 |kyle@gmail.com |  Aug 05 2020 5:00PM  |  
 |kenny@gmail.com|  Aug 05 2020 5:00PM  |

Totally random. The actual database I work with is actually more complex with much more columns.

Both the columns are STRING type. Which is not convenient. The purchase date should be DATE type. Kenny made only one purchase, so there shouldn't be any row for him in the result table. Also notice that a there's a lot of identical dates.

I'd like to select the email and the 2nd oldest purchase date (named as 'second_purchase') for each email address, so that the result looks like this:

|email          | second_purchase      |
|:--------------|:-------------------- |
|stan@gmail.com | Jun 30 2020 12:00AM  | 
|eric@yahoo.com | Aug 05 2021 5:00PM   | 
|kyle@gmail.com | Jun 30 2020 12:00AM  |

I can't seem to get the logic or syntax right. I don't want to put all my codes in here, because I've tried many variations of my idea... It didn't seem to work somehow. But I'd love to see an example code from someone skilled in SQL. My idea is maybe not that great..:-)

This version is actually SOQL (Salesforce Object Query Language). That could be important.

Sorry for not styling the table properly, I didn't seem to work either, even when I used the recommended styling. I wasn't able to post. That was actually quite frustrating.

Anyway, thank you for any help!

Solution

You could try the following sql which uses a dense_rank over each user's email and orders by a casted purchase_date

Query #1

WITH date_converted_table AS (
    SELECT
        email,
        purchase_date,
        DENSE_RANK() OVER (
          PARTITION BY email
          ORDER BY CAST(purchase_date as timestamp) ASC
        ) dr
    FROM
        mytable
)
SELECT
    email,
    purchase_date as second_purchase
FROM 
    date_converted_table
WHERE dr=2;

email	second_purchase
eric@yahoo.com	Aug 05 2020 5:00PM
kyle@gmail.com	Jun 30 2020 12:00AM
stan@gmail.com	Jun 30 2020 12:00AM

Query #2

SELECT
    email,
    purchase_date as second_purchase
FROM (
    SELECT
        email,
        purchase_date,
        DENSE_RANK() OVER (
          PARTITION BY email
          ORDER BY CAST(purchase_date as timestamp) ASC
        ) dr
    FROM
        mytable
) tb
WHERE dr=2;

email	second_purchase
eric@yahoo.com	Aug 05 2020 5:00PM
kyle@gmail.com	Jun 30 2020 12:00AM
stan@gmail.com	Jun 30 2020 12:00AM

View on DB Fiddle

Update 1

As it pertains to follow up question in comment:

Is it possible to upgrade the result so that there are first_purchase dates (where dr=1) adn second_purchase dates (where dr=2) in separate columns?

A case expression and aggregation may assist you as shown below. The having clause ensures that there is both a first and second purchase date.

SELECT
    email,
    MAX(CASE
        WHEN dr=1 THEN purchase_date
    END) as first_purchase,
    MAX(CASE
        WHEN dr=2 THEN purchase_date
    END) as second_purchase
FROM (
    SELECT
        email,
        purchase_date,
        DENSE_RANK() OVER (
          PARTITION BY email
          ORDER BY CAST(purchase_date as timestamp) ASC
        ) dr
    FROM
        mytable
) tb
GROUP BY email
HAVING
    SUM(
        CASE WHEN dr=1 THEN 1 ELSE 0 END  
    ) > 0 AND
     SUM(
        CASE WHEN dr=2 THEN 1 ELSE 0 END  
    ) > 0;

email	first_purchase	second_purchase
eric@yahoo.com	Mar 22 2018 3:00PM	Aug 05 2020 5:00PM
kyle@gmail.com	Mar 22 2018 3:00PM	Jun 30 2020 12:00AM
stan@gmail.com	Mar 22 2018 3:00AM	Jun 30 2020 12:00AM

View on DB Fiddle

Let me know if this works for you.