Imagine you've got a table similar to this:
|email | purchase_date |
|:--------------|:---------------------|
|stan@gmail.com | Jun 30 2020 12:00AM |
|stan@gmail.com | Aug 05 2020 5:00PM |
|stan@gmail.com | Mar 22 2018 3:00AM |
|eric@yahoo.com | Aug 05 2020 5:00PM |
|eric@yahoo.com | Mar 22 2018 3:00PM |
|kyle@gmail.com | Mar 22 2018 3:00PM |
|kyle@gmail.com | Jun 30 2020 12:00AM |
|kyle@gmail.com | Aug 05 2020 5:00PM |
|kenny@gmail.com| Aug 05 2020 5:00PM |
Totally random. The actual database I work with is actually more complex with much more columns.
Both the columns are STRING type. Which is not convenient. The purchase date should be DATE type. Kenny made only one purchase, so there shouldn't be any row for him in the result table. Also notice that a there's a lot of identical dates.
I'd like to select the email and the 2nd oldest purchase date (named as 'second_purchase') for each email address, so that the result looks like this:
|email | second_purchase |
|:--------------|:-------------------- |
|stan@gmail.com | Jun 30 2020 12:00AM |
|eric@yahoo.com | Aug 05 2021 5:00PM |
|kyle@gmail.com | Jun 30 2020 12:00AM |
I can't seem to get the logic or syntax right. I don't want to put all my codes in here, because I've tried many variations of my idea... It didn't seem to work somehow. But I'd love to see an example code from someone skilled in SQL. My idea is maybe not that great..:-)
This version is actually SOQL (Salesforce Object Query Language). That could be important.
Sorry for not styling the table properly, I didn't seem to work either, even when I used the recommended styling. I wasn't able to post. That was actually quite frustrating.
Anyway, thank you for any help!
You could try the following sql which uses a dense_rank
over each user's email and orders by a casted purchase_date
Query #1
WITH date_converted_table AS (
SELECT
email,
purchase_date,
DENSE_RANK() OVER (
PARTITION BY email
ORDER BY CAST(purchase_date as timestamp) ASC
) dr
FROM
mytable
)
SELECT
email,
purchase_date as second_purchase
FROM
date_converted_table
WHERE dr=2;
second_purchase | |
---|---|
eric@yahoo.com | Aug 05 2020 5:00PM |
kyle@gmail.com | Jun 30 2020 12:00AM |
stan@gmail.com | Jun 30 2020 12:00AM |
Query #2
SELECT
email,
purchase_date as second_purchase
FROM (
SELECT
email,
purchase_date,
DENSE_RANK() OVER (
PARTITION BY email
ORDER BY CAST(purchase_date as timestamp) ASC
) dr
FROM
mytable
) tb
WHERE dr=2;
second_purchase | |
---|---|
eric@yahoo.com | Aug 05 2020 5:00PM |
kyle@gmail.com | Jun 30 2020 12:00AM |
stan@gmail.com | Jun 30 2020 12:00AM |
As it pertains to follow up question in comment:
Is it possible to upgrade the result so that there are first_purchase dates (where dr=1) adn second_purchase dates (where dr=2) in separate columns?
A case expression and aggregation may assist you as shown below. The having clause ensures that there is both a first and second purchase date.
SELECT
email,
MAX(CASE
WHEN dr=1 THEN purchase_date
END) as first_purchase,
MAX(CASE
WHEN dr=2 THEN purchase_date
END) as second_purchase
FROM (
SELECT
email,
purchase_date,
DENSE_RANK() OVER (
PARTITION BY email
ORDER BY CAST(purchase_date as timestamp) ASC
) dr
FROM
mytable
) tb
GROUP BY email
HAVING
SUM(
CASE WHEN dr=1 THEN 1 ELSE 0 END
) > 0 AND
SUM(
CASE WHEN dr=2 THEN 1 ELSE 0 END
) > 0;
first_purchase | second_purchase | |
---|---|---|
eric@yahoo.com | Mar 22 2018 3:00PM | Aug 05 2020 5:00PM |
kyle@gmail.com | Mar 22 2018 3:00PM | Jun 30 2020 12:00AM |
stan@gmail.com | Mar 22 2018 3:00AM | Jun 30 2020 12:00AM |
Let me know if this works for you.