In PostgreSQL, if i have this kind of record :
|------------------|---------------------|------------------|
| corona_country | corona_patient | corona_suspected |
|------------------|---------------------|------------------|
| Mexico | Teddy | 2020-03-01 |
|------------------|---------------------|------------------|
| Indonesia | Andy | 2020-03-01 |
|------------------|---------------------|------------------|
| Indonesia | Andy | 2020-03-05 |
|------------------|---------------------|------------------|
| China | Sarah | 2020-03-04 |
|------------------|---------------------|------------------|
| UK | Michael | 2020-03-05 |
|------------------|---------------------|------------------|
| Italy | Michael | 2020-03-15 |
|------------------|---------------------|------------------|
| Italy | Michael | 2020-03-15 |
|------------------|---------------------|------------------|
I want to add extra column (by a Select query) that identifies if a combination of corona_country and corona_patient is having the latest date (of corona_suspected column) among its own records. So it'll look like this :
|------------------|---------------------|------------------|-----------|
| corona_country | corona_patient | corona_suspected | LATEST? |
|------------------|---------------------|------------------|-----------|
| Mexico | Teddy | 2020-03-01 | Y |
|------------------|---------------------|------------------|-----------|
| Indonesia | Andy | 2020-03-01 | N |
|------------------|---------------------|------------------|-----------|
| Indonesia | Andy | 2020-03-05 | Y |
|------------------|---------------------|------------------|-----------|
| China | Sarah | 2020-03-04 | Y |
|------------------|---------------------|------------------|-----------|
| UK | Michael | 2020-03-05 | Y |
|------------------|---------------------|------------------|-----------|
| Italy | Michael | 2020-03-13 | N |
|------------------|---------------------|------------------|-----------|
| Italy | Michael | 2020-03-15 | Y |
|------------------|---------------------|------------------|-----------|
| Italy | Michael | 2020-03-15 | Y |
|------------------|---------------------|------------------|-----------|
How can i achieve this by sql query? Is there any method that joins another table that records latest dates of each combination of corona_country and corona_patient?
One option would be using MAX() OVER (PARTITION BY..)
analytic function
SELECT *,
CASE
WHEN( MAX( corona_suspected ) OVER
( PARTITION BY corona_country, corona_patient ) = corona_suspected )
THEN
'Y'
ELSE
'N'
END AS latest
FROM t
OR alternatively
Use DENSE_RANK()
analytic function
CASE
WHEN(DENSE_RANK() OVER
(PARTITION BY corona_country, corona_patient ORDER BY corona_suspected DESC )=1)
THEN
'Y'
ELSE
'N'
END AS latest
in order to return the result with ties(more than one latest values for corona_suspected
columns for each groups)