Search code examples
sqlpostgresqljoinleft-joininner-join

Postgresql - How to identify latest record of a combination of 2 columns (Example inside)


In PostgreSQL, if i have this kind of record :

|------------------|---------------------|------------------|
|  corona_country  |   corona_patient    | corona_suspected |
|------------------|---------------------|------------------|
|     Mexico       |        Teddy        |    2020-03-01    |
|------------------|---------------------|------------------|
|     Indonesia    |        Andy         |    2020-03-01    |
|------------------|---------------------|------------------|
|     Indonesia    |        Andy         |    2020-03-05    |
|------------------|---------------------|------------------|
|     China        |        Sarah        |    2020-03-04    |
|------------------|---------------------|------------------|
|     UK           |        Michael      |    2020-03-05    |
|------------------|---------------------|------------------|
|     Italy        |        Michael      |    2020-03-15    |
|------------------|---------------------|------------------|
|     Italy        |        Michael      |    2020-03-15    |
|------------------|---------------------|------------------|

I want to add extra column (by a Select query) that identifies if a combination of corona_country and corona_patient is having the latest date (of corona_suspected column) among its own records. So it'll look like this :

|------------------|---------------------|------------------|-----------|
|  corona_country  |   corona_patient    | corona_suspected |  LATEST?  |
|------------------|---------------------|------------------|-----------|
|     Mexico       |        Teddy        |    2020-03-01    |     Y     |
|------------------|---------------------|------------------|-----------|
|     Indonesia    |        Andy         |    2020-03-01    |     N     |
|------------------|---------------------|------------------|-----------|
|     Indonesia    |        Andy         |    2020-03-05    |     Y     |
|------------------|---------------------|------------------|-----------|
|     China        |        Sarah        |    2020-03-04    |     Y     |
|------------------|---------------------|------------------|-----------|
|     UK           |        Michael      |    2020-03-05    |     Y     |
|------------------|---------------------|------------------|-----------|
|     Italy        |        Michael      |    2020-03-13    |     N     |
|------------------|---------------------|------------------|-----------|
|     Italy        |        Michael      |    2020-03-15    |     Y     |
|------------------|---------------------|------------------|-----------|
|     Italy        |        Michael      |    2020-03-15    |     Y     |
|------------------|---------------------|------------------|-----------|

How can i achieve this by sql query? Is there any method that joins another table that records latest dates of each combination of corona_country and corona_patient?


Solution

  • One option would be using MAX() OVER (PARTITION BY..) analytic function

    SELECT *,
           CASE 
           WHEN( MAX( corona_suspected ) OVER 
                    ( PARTITION BY corona_country, corona_patient ) = corona_suspected )
           THEN 
             'Y'
           ELSE
             'N'
           END AS latest
      FROM t
    

    OR alternatively

    Use DENSE_RANK() analytic function

       CASE 
       WHEN(DENSE_RANK() OVER 
       (PARTITION BY corona_country, corona_patient ORDER BY corona_suspected DESC )=1)
       THEN 
         'Y'
       ELSE
         'N'
       END AS latest
    

    in order to return the result with ties(more than one latest values for corona_suspected columns for each groups)

    Demo