Query Value by Max Date in Postgresql

I already asked this question here but there contained less information about my question. So, I create a new question with more information.

Here is the sample table that I have. Each row contains the filled data by the user at every time. So that the timestamp column will not be null through the whole table. There may be unrecorded value under item, if the user didn't fill. The id is the auto-generated column for each record.

CREATE TABLE tbl (id int, customer_id text, item text, value text, timestamp timestamp);    
INSERT INTO tbl VALUES
(1, '001', 'price', '1000', '2021-11-01 01:00:00'),
(2, '001', 'price', '1500', '2021-11-02 01:00:00'),
(3, '001', 'price', '1400', '2021-11-03 01:00:00'),
(4, '001', 'condition', 'good', '2021-11-01 01:00:00'),
(5, '001', 'condition', 'good', '2021-11-02 01:00:00'),
(6, '001', 'condition', 'ok', '2021-11-03 01:00:00'),
(7, '001', 'feeling', 'sad', '2021-11-01 01:00:00'),
(8, '001', 'feeling', 'angry', '2021-11-02 01:00:00'),
(9, '001', 'feeling', 'fine', '2021-11-03 01:00:00'),
(10, '002', 'price', '1200', '2021-11-01 01:00:00'),
(11, '002', 'price', '1600', '2021-11-02 01:00:00'),
(12, '002', 'price', '2000', '2021-11-03 01:00:00'),
(13, '002', 'weather', 'sunny', '2021-11-01 01:00:00'),
(14, '002', 'weather', 'rain', '2021-11-02 01:00:00'),
(15, '002', 'price', '1900', '2021-11-04 01:00:00'),
(16, '002', 'feeling', 'sad', '2021-11-01 01:00:00'),
(17, '002', 'feeling', 'angry', '2021-11-02 01:00:00'),
(18, '002', 'feeling', 'fine', '2021-11-03 01:00:00'),
(19, '003', 'price', '1000', '2021-11-01 01:00:00'),
(20, '003', 'price', '1500', '2021-11-02 01:00:00'),
(21, '003', 'price', '2000', '2021-11-03 01:00:00'),
(22, '003', 'condition', 'ok', '2021-11-01 01:00:00'),
(23, '003', 'weather', 'rain', '2021-11-02 01:00:00'),
(24, '003', 'condition', 'bad', '2021-11-03 01:00:00'),
(25, '003', 'feeling', 'fine', '2021-11-01 01:00:00'),
(26, '003', 'weather', 'sunny', '2021-11-03 01:00:00'),
(27, '003', 'feeling', 'sad', '2021-11-03 01:00:00')
;

To see clearly, I order the above table by id and timestamp. It doesn't matter.

We are using Postgresql Version: PostgreSQL 9.5.19
The actual table contains over 4 million rows
The item column contains over 500 distinct items, but don't worry. I will use 10 items at most for a query. In the above table, I used only 4 items.
We also have another table called Customer_table with a unique Customer_id containing customers' general information.

From the above table, I want to query the data to create a table with the latest date updated data as below. I will use 10 items at most for a query so that there may be 10 columns.

customer_id  price  condition  feeling   weather .......(there may be other columns from item column)
   002        1900    null      fine      rain
   001        1400     ok       fine      null
   003        2000    bad       sad       sunny

This is the query that I get from previous questions, but I asked only for two item.

SELECT customer_id, p.value AS price, c.value AS condition
FROM  (
   SELECT DISTINCT ON (customer_id)
          customer_id, value
   FROM   tbl
   WHERE  item = 'condition'
   ORDER  BY customer_id, timestamp DESC
   ) c
FULL JOIN (
   SELECT DISTINCT ON (customer_id)
          customer_id, value
   FROM   tbl
   WHERE  item = 'price'
   ORDER  BY customer_id, timestamp DESC
   ) p USING (customer_id)

So, if there is any better solution please help me. Thank you.

Solution

You may try other approaches using row_number to generate a value to filter your data on the most recent data. You may then aggregate on customer id with the max value for a case expression filtering your records based on the desired row number rn=1 (we will order by descending) and item name.

These approaches are less verbose and based on the results online seem to be more performant. Let me know how replicating this in your environment works in the comments.

You may use EXPLAIN ANALYZE to compare this approach to the current one. The results in the online environment provided:

Current Approach

| Planning time: 0.129 ms                                                                                                      
| Execution time: 0.056 ms

Suggested Approach 1

| Planning time: 0.061 ms                                                                                                 
| Execution time: 0.070 ms

Suggested Approach 2

| Planning time: 0.047 ms                                                                                                 
| Execution time: 0.056 ms

NB. You may use EXPLAIN ANALYZE to compare these approaches in your environment which we cannot replicate online. The results may also vary on each run. Indexes and early filters on the item column are also recommended to improve performance.

Schema (PostgreSQL v9.5)

Suggested Approach 1

SELECT
    t1.customer_id,
    MAX(CASE WHEN t1.item='condition' THEN t1.value END) as conditio,
    MAX(CASE WHEN t1.item='price' THEN t1.value END) as price,
    MAX(CASE WHEN t1.item='feeling' THEN t1.value END) as feeling,
    MAX(CASE WHEN t1.item='weather' THEN t1.value END) as weather
FROM (
    SELECT
        * ,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id,item
            ORDER BY tbl.timestamp DESC
        ) as rn
    FROM
        tbl 
    -- ensure that you filter based on your desired items
    -- indexes on item column are recommended to improve performance
) t1
WHERE rn=1
GROUP BY
   1;

customer_id	conditio	price	feeling	weather
001	ok	1400	fine
002		1900	fine	rain
003	bad	2000	sad	sunny

Suggested Approach 2

SELECT
    t1.customer_id,
    MAX(t1.value) FILTER (WHERE  t1.item='condition')  as conditio,
    MAX(t1.value) FILTER (WHERE  t1.item='price')  as price,
    MAX(t1.value) FILTER (WHERE  t1.item='feeling')  as feeling,
    MAX(t1.value) FILTER (WHERE  t1.item='weather')  as weather
    
FROM (
    SELECT
        * ,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id,item
            ORDER BY tbl.timestamp DESC
        ) as rn
    FROM
        tbl 
    -- ensure that you filter based on your desired items
    -- indexes on item column are recommended to improve performance
) t1
WHERE rn=1
GROUP BY
   1;

customer_id	conditio	price	feeling	weather
001	ok	1400	fine
002		1900	fine	rain
003	bad	2000	sad	sunny

Current Approach with EXPLAIN ANALYZE

EXPLAIN(ANALYZE,BUFFERS)
SELECT customer_id, p.value AS price, c.value AS condition
FROM  (
   SELECT DISTINCT ON (customer_id)
          customer_id, value
   FROM   tbl
   WHERE  item = 'condition'
   ORDER  BY customer_id, timestamp DESC
   ) c
FULL JOIN (
   SELECT DISTINCT ON (customer_id)
          customer_id, value
   FROM   tbl
   WHERE  item = 'price'
   ORDER  BY customer_id, timestamp DESC
   ) p USING (customer_id);

QUERY PLAN
Merge Full Join (cost=35.05..35.12 rows=1 width=128) (actual time=0.025..0.030 rows=3 loops=1)
Merge Cond: (tbl.customer_id = tbl_1.customer_id)
Buffers: shared hit=2
-> Unique (cost=17.52..17.54 rows=1 width=72) (actual time=0.013..0.014 rows=2 loops=1)
Buffers: shared hit=1
-> Sort (cost=17.52..17.53 rows=3 width=72) (actual time=0.013..0.013 rows=5 loops=1)
Sort Key: tbl.customer_id, tbl."timestamp" DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=1
-> Seq Scan on tbl (cost=0.00..17.50 rows=3 width=72) (actual time=0.004..0.006 rows=5 loops=1)
Filter: (item = 'condition'::text)
Rows Removed by Filter: 22
Buffers: shared hit=1
-> Materialize (cost=17.52..17.55 rows=1 width=64) (actual time=0.010..0.013 rows=3 loops=1)
Buffers: shared hit=1
-> Unique (cost=17.52..17.54 rows=1 width=72) (actual time=0.010..0.012 rows=3 loops=1)
Buffers: shared hit=1
-> Sort (cost=17.52..17.53 rows=3 width=72) (actual time=0.010..0.010 rows=10 loops=1)
Sort Key: tbl_1.customer_id, tbl_1."timestamp" DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=1
-> Seq Scan on tbl tbl_1 (cost=0.00..17.50 rows=3 width=72) (actual time=0.001..0.003 rows=10 loops=1)
Filter: (item = 'price'::text)
Rows Removed by Filter: 17
Buffers: shared hit=1
Planning time: 0.129 ms
Execution time: 0.056 ms

Suggested Approach 1 with EXPLAIN ANALYZE

EXPLAIN(ANALYZE,BUFFERS)
SELECT
    t1.customer_id,
    MAX(CASE WHEN t1.item='price' THEN t1.value END) as price,
    MAX(CASE WHEN t1.item='condition' THEN t1.value END) as conditio
    
FROM (
    SELECT
        * ,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id,item
            ORDER BY tbl.timestamp DESC
        ) as rn
    FROM
        tbl 
    where item IN ('price','condition')
) t1
WHERE rn=1
GROUP BY
   1;

QUERY PLAN
GroupAggregate (cost=17.58..17.81 rows=1 width=96) (actual time=0.039..0.047 rows=3 loops=1)
Group Key: t1.customer_id
Buffers: shared hit=1
-> Subquery Scan on t1 (cost=17.58..17.79 rows=1 width=96) (actual time=0.030..0.040 rows=5 loops=1)
Filter: (t1.rn = 1)
Rows Removed by Filter: 10
Buffers: shared hit=1
-> WindowAgg (cost=17.58..17.71 rows=6 width=104) (actual time=0.029..0.038 rows=15 loops=1)
Buffers: shared hit=1
-> Sort (cost=17.58..17.59 rows=6 width=104) (actual time=0.028..0.030 rows=15 loops=1)
Sort Key: tbl.customer_id, tbl.item, tbl."timestamp" DESC
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=1
-> Seq Scan on tbl (cost=0.00..17.50 rows=6 width=104) (actual time=0.003..0.008 rows=15 loops=1)
Filter: (item = ANY ('{price,condition}'::text[]))
Rows Removed by Filter: 12
Buffers: shared hit=1
Planning time: 0.061 ms
Execution time: 0.070 ms

Suggested Approach 2 with EXPLAIN ANALYZE

EXPLAIN(ANALYZE,BUFFERS)
SELECT
    t1.customer_id,
    MAX(t1.value) FILTER (WHERE  t1.item='price')  as price,
    MAX(t1.value) FILTER (WHERE  t1.item='condition')  as conditio
    
FROM (
    SELECT
        * ,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id,item
            ORDER BY tbl.timestamp DESC
        ) as rn
    FROM
        tbl 
    where item IN ('price','condition')
) t1
WHERE rn=1
GROUP BY
   1;

QUERY PLAN
GroupAggregate (cost=17.58..17.81 rows=1 width=96) (actual time=0.029..0.037 rows=3 loops=1)
Group Key: t1.customer_id
Buffers: shared hit=1
-> Subquery Scan on t1 (cost=17.58..17.79 rows=1 width=96) (actual time=0.021..0.032 rows=5 loops=1)
Filter: (t1.rn = 1)
Rows Removed by Filter: 10
Buffers: shared hit=1
-> WindowAgg (cost=17.58..17.71 rows=6 width=104) (actual time=0.021..0.030 rows=15 loops=1)
Buffers: shared hit=1
-> Sort (cost=17.58..17.59 rows=6 width=104) (actual time=0.019..0.021 rows=15 loops=1)
Sort Key: tbl.customer_id, tbl.item, tbl."timestamp" DESC
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=1
-> Seq Scan on tbl (cost=0.00..17.50 rows=6 width=104) (actual time=0.003..0.008 rows=15 loops=1)
Filter: (item = ANY ('{price,condition}'::text[]))
Rows Removed by Filter: 12
Buffers: shared hit=1
Planning time: 0.047 ms
Execution time: 0.056 ms

View working demo on DB Fiddle