I already asked this question here but there contained less information about my question. So, I create a new question with more information.
Here is the sample table that I have. Each row contains the filled data by the user at every time. So that the timestamp column will not be null through the whole table. There may be unrecorded value under item, if the user didn't fill. The id is the auto-generated column for each record.
CREATE TABLE tbl (id int, customer_id text, item text, value text, timestamp timestamp);
INSERT INTO tbl VALUES
(1, '001', 'price', '1000', '2021-11-01 01:00:00'),
(2, '001', 'price', '1500', '2021-11-02 01:00:00'),
(3, '001', 'price', '1400', '2021-11-03 01:00:00'),
(4, '001', 'condition', 'good', '2021-11-01 01:00:00'),
(5, '001', 'condition', 'good', '2021-11-02 01:00:00'),
(6, '001', 'condition', 'ok', '2021-11-03 01:00:00'),
(7, '001', 'feeling', 'sad', '2021-11-01 01:00:00'),
(8, '001', 'feeling', 'angry', '2021-11-02 01:00:00'),
(9, '001', 'feeling', 'fine', '2021-11-03 01:00:00'),
(10, '002', 'price', '1200', '2021-11-01 01:00:00'),
(11, '002', 'price', '1600', '2021-11-02 01:00:00'),
(12, '002', 'price', '2000', '2021-11-03 01:00:00'),
(13, '002', 'weather', 'sunny', '2021-11-01 01:00:00'),
(14, '002', 'weather', 'rain', '2021-11-02 01:00:00'),
(15, '002', 'price', '1900', '2021-11-04 01:00:00'),
(16, '002', 'feeling', 'sad', '2021-11-01 01:00:00'),
(17, '002', 'feeling', 'angry', '2021-11-02 01:00:00'),
(18, '002', 'feeling', 'fine', '2021-11-03 01:00:00'),
(19, '003', 'price', '1000', '2021-11-01 01:00:00'),
(20, '003', 'price', '1500', '2021-11-02 01:00:00'),
(21, '003', 'price', '2000', '2021-11-03 01:00:00'),
(22, '003', 'condition', 'ok', '2021-11-01 01:00:00'),
(23, '003', 'weather', 'rain', '2021-11-02 01:00:00'),
(24, '003', 'condition', 'bad', '2021-11-03 01:00:00'),
(25, '003', 'feeling', 'fine', '2021-11-01 01:00:00'),
(26, '003', 'weather', 'sunny', '2021-11-03 01:00:00'),
(27, '003', 'feeling', 'sad', '2021-11-03 01:00:00')
;
To see clearly, I order the above table by id and timestamp. It doesn't matter.
From the above table, I want to query the data to create a table with the latest date updated data as below. I will use 10 items at most for a query so that there may be 10 columns.
customer_id price condition feeling weather .......(there may be other columns from item column)
002 1900 null fine rain
001 1400 ok fine null
003 2000 bad sad sunny
This is the query that I get from previous questions, but I asked only for two item.
SELECT customer_id, p.value AS price, c.value AS condition
FROM (
SELECT DISTINCT ON (customer_id)
customer_id, value
FROM tbl
WHERE item = 'condition'
ORDER BY customer_id, timestamp DESC
) c
FULL JOIN (
SELECT DISTINCT ON (customer_id)
customer_id, value
FROM tbl
WHERE item = 'price'
ORDER BY customer_id, timestamp DESC
) p USING (customer_id)
So, if there is any better solution please help me. Thank you.
You may try other approaches using row_number
to generate a value to filter your data on the most recent data. You may then aggregate on customer id with the max value for a case expression filtering your records based on the desired row number rn=1
(we will order by descending) and item name.
These approaches are less verbose and based on the results online seem to be more performant. Let me know how replicating this in your environment works in the comments.
You may use EXPLAIN ANALYZE
to compare this approach to the current one. The results in the online environment provided:
Current Approach
| Planning time: 0.129 ms
| Execution time: 0.056 ms
Suggested Approach 1
| Planning time: 0.061 ms
| Execution time: 0.070 ms
Suggested Approach 2
| Planning time: 0.047 ms
| Execution time: 0.056 ms
NB. You may use EXPLAIN ANALYZE
to compare these approaches in your environment which we cannot replicate online. The results may also vary on each run. Indexes and early filters on the item
column are also recommended to improve performance.
Schema (PostgreSQL v9.5)
Suggested Approach 1
SELECT
t1.customer_id,
MAX(CASE WHEN t1.item='condition' THEN t1.value END) as conditio,
MAX(CASE WHEN t1.item='price' THEN t1.value END) as price,
MAX(CASE WHEN t1.item='feeling' THEN t1.value END) as feeling,
MAX(CASE WHEN t1.item='weather' THEN t1.value END) as weather
FROM (
SELECT
* ,
ROW_NUMBER() OVER (
PARTITION BY customer_id,item
ORDER BY tbl.timestamp DESC
) as rn
FROM
tbl
-- ensure that you filter based on your desired items
-- indexes on item column are recommended to improve performance
) t1
WHERE rn=1
GROUP BY
1;
customer_id | conditio | price | feeling | weather |
---|---|---|---|---|
001 | ok | 1400 | fine | |
002 | 1900 | fine | rain | |
003 | bad | 2000 | sad | sunny |
Suggested Approach 2
SELECT
t1.customer_id,
MAX(t1.value) FILTER (WHERE t1.item='condition') as conditio,
MAX(t1.value) FILTER (WHERE t1.item='price') as price,
MAX(t1.value) FILTER (WHERE t1.item='feeling') as feeling,
MAX(t1.value) FILTER (WHERE t1.item='weather') as weather
FROM (
SELECT
* ,
ROW_NUMBER() OVER (
PARTITION BY customer_id,item
ORDER BY tbl.timestamp DESC
) as rn
FROM
tbl
-- ensure that you filter based on your desired items
-- indexes on item column are recommended to improve performance
) t1
WHERE rn=1
GROUP BY
1;
customer_id | conditio | price | feeling | weather |
---|---|---|---|---|
001 | ok | 1400 | fine | |
002 | 1900 | fine | rain | |
003 | bad | 2000 | sad | sunny |
Current Approach with EXPLAIN ANALYZE
EXPLAIN(ANALYZE,BUFFERS)
SELECT customer_id, p.value AS price, c.value AS condition
FROM (
SELECT DISTINCT ON (customer_id)
customer_id, value
FROM tbl
WHERE item = 'condition'
ORDER BY customer_id, timestamp DESC
) c
FULL JOIN (
SELECT DISTINCT ON (customer_id)
customer_id, value
FROM tbl
WHERE item = 'price'
ORDER BY customer_id, timestamp DESC
) p USING (customer_id);
QUERY PLAN |
---|
Merge Full Join (cost=35.05..35.12 rows=1 width=128) (actual time=0.025..0.030 rows=3 loops=1) |
Merge Cond: (tbl.customer_id = tbl_1.customer_id) |
Buffers: shared hit=2 |
-> Unique (cost=17.52..17.54 rows=1 width=72) (actual time=0.013..0.014 rows=2 loops=1) |
Buffers: shared hit=1 |
-> Sort (cost=17.52..17.53 rows=3 width=72) (actual time=0.013..0.013 rows=5 loops=1) |
Sort Key: tbl.customer_id, tbl."timestamp" DESC |
Sort Method: quicksort Memory: 25kB |
Buffers: shared hit=1 |
-> Seq Scan on tbl (cost=0.00..17.50 rows=3 width=72) (actual time=0.004..0.006 rows=5 loops=1) |
Filter: (item = 'condition'::text) |
Rows Removed by Filter: 22 |
Buffers: shared hit=1 |
-> Materialize (cost=17.52..17.55 rows=1 width=64) (actual time=0.010..0.013 rows=3 loops=1) |
Buffers: shared hit=1 |
-> Unique (cost=17.52..17.54 rows=1 width=72) (actual time=0.010..0.012 rows=3 loops=1) |
Buffers: shared hit=1 |
-> Sort (cost=17.52..17.53 rows=3 width=72) (actual time=0.010..0.010 rows=10 loops=1) |
Sort Key: tbl_1.customer_id, tbl_1."timestamp" DESC |
Sort Method: quicksort Memory: 25kB |
Buffers: shared hit=1 |
-> Seq Scan on tbl tbl_1 (cost=0.00..17.50 rows=3 width=72) (actual time=0.001..0.003 rows=10 loops=1) |
Filter: (item = 'price'::text) |
Rows Removed by Filter: 17 |
Buffers: shared hit=1 |
Planning time: 0.129 ms |
Execution time: 0.056 ms |
Suggested Approach 1 with EXPLAIN ANALYZE
EXPLAIN(ANALYZE,BUFFERS)
SELECT
t1.customer_id,
MAX(CASE WHEN t1.item='price' THEN t1.value END) as price,
MAX(CASE WHEN t1.item='condition' THEN t1.value END) as conditio
FROM (
SELECT
* ,
ROW_NUMBER() OVER (
PARTITION BY customer_id,item
ORDER BY tbl.timestamp DESC
) as rn
FROM
tbl
where item IN ('price','condition')
) t1
WHERE rn=1
GROUP BY
1;
QUERY PLAN |
---|
GroupAggregate (cost=17.58..17.81 rows=1 width=96) (actual time=0.039..0.047 rows=3 loops=1) |
Group Key: t1.customer_id |
Buffers: shared hit=1 |
-> Subquery Scan on t1 (cost=17.58..17.79 rows=1 width=96) (actual time=0.030..0.040 rows=5 loops=1) |
Filter: (t1.rn = 1) |
Rows Removed by Filter: 10 |
Buffers: shared hit=1 |
-> WindowAgg (cost=17.58..17.71 rows=6 width=104) (actual time=0.029..0.038 rows=15 loops=1) |
Buffers: shared hit=1 |
-> Sort (cost=17.58..17.59 rows=6 width=104) (actual time=0.028..0.030 rows=15 loops=1) |
Sort Key: tbl.customer_id, tbl.item, tbl."timestamp" DESC |
Sort Method: quicksort Memory: 26kB |
Buffers: shared hit=1 |
-> Seq Scan on tbl (cost=0.00..17.50 rows=6 width=104) (actual time=0.003..0.008 rows=15 loops=1) |
Filter: (item = ANY ('{price,condition}'::text[])) |
Rows Removed by Filter: 12 |
Buffers: shared hit=1 |
Planning time: 0.061 ms |
Execution time: 0.070 ms |
Suggested Approach 2 with EXPLAIN ANALYZE
EXPLAIN(ANALYZE,BUFFERS)
SELECT
t1.customer_id,
MAX(t1.value) FILTER (WHERE t1.item='price') as price,
MAX(t1.value) FILTER (WHERE t1.item='condition') as conditio
FROM (
SELECT
* ,
ROW_NUMBER() OVER (
PARTITION BY customer_id,item
ORDER BY tbl.timestamp DESC
) as rn
FROM
tbl
where item IN ('price','condition')
) t1
WHERE rn=1
GROUP BY
1;
QUERY PLAN |
---|
GroupAggregate (cost=17.58..17.81 rows=1 width=96) (actual time=0.029..0.037 rows=3 loops=1) |
Group Key: t1.customer_id |
Buffers: shared hit=1 |
-> Subquery Scan on t1 (cost=17.58..17.79 rows=1 width=96) (actual time=0.021..0.032 rows=5 loops=1) |
Filter: (t1.rn = 1) |
Rows Removed by Filter: 10 |
Buffers: shared hit=1 |
-> WindowAgg (cost=17.58..17.71 rows=6 width=104) (actual time=0.021..0.030 rows=15 loops=1) |
Buffers: shared hit=1 |
-> Sort (cost=17.58..17.59 rows=6 width=104) (actual time=0.019..0.021 rows=15 loops=1) |
Sort Key: tbl.customer_id, tbl.item, tbl."timestamp" DESC |
Sort Method: quicksort Memory: 26kB |
Buffers: shared hit=1 |
-> Seq Scan on tbl (cost=0.00..17.50 rows=6 width=104) (actual time=0.003..0.008 rows=15 loops=1) |
Filter: (item = ANY ('{price,condition}'::text[])) |
Rows Removed by Filter: 12 |
Buffers: shared hit=1 |
Planning time: 0.047 ms |
Execution time: 0.056 ms |