I'm trying to create a view based off a table. I want to get a set of rows where there is an existing tax_id_no
, with each row having the most recent information. So I'm ordering by timestamps descending. However, each tax_id_no
can have multiple rows, and not every row will have all the information. So I want to get the first valid piece of information for each column. Right now I've got this:
SELECT * FROM
(
SELECT DISTINCT ON (store_id, tax_id_no)
event_id,
event_tstamp,
owner_id,
store_id,
tax_id_no,
first_value(year_built) OVER (ORDER BY year_built IS NULL, event_tstamp) AS year_built, --New
first_value(roof_replaced_year) OVER (ORDER BY roof_replaced_year IS NULL, event_tstamp) AS roof_replaced_year, --New
first_value(number_of_rooms) OVER (ORDER BY number_of_rooms IS NULL, event_tstamp) AS number_of_rooms, --New
FROM MySchema.Event
WHERE tax_id_no IS NOT NULL AND tax_id_no != ''
order by store_id, tax_id_no, event_tstamp DESC
) t1
WHERE owner_id IS NOT NULL OR owner_id != '';
This is getting the same first valid information for every row though. So instead of getting results like this, which is what I want:
event_id event_tstamp owner_id store_id tax_id_no year_built roof_replaced_year number_of_rooms
04 2016-05-12 123 02 12345 1996 2009 6
05 2017-02-02 245 02 23456 1970 1999 8
08 2017-03-03 578 03 34567 2002 2016 10
I'm getting this, which all the rows looking the same in the first_value
columns:
event_id event_tstamp owner_id store_id tax_id_no year_built roof_replaced_year number_of_rooms
04 2016-05-12 123 02 12345 1996 2009 6
05 2017-02-02 245 02 23456 1996 2009 6
08 2017-03-03 578 03 34567 1996 2009 6
Is it possible to select a different first_value
for each row? I was thinking I could do some kind of a join across multiple selects from the same table, but I'm not sure that would actually give me unique values for each row instead of just having the same problem again. There's also the length of time for such queries to consider, which so far have been prohibitively expensive.
You can use a partition in your window functions to group the rows before applying the function. That will generate a distinct result for each partition.
For example:
first_value(number_of_rooms) OVER (
PARTITION BY tax_id_no
ORDER BY number_of_rooms IS NULL, event_tstamp
) AS number_of_rooms,