I have this table for storing tick by tick argentinian bonds data:
bonos=# \d preciosrt
Table "public.preciosrt"
Column | Type | Collation | Nullable | Default
------------+------------------------+-----------+----------+-----------------------------------------------
idpreciort | integer | | not null | nextval('preciosrt_idpreciort_seq'::regclass)
fecha | date | | not null | now()::date
ticker | text | | not null |
bid | numeric | | |
ask | numeric | | |
hora | time without time zone | | not null |
bidqty | integer | | |
askqty | integer | | |
last | numeric | | |
volume | integer | | |
settlement | character varying(4) | | |
With this query I can get all the last prices of the "AL29" bond between 13:55 and 14:30:
select
row_number() over(), hora, last
from preciosrt
where
ticker='AL29' and
settlement='48hs' and
fecha=now()::date - interval '1 day' and
hora >= '13:55:00' and hora <= '14:30:00'
order by hora;
row_number | hora | last
------------+----------+--------
1 | 13:55:18 | 6260.0
2 | 13:56:00 | 6264.5
3 | 13:56:29 | 6264.5
4 | 13:57:08 | 6265.0
5 | 14:03:07 | 6266.0
6 | 14:05:16 | 6266.0
7 | 14:06:19 | 6266.0
8 | 14:06:33 | 6270.0
9 | 14:07:08 | 6276.5
10 | 14:07:24 | 6299.5
11 | 14:09:56 | 6318.5
12 | 14:11:24 | 6300.0
13 | 14:20:58 | 6315.0
14 | 14:24:58 | 6315.0
15 | 14:27:54 | 6315.0
16 | 14:28:06 | 6319.0
(16 rows)
Let's pay attention to rows 12 and 13, where a tick was at 14:11:24
and the next one at 14:20:58
.
Now, with this query I can show the ticks in 5 minutes intervals (span from 14:00:00
to 15:00:00
):
select
ticker,
last,
intervalo
from(
select
ticker, last,
row_number() over (
partition by to_timestamp(floor((extract('epoch' from hora) / 300 )) * 300) at time zone 'UTC'
order by hora desc
) as rownum,
to_char(to_timestamp(floor((extract('epoch' from hora) / 300 )) * 300) at time zone 'UTC', 'HH24:MI:SS') as intervalo
from preciosrt p1
where
p1.ticker='AL29' and p1.settlement='48hs' and
p1.fecha=now()::date - interval '1 day'
) x
where x.rownum=1 and x.intervalo >= '14:00:00' and x.intervalo <= '15:00:00';
ticker | last | intervalo
--------+--------+-----------
AL29 | 6266.0 | 14:00:00
AL29 | 6318.5 | 14:05:00
AL29 | 6300.0 | 14:10:00
AL29 | 6315.0 | 14:20:00
AL29 | 6319.0 | 14:25:00
AL29 | 6323.0 | 14:30:00
AL29 | 6320.0 | 14:35:00
AL29 | 6318.0 | 14:40:00
AL29 | 6329.5 | 14:50:00
AL29 | 6315.0 | 14:55:00
AL29 | 6316.0 | 15:00:00
(11 rows)
As you can see, at 14:10 the last price was 6300.0
, and the next one was at 14:20 6315
.
AL29 | 6300.0 | 14:10:00
AL29 | 6315.0 | 14:20:00
Now I want to add the missing data at 14:15:00
containig the value of the last known tick, that is 6300
. How can I do this using a PostgreSql query?.
Edit
Using generate_series
I can create the whole interval, then left join to my data, but as you can guess the gaps are still there:
with minutes as (
select generate_series(
timestamp '2000-01-01 11:00:00',
timestamp '2000-01-01 17:00:00',
'5 minute'::interval
) as tt
)
select
to_char(minutes.tt, 'HH24:MI:SS'),
x2.*
from minutes
left join
(
select
ticker,
last,
intervalo
from(
select
ticker, last,
row_number() over (
partition by to_timestamp(floor((extract('epoch' from hora) / 300 )) * 300) at time zone 'UTC'
order by hora desc
) as rownum,
to_char(to_timestamp(floor((extract('epoch' from hora) / 300 )) * 300) at time zone 'UTC', 'HH24:MI:SS') as intervalo
from preciosrt p1
where
p1.ticker='AL29' and p1.settlement='48hs' and
p1.fecha=now()::date - interval '1 day'
) x
where x.rownum=1
) x2 on x2.intervalo = to_char(minutes.tt, 'HH24:MI:SS');
14:05:00 | AL29 | 6318.5 | 14:05:00
14:10:00 | AL29 | 6300.0 | 14:10:00
14:15:00 | | |
14:20:00 | AL29 | 6315.0 | 14:20:00
14:25:00 | AL29 | 6319.0 | 14:25:00
14:30:00 | AL29 | 6323.0 | 14:30:00
14:35:00 | AL29 | 6320.0 | 14:35:00
14:40:00 | AL29 | 6318.0 | 14:40:00
14:45:00 | | |
14:50:00 | AL29 | 6329.5 | 14:50:00
14:55:00 | AL29 | 6315.0 | 14:55:00
15:00:00 | AL29 | 6316.0 | 15:00:00
15:05:00 | AL29 | 6320.5 | 15:05:00
15:10:00 | AL29 | 6374.5 | 15:10:00
15:15:00 | AL29 | 6344.0 | 15:15:00
15:20:00 | AL29 | 6395.0 | 15:20:00
15:25:00 | AL29 | 6400.0 | 15:25:00
15:30:00 | AL29 | 6355.0 | 15:30:00
15:35:00 | AL29 | 6400.0 | 15:35:00
15:40:00 | AL29 | 6395.0 | 15:40:00
Edit 2
I think I found it:
with minutes as (
select generate_series(
timestamp '2000-01-01 11:00:00',
timestamp '2000-01-01 17:00:00',
'5 minute'::interval
) as tt
)
select
to_char(minutes.tt, 'HH24:MI:SS'),
coalesce(x2.last, lag(last) over()) as last
from minutes
left join
(
select
ticker,
last,
intervalo
from(
select
ticker, last,
row_number() over (
partition by to_timestamp(floor((extract('epoch' from hora) / 300 )) * 300) at time zone 'UTC'
order by hora desc
) as rownum,
to_char(to_timestamp(floor((extract('epoch' from hora) / 300 )) * 300) at time zone 'UTC', 'HH24:MI:SS') as intervalo
from preciosrt p1
where
p1.ticker='AL29' and p1.settlement='48hs' and
p1.fecha=now()::date - interval '1 day'
) x
where x.rownum=1
) x2 on x2.intervalo = to_char(minutes.tt, 'HH24:MI:SS');
Gives this result:
to_char | last
----------+--------
11:00:00 |
11:05:00 |
11:10:00 |
11:15:00 |
11:20:00 |
11:25:00 |
11:30:00 |
11:35:00 | 6225.0
11:40:00 | 6225.0
11:45:00 | 6205.0
11:50:00 | 6233.5
11:55:00 | 6233.0
12:00:00 | 6225.0
12:05:00 | 6222.0
12:10:00 | 6222.0
12:15:00 | 6228.5
12:20:00 | 6220.0
12:25:00 | 6249.0
12:30:00 | 6258.5
12:35:00 | 6257.5
12:40:00 | 6265.0
12:45:00 | 6280.0
12:50:00 | 6296.0
12:55:00 | 6277.0
13:00:00 | 6270.0
13:05:00 | 6270.0
13:10:00 | 6269.0
13:15:00 | 6250.0
13:20:00 | 6260.0
13:25:00 | 6250.0
13:30:00 | 6250.0
13:35:00 | 6250.0
13:40:00 | 6250.0
13:45:00 | 6250.0
13:50:00 | 6260.0
13:55:00 | 6265.0
14:00:00 | 6266.0
14:05:00 | 6318.5
14:10:00 | 6300.0
14:15:00 | 6300.0
14:20:00 | 6315.0
14:25:00 | 6319.0
14:30:00 | 6323.0
14:35:00 | 6320.0
14:40:00 | 6318.0
14:45:00 | 6318.0
14:50:00 | 6329.5
14:55:00 | 6315.0
15:00:00 | 6316.0
15:05:00 | 6320.5
15:10:00 | 6374.5
15:15:00 | 6344.0
15:20:00 | 6395.0
15:25:00 | 6400.0
15:30:00 | 6355.0
15:35:00 | 6400.0
15:40:00 | 6395.0
15:45:00 | 6372.0
15:50:00 | 6405.0
15:55:00 | 6420.0
16:00:00 | 6420.0
16:05:00 | 6410.5
16:10:00 | 6400.0
16:15:00 | 6417.0
16:20:00 | 6395.0
16:25:00 | 6390.0
16:30:00 | 6388.5
16:35:00 | 6390.0
16:40:00 | 6390.0
16:45:00 | 6381.0
16:50:00 | 6380.5
16:55:00 | 6380.5
17:00:00 | 6379.0
Edit 3
Adapted from Tomáš Záluský answer:
with minutes as (
select generate_series(
timestamp '2000-01-01 11:00:00',
timestamp '2000-01-01 17:00:00',
'5 minute'::interval
) as tt
), t as (
select
to_char(minutes.tt, 'HH24:MI:SS'),
x2.*,
minutes.tt
from minutes
left join
(
select
ticker,
last,
intervalo
from(
select
ticker, last,
row_number() over (
partition by to_timestamp(floor((extract('epoch' from hora) / 300 )) * 300) at time zone 'UTC'
order by hora desc
) as rownum,
to_char(to_timestamp(floor((extract('epoch' from hora) / 300 )) * 300) at time zone 'UTC', 'HH24:MI:SS') as intervalo
from preciosrt p1
where
p1.ticker='AL29' and p1.settlement='48hs' and
p1.fecha=now()::date - interval '1 day'
) x
where x.rownum=1
) x2 on x2.intervalo = to_char(minutes.tt, 'HH24:MI:SS')
), x as (
select t.*, count(ticker) over(order by tt) as parttn from t
)
select
tt::time,
first_value(ticker) over(partition by parttn order by tt),
first_value(last) over(partition by parttn order by tt)
from x;
You are solving problem of filling missing values from previous rows.
Left join is good start. You need to substitute any null
value of ticker
and last
column for last nonnull value in ordering by minutes.tt
. This could be normally achieved using lag
window function. The lag
function returns value from previous row. Although it would work in your concrete case, I assume there can generally be gap of many consecutive 5m intervals. In this case, lag
could be used only with ignore nulls
clause: case when ticker is null then lag(ticker) ignore nulls over (order by tt) else ticker end
. Unfortunately it is currently not supported by Postgres.
Workaround: use count
analytic function to get sequence of only nonnull values (i.e. those joined by left join) in ordering by tt
. The function returns number which increases with each nonnull value but stays same with each null
value. This number describes chunks consisted of first nonnull value and potentially many following null
values. Hence you can use this number to define partition. The data you want is the first_value
over that partition.
with minutes as (
...YOUR SERIES...
), t as (
select
to_char(minutes.tt, 'HH24:MI:SS') as tt,
...YOUR QUERY...
), x as (
select t.*, count(ticker) over (order by tt) as parttn from t
)
select tt
, first_value(ticker) over (partition by parttn order by tt)
, first_value(last) over (partition by parttn order by tt)
from x
Simplified fiddle here. If I overlooked something important in your data then roll your own fiddle please.