This is a variation on this brilliantly answered question I posted previously:
I have a database table with:
id | date | position | name
--------------------------------------
1 | 2016-06-29 | 9 | Ben Smith
2 | 2016-06-29 | 1 | Ben Smith
3 | 2016-06-29 | 5 | Ben Smith
4 | 2016-06-29 | 6 | Ben Smith
5 | 2016-06-30 | 2 | Ben Smith
6 | 2016-06-30 | 2 | Tom Brown
7 | 2016-06-29 | 4 | Tom Brown
8 | 2016-06-30 | 2 | Tom Brown
9 | 2016-06-30 | 1 | Tom Brown
How can I query the table efficiently so that I can get new columns using array_agg()?
I have already tried the following query; however, it's incredibly slow and also wrong, as it doesn't group the previous_positions by the name column:
SELECT runners.id AS runner_id,
       btrim(regexp_replace(replace(upper(runners.name::text), '.'::text, ''::text), '[[:digit:]]'::text, ''::text, 'g'::text)) AS name,
       runners.position_two,
       array_agg(runners.position_two) OVER w AS results
FROM runners
WINDOW w AS (
    PARTITION BY (btrim(regexp_replace(replace(upper(runners.name::text), '.'::text, ''::text), '[[:digit:]]'::text, ''::text, 'g'::text)))
    ORDER BY runners.id
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
);
I expect the output table to look like this:
id | date | position | name | previous | med | med_20
----------------------------------------------------------------------
1 | 2016-06-29 | 9 | Ben Smith | {} | |
2 | 2016-06-29 | 1 | Ben Smith | {9} | 9 | 9
3 | 2016-06-29 | 5 | Ben Smith | {9,1} | 5 | 5
4 | 2016-06-29 | 6 | Ben Smith | {9,1,5} | 5 | 5
5 | 2016-06-30 | 2 | Ben Smith | {9,1,5,6} | 5.5 | 5.5
6 | 2016-06-30 | 2 | Tom Brown | {} | None | None
7 | 2016-06-29 | 4 | Tom Brown | {2} | 2 | 2
8 | 2016-06-30 | 2 | Tom Brown | {2,4} | 3 | 3
9 | 2016-06-30 | 1 | Tom Brown | {2,4,2} | 2 | 2
Postgres doesn't have a built-in aggregate function for MEDIAN. But you can create one using the function snippet available on the Postgres wiki. This snippet is also part of the ulib_agg user-defined library.
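For reference, a median aggregate along the lines of the wiki snippet might look like the sketch below. It is modeled on that snippet rather than copied verbatim, so treat the helper name _final_median and the numeric-only signature as assumptions and check the wiki for the current version.

```sql
-- Final function: receives all values collected for the group (or window frame)
-- and returns the middle value, or the mean of the two middle values
-- when the count is even. NULL inputs are not handled specially here.
CREATE OR REPLACE FUNCTION _final_median(numeric[])
RETURNS numeric AS
$$
    SELECT avg(val)
    FROM (
        SELECT val
        FROM unnest($1) AS val
        ORDER BY 1
        LIMIT  2 - mod(array_upper($1, 1), 2)        -- 1 row if odd, 2 if even
        OFFSET ceil(array_upper($1, 1) / 2.0) - 1    -- skip to the middle
    ) AS sub;
$$
LANGUAGE sql IMMUTABLE;

-- Aggregate: append each input value to an array, then hand the array
-- to the final function above.
CREATE AGGREGATE median(numeric) (
    SFUNC     = array_append,
    STYPE     = numeric[],
    FINALFUNC = _final_median,
    INITCOND  = '{}'
);
```

Because this aggregate has no inverse transition function, Postgres recomputes it for each row when it is used over a moving window frame, which can be slow on large partitions.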
Once it is created, you can use it like any other aggregate function, such as SUM or STRING_AGG, with a similar window specification. Postgres lets you define several named windows in a single WINDOW clause, separated by commas, and each aggregate can refer to a different one.
So, to get the MEDIAN of the previous 20 records, your windows could be defined as in this query:
SELECT j.*,
       array_agg(position) OVER w AS previous_positions,
       median(position) OVER w_20 AS med_20
FROM jockeys j
WINDOW w AS (
           -- every earlier row for the same name
           PARTITION BY name ORDER BY id
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
       ),
       w_20 AS (
           -- at most the 20 rows before the current one
           PARTITION BY name ORDER BY id
           ROWS BETWEEN 20 PRECEDING AND 1 PRECEDING
       );
On top of that, you can apply the ROUND function if you want to round the result to fewer decimal digits.
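As a rough illustration of that last step, the query can wrap each median in ROUND. This sketch assumes a median(numeric) aggregate has already been created, and it also adds a med column over all previous rows, since the expected output in the question shows one; the choice of one decimal place is arbitrary.

```sql
SELECT j.*,
       array_agg(position) OVER w              AS previous_positions,
       round(median(position) OVER w, 1)       AS med,     -- median of all earlier rows
       round(median(position) OVER w_20, 1)    AS med_20   -- median of up to 20 earlier rows
FROM jockeys j
WINDOW w AS (
           PARTITION BY name ORDER BY id
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
       ),
       w_20 AS (
           PARTITION BY name ORDER BY id
           ROWS BETWEEN 20 PRECEDING AND 1 PRECEDING
       )
ORDER BY j.id;
```

The two-argument form of ROUND works on numeric values, so it applies directly if the median aggregate returns numeric.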