postgresql, array-agg

Query table with array_agg/median of ALL previous positions, LAST_10, LAST_50, excluding current position


This is a variation on a question I posted previously, which was brilliantly answered.

I have a database table with:

id | date       | position | name
--------------------------------------
1  | 2016-06-29 | 9        | Ben Smith
2  | 2016-06-29 | 1        | Ben Smith
3  | 2016-06-29 | 5        | Ben Smith
4  | 2016-06-29 | 6        | Ben Smith
5  | 2016-06-30 | 2        | Ben Smith
6  | 2016-06-30 | 2        | Tom Brown
7  | 2016-06-29 | 4        | Tom Brown
8  | 2016-06-30 | 2        | Tom Brown
9  | 2016-06-30 | 1        | Tom Brown

How can I query the table efficiently to get new columns built with array_agg()?

I have already tried the following query. However, it's incredibly slow, and it is also wrong because it doesn't group the previous positions by the name column:

 SELECT runners.id AS runner_id,
    btrim(regexp_replace(replace(upper(runners.name::text), '.'::text, ''::text), '[[:digit:]]'::text, ''::text, 'g'::text)) AS name,
    runners.position_two,
    array_agg(runners.position_two) OVER w AS results
   FROM runners
  WINDOW w AS (PARTITION BY (btrim(regexp_replace(replace(upper(runners.name::text), '.'::text, ''::text), '[[:digit:]]'::text, ''::text, 'g'::text))) ORDER BY runners.id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING);

I expect the table output to look like this:

id | date       | position | name      | previous   | med  | med_20
----------------------------------------------------------------------
1  | 2016-06-29 | 9        | Ben Smith | {}         |      |
2  | 2016-06-29 | 1        | Ben Smith | {9}        | 9    | 9
3  | 2016-06-29 | 5        | Ben Smith | {9,1}      | 5    | 5
4  | 2016-06-29 | 6        | Ben Smith | {9,1,5}    | 5    | 5
5  | 2016-06-30 | 2        | Ben Smith | {9,1,5,6}  | 5.5  | 5.5
6  | 2016-06-30 | 2        | Tom Brown | {}         |      |
7  | 2016-06-29 | 4        | Tom Brown | {2}        | 2    | 2
8  | 2016-06-30 | 2        | Tom Brown | {2,4}      | 3    | 3
9  | 2016-06-30 | 1        | Tom Brown | {2,4,2}    | 2    | 2

Solution

  • Postgres has no built-in MEDIAN aggregate function, but you can create one using the function snippet available on the Postgres wiki. The snippet is also part of the ulib_agg user-defined library.
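
    For reference, a minimal sketch of such an aggregate, along the lines of the wiki snippet. It is written from memory rather than copied, so the helper name _final_median and the exact body are assumptions; compare it with the wiki version before relying on it. The state function collects every input value into an array, and the final function averages the one or two middle values of the sorted array.

    ```sql
    -- Final function: receives all collected values and returns their median.
    -- (_final_median is an assumed helper name, not a built-in.)
    CREATE OR REPLACE FUNCTION _final_median(numeric[])
    RETURNS numeric AS
    $$
      SELECT AVG(val)
      FROM (
        SELECT val
        FROM unnest($1) AS val
        ORDER BY 1
        -- take 1 middle value for an odd count, 2 for an even count
        LIMIT  2 - MOD(array_upper($1, 1), 2)
        OFFSET CEIL(array_upper($1, 1) / 2.0) - 1
      ) AS sub;
    $$
    LANGUAGE sql IMMUTABLE;

    -- Aggregate: append each input to an array, then hand it to the final function.
    CREATE AGGREGATE median(numeric) (
      SFUNC     = array_append,
      STYPE     = numeric[],
      FINALFUNC = _final_median,
      INITCOND  = '{}'
    );
    ```

    Because the aggregate is declared for numeric, an integer column such as position should be cast implicitly when passed to it. Over an empty frame (the first row of each name) the final function sees an empty array and returns NULL.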

    Once it is created, you can use it like any other aggregate function, such as SUM or STRING_AGG, with a window specification. A single WINDOW clause can define several named windows, separated by commas.

    So, to get the MEDIAN of the previous 20 records, define the windows as in this query:

    SELECT j.*,
           array_agg(position) OVER w    AS previous_positions,
           median(position)    OVER w_20 AS med_20
      FROM jockeys j
    WINDOW w AS (
             PARTITION BY name ORDER BY id
             -- every earlier row for this name, excluding the current row
             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ),
           w_20 AS (
             PARTITION BY name ORDER BY id
             -- at most the 20 rows before the current row
             ROWS BETWEEN 20 PRECEDING AND 1 PRECEDING
           );

    On top of that, you can wrap the result in the ROUND function if you want to limit the number of decimal digits.
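
    As a rough sketch of how this could extend to the LAST_10 and LAST_50 columns from the question title, the query below adds one named window per frame size and rounds each median to one decimal place. The window names w_all, w_10 and w_50 and the output column names are made up for illustration, and the query assumes a median(numeric) aggregate has already been created. The COALESCE is an optional extra: array_agg returns NULL over an empty frame, so it is needed only if you want {} for the first row of each name.

    ```sql
    SELECT j.*,
           -- array_agg yields NULL over an empty frame; show {} instead
           COALESCE(array_agg(position) OVER w_all, '{}') AS previous_positions,
           round(median(position) OVER w_all, 1)          AS med,
           round(median(position) OVER w_10, 1)           AS med_10,
           round(median(position) OVER w_50, 1)           AS med_50
      FROM jockeys j
    WINDOW w_all AS (PARTITION BY name ORDER BY id
                     ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
           w_10  AS (PARTITION BY name ORDER BY id
                     ROWS BETWEEN 10 PRECEDING AND 1 PRECEDING),
           w_50  AS (PARTITION BY name ORDER BY id
                     ROWS BETWEEN 50 PRECEDING AND 1 PRECEDING)
     ORDER BY j.id;
    ```

    Each frame ends at 1 PRECEDING, so the current row's position is never included in its own array or median.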
