Search code examples
postgresqlaggregate-functionsaggregates

Postgres, CREATE AGGREGATE, accessing index and row count inside aggregate function for [non]oxford comma text


The goal is a custom aggregate function made with CREATE AGGREGATE called string_agg_oxford; it is an aggregate function that works similar to string_agg except it is smart enough to know how many items it is aggregating so that it can place "and" in front of the last item.

So where string_agg(items, ', ') would return "item1, item2, item3", string_agg_oxford(items) will return "item1, item2, and item3".

My failed attempt starts with a type for our accumulator that includes the total number of rows and the index for the current row:

CREATE TYPE oxford_accumulator as (
  row_count numeric,
  i numeric,
  acc text
);

Now we need our accumulator function:

CREATE OR REPLACE FUNCTION oxford_acc (acc oxford_accumulator, curr text)
  RETURNS oxford_accumulator
  LANGUAGE PLPGSQL
  AS $$
BEGIN
  IF acc.i + 1 = acc.row_count THEN
    RETURN (acc.row_count, acc.i + 1, acc.acc || curr);
  END IF;

  IF (acc.i + 2 = acc.row_count) AND (acc.row_count = 2)  THEN
    RETURN (acc.row_count, acc.i + 1, acc.acc || curr || ' and ');
  END IF;

  IF (i + 2 = acc.row_count) THEN
    RETURN (acc.row_count, acc.i + 1, acc.acc || curr || ', and ');
  END IF;

  RETURN (acc.row_count, acc.i + 1, acc.acc || curr || ', ');
END;
$$;

because the accumulator has swallowed up the total count and the index we have to release this information when the accumulator is finished with an ffunc.

CREATE OR REPLACE FUNCTION oxford_final (acc oxford_accumulator)
  RETURNS text
  LANGUAGE PLPGSQL
  AS $$
BEGIN
  RETURN acc.acc;
END;
$$;

My idea falls apart here where we need to wire it all up because there does not seem to be a way to parametrize the total row count... so fail.

CREATE OR REPLACE AGGREGATE string_agg_oxford (text, row_count numeric) (
  INITCOND = (row_count, 0, ''),
    --         ^^^ fail
  STYPE = oxford_accumulator,
  SFUNC = oxford_acc,
  FINALFUNC = oxford_final
);

I know something similar can be achieved with a regular function, but I'm not ready to give up yet if there's a way to do this as an aggregator that could be used in a select statement like SELECT string_agg_oxford(clients.full_name) FROM matters GROUP BY matters.matter_id;


Solution

    1. Initial condition must be "a string constant in the form accepted for the data type state_data_type" so no way to pass a dynamic value. Credit to @a_horse_with_no_name

    2. For efficiency reasons, it would be easier to "peel the last value out of the accumulator." Credit to @jjanes.

    3. Lets use a unique separator to alleviate any issues matching on ", " because that is fairly common - so we'll use a unique separator and then replace those in the ffunc as is appropriate.

    4. Postgres does not support having capture groups within its lookahead assertions, so instead of a fancy regular expression to find the last occurrence of our unique separator, we will reverse the string and tackle things upside down.

    CREATE OR REPLACE FUNCTION oxford_acc (acc text, curr text)
      RETURNS text
      LANGUAGE PLPGSQL
      AS $$
    BEGIN
      RETURN acc || curr || '@$@$';
      --                     ^^^ unique separator, can be anything unique
    END;
    $$;
    
    CREATE OR REPLACE FUNCTION oxford_final (acc text)
      RETURNS text
      LANGUAGE PLPGSQL
      AS $$
    DECLARE
        my_result text;
        my_counter numeric;
    BEGIN
        SELECT left(acc, -4) INTO my_result;
        -- ^^^ removes the last separator
        SELECT count(*) FROM regexp_matches(my_result, '@\$@\$', 'g') INTO my_counter;
        
        IF my_counter = 0 THEN
          RETURN my_result;
        END IF;
        
        IF my_counter = 1 THEN
          RETURN regexp_replace(my_result, '@\$@\$', ' and ');
        END IF;
        
        SELECT reverse(my_result) INTO my_result;
        
        SELECT regexp_replace(my_result, '\$@\$@', ' dna ,') INTO my_result;
        
        SELECT reverse(my_result) INTO my_result;
        
        RETURN regexp_replace(
          my_result, 
          '@\$@\$',
          ', ',
          'g'
        );
    END;
    $$;
    
    CREATE OR REPLACE AGGREGATE oxford_agg (text) (
      INITCOND = '',
      STYPE = text,
      SFUNC = oxford_acc,
      FINALFUNC = oxford_final
    );
    

    Suggestions for improvements welcome!