Search code examples
oracleplsqluser-defined-functions

How to create Aggregate UDF in Oracle PL/SQL


Say I have a table:

CREATE TABLE staff (
  id INT,
  name CHAR(9)
);

With data:

INSERT INTO staff (id, name) VALUES (1, 'Joe');
INSERT INTO staff (id, name) VALUES (2, 'Bob');
INSERT INTO staff (id, name) VALUES (3, 'Alice');

I need to create a multi row UDF, something like the built-in AVG function, such that I can call it in the following manner:

SELECT vowel_count(name) FROM staff;

And assuming vowels are [AaEeIiOoUu], get the following result:

| vowel_count(name) |
|-------------------|
|                 6 |

What is the syntax to take a table column as input to a UDF?

CREATE OR REPLACE FUNCTION vowel_cnt(/* what goes here? */) 
  RETURN NUMBER
IS
  ...
BEGIN
  ...
END;

The function must be table agnostic, just like SUM, AVG, etc.

I am using Oracle PL/SQL and SQL Developer.


Solution

  • As @WilliamRoberston said, you can define your own aggregate function with OCI Data Cartridge.

    Partly because I haven't done this for a while and wanted to remind myself, here's a working implementation to count vowels.

    First, create an object type with the required functions, and a numeric variable to hold the total count:

    create or replace type t_vowel_count as object (
      g_count number,
      static function ODCIAggregateInitialize(
        p_ctx in out t_vowel_count
      ) return number,
      member function ODCIAggregateIterate(
        self in out t_vowel_count, p_string varchar2
      ) return number,
      member function ODCIAggregateTerminate(
        self in out t_vowel_count, p_result out number, p_flags in number
      ) return number,
      member function ODCIAggregateMerge(
        self in out t_vowel_count, p_ctx in t_vowel_count
      ) return number
    );
    /
    

    Then create the type body, with - in this case - fairly straightforward function bodies:

    create or replace type body t_vowel_count as
      static function ODCIAggregateInitialize(
        p_ctx in out t_vowel_count
      ) return number is
      begin
        p_ctx := t_vowel_count(null);
        -- initialise count to zero
        p_ctx.g_count := 0;
        return ODCIConst.success;
      end ODCIAggregateInitialize;
    
      member function ODCIAggregateIterate(
        self in out t_vowel_count, p_string varchar2
      ) return number is
      begin
        -- regex is clearer...
        -- self.g_count := self.g_count + regexp_count(p_string, '[aeiou]', 1, 'i');
        -- but translate is faster...
        self.g_count := self.g_count
          + coalesce(length(p_string), 0)
          - coalesce(length(translate(p_string, 'xaAeEiIoOuU', 'x')), 0);
        return ODCIConst.success;
      end ODCIAggregateIterate;
    
      member function ODCIAggregateTerminate(
        self in out t_vowel_count, p_result out number, p_flags in number
      ) return number is
      begin
        p_result := self.g_count;
        return ODCIConst.success;
      end ODCIAggregateTerminate;
    
      member function ODCIAggregateMerge(
        self in out t_vowel_count, p_ctx in t_vowel_count
      ) return number is
      begin
        self.g_count := self.g_count + p_ctx.g_count;
        return ODCIConst.success;
      end ODCIAggregateMerge;
    end t_vowel_count;
    /
    

    The count of vowels in each individual string could be done in various ways; regular expressions are clear but slow, so I've shown a translate() version which should be fast. I've included @MTO's suggestion to wrap that in coalesce for a null result (for the edge case where the string consists only of vowels, and also to handle null inputs (though it gets the right result without those changes; it's safer to assume it might not one day...).

    And finally create the function that uses that type:

    create or replace function vowel_count (p_string varchar2)
    return number
    parallel_enable
    aggregate using t_vowel_count;
    /
    

    With your sample data you can now do:

    SELECT vowel_count(name) FROM staff;
    
    VOWEL_COUNT(NAME)
    6

    fiddle including the edge cases.