Search code examples
sqloracleplsqlhash

Oracle ORA_HASH function. Will the results be constant?


We have the requirement to give a unique (per table) NUMERIC Id to multiple tables in our ERP System, because an interface to a 3rd party system requires it. The keys in these oracle tables, including the Objid are usually Varchar2. I do not want to create a new column in 20+ tables if it can be prevented so i was wondering if ORA_HASH would do the trick.

My question is: If i run e.g.

Select  ora_hash(order_no || release_no || line_no)  from purchase_order_line

with changing data in the table, will i always get the same hash value for a certain order_no || release_no || line_no combination? Or will that hash value be different at different times and if the overall data in the table changes. If the hash value remains constant i can limit it to the max size of the bigint required by the 3rd party system and always get the same unique value for a certain combination, right? Any issues with my line of thinking? What would happen if we ever switch to a new Database, migrating all our data. I strongly assume we would get different datasets per combiantion then before, right?

The amount of lines in the table will not exceed the max value of a BIGINT.

I tried the query multiple times while adding and deleting data from the table and it always returned the same hash.

Select  ora_hash(order_no || release_no || line_no)  from purchase_order_line

I also did a quick check on a few tables with more data and we seem to have no collisions on the primary keys.


Solution

  • You are required to add a unique numeric ID to your tables, and you want to avoid this, because your tables already have a unique ID albeit alphanumeric. You are looking for a way to get to a deterministic numeric ID.

    Depending on how many different characters are allowed in these IDs and how long these IDs can become, the following PL/SQL function may be sufficient to get from your alphanumeric ID to a numeric ID. (Too many allowed characters or too long alphanumeric IDs would cause an overflow exception.)

    CREATE OR REPLACE FUNCTION get_unique_id(p_objid VARCHAR2) RETURN INTEGER DETERMINISTIC
    IS
      PRAGMA UDF;
      v_allowed  VARCHAR2(4000) := 'ABCDEDGIJKLMNOPQRSTUVWXYZ1234567890';
      v_factor   INTEGER     := 1;
      v_result   INTEGER     := 0;
    BEGIN
      FOR i IN 1 .. LENGTHC(p_objid) LOOP
        v_result := v_result + INSTRC(v_allowed, SUBSTRC(p_objid, i, 1)) * v_factor;
        v_factor := v_factor * (LENGTHC(v_allowed) + 1);
      END LOOP;
      RETURN v_result;
    END get_unique_id;
    

    It is rather simple after all. You have a string of allowed characters that you use to get from your objid characters to a number.

    Let's say there are 99 allowed characters. The first character in your objid will get a value from 1 to 99 depending on its position in the allowed characters string. The next character will get a value from 1 to 99 multiplied with 100. And so on. The sum of all these values is the resulting numeric ID. (You can of course run through your objid string from right to left instead, if you find this more intuitive.)

    How to test the function:

    select get_unique_id('0000000000') from dual;
    

    Change '0000000000' to a string of the maximal allowed number of characters in your objid and the last allowed character in v_allowed. Do you get a result or an exception? Is the resulting number too big are small enough to match the requirement?


    At last, a look on the docs for ORA_HASH (docs.oracle.com/../ORA_HASH.html):

    ORA_HASH(expr [,max_bucket])
    

    There are no restrictions on the length of data represented by expr

    The optional max_bucket argument determines the maximum bucket value returned by the hash function. You can specify any value between 0 and 4294967295. The default is 4294967295.

    This means there is an unlimited number of input values and a limited number of resulting hash values. In other words, different input can result in the same output. So, you cannot use this function for your purpose. The same is true for about every generic hash function you can find. Only if a hash function is written to generate a unique result for a known, limited set of input data, would it work for you. And well, in that context you can call my function above a non-generic hash function.

    And a remark on deterministic functions: Oracle knows deterministic functions that are explicitly declared deterministic, just as I declared mine. These functions are guaranteed to return deterministic results, i.e. always the same output value for the same input value. So, if ORA_HASH is deterministic, this means that when you use it in your Oracle version, it will reliably result in the same output whenever you call it with the same input. It doesn't matter whether your Oracle version is 12c or 19c or 21c or whatever. It is not guaranteed, however, that Oracle never changes the algorithm. So while the function may be deterministic in Oracle 12c and still be deterministic in 21c, it is not guaranteed that the two versions will produce the same results. So, unless Oracle explicitly says so, the function is not guaranteed to be deterministic across database versions.