Search code examples
sqloracle-databaseoracle12c

Oracle Invalid Number in Join Clause


I am getting an Oracle Invalid Number error that doesn't make sense to me. I understand what this error means but it should not be happening in this case. Sorry for the long question, but please bear with me so I can explain this thoroughly.

I have a table which stores IDs to different sources, and some of the IDs can contain letters. Therefore, the column is a VARCHAR.

One of the sources has numeric IDs, and I want to join to that source:

SELECT *
FROM (
    SELECT AGGPROJ_ID -- this column is a VARCHAR
    FROM AGG_MATCHES -- this is the table storing the matches
    WHERE AGGSRC = 'source_a'
) m
JOIN SOURCE_A a ON a.ID = TO_NUMBER(m.AGGPROJ_ID);

In most cases this works, but depending on random things such as what columns are in the select clause, if it uses a left join or an inner join, etc., I will start seeing the Invalid Number error.

I have verified multiple times that all entries in AGG_MATCHES where AGGSRC = 'source_a' do not contain non numeric characters in the AGGPROJ_ID column:

-- this returns no results
SELECT AGGPROJ_ID
FROM AGG_MATCHES
WHERE AGGSRC = 'source_a' AND REGEXP_LIKE(AGGPROJ_ID, '[^0-9]');

I know that Oracle basically rewrites the query internally for optimization. Going back to the first SQL example, my best guess is that depending on how the entire query is written, in some cases Oracle is trying to perform the JOIN before the sub query. In other words, it's trying to join the entire AGG_MATCHES tables to SOURCE_A instead of just the subset returned by the sub query. If so, there would be rows that contain non numeric values in the AGGPROJ_ID column.

Does anyone know for certain if this is what's causing the error? If it is the reason, is there anyway for me to force Oracle to execute the sub query part first so it's only trying to join a subset of the AGG_MATCHES table?


A little more background:

This is obviously a simplified example to illustrate the problem. The AGG_MATCHES table is used to store "matches" between different sources (i.e. projects). In other words, it's used to say that a project in sourceA is matched to a project in sourceB.

Instead of writing the same SQL over and over, I've created views for the sources we commonly use. The idea is to have a view with two columns, one for SourceA and one for SourceB. For this reason, I don't want to use TO_CHAR on the ID column of the source table, because devs would have to remember to do this every time they are doing a join, and I'm trying to remove code duplication. Also, since the ID in SOURCE_A is a number, I feel that any view storing SOURCE_A.ID should go ahead and convert it to a number.


Solution

  • You are right that Oracle is executing the statement in a different order than what you wrote, causing conversion errors.

    The best ways to fix this problem, in order, are:

    1. Change the data model to always store data as the correct type. Always store numbers as numbers, dates as dates, and strings as strings. (You already know this and said you can't change your data model, this is a warning for future readers.)
    2. Convert numbers to strings with a TO_CHAR.
    3. If you're on 12.2, convert strings to numbers using the DEFAULT return_value ON CONVERSION ERROR syntax, like this:

      SELECT *
      FROM (
          SELECT AGGPROJ_ID -- this column is a VARCHAR
          FROM AGG_MATCHES -- this is the table storing the matches
          WHERE AGGSRC = 'source_a'
      ) m
      JOIN SOURCE_A a ON a.ID = TO_NUMBER(m.AGGPROJ_ID default null on conversion error);
      
    4. Add a ROWNUM to an inline view to prevent optimizer transformations that may re-write statements. ROWNUM is always evaluated at the end and it forces Oracle to run things in a certain order, even if the ROWNUM isn't used. (Officially hints are the way to do this, but getting hints right is too difficult.)

      SELECT *
      FROM (
          SELECT AGGPROJ_ID -- this column is a VARCHAR
          FROM AGG_MATCHES -- this is the table storing the matches
          WHERE AGGSRC = 'source_a'
              --Prevent optimizer transformations for type safety.
              AND ROWNUM >= 1
      ) m
      JOIN SOURCE_A a ON a.ID = TO_NUMBER(m.AGGPROJ_ID);