Search code examples
pythonoracletimeto-date

Problem: Incorrect displaying of time difference (in months) between Python and Oracle


The background of my problem is:

  • there's SQL query in Python that extracts particular data from Oracle database
  • one column/information that is extracted from Oracle is information about how long a customer has open account by the company (in years)

The particular part of the query:

ROUND(MONTHS_BETWEEN(sysdate, TO_DATE(c.registration_date, 'dd.mm.yy')) / 12, 1) AS Years_Client

The problem:

  • when I run the query in Oracle (SQL Developer), I receive correct values
  • BUT when I run the query within Python code, it returns a column that contains both negative values or really high values (e.g. 21), which in the first place is a nonsense and in the second as well (as the company even doesn't exist that long)

<class 'pandas.core.series.Series'>

  • checked Oracle documentation
  • tired to run the whole query both in Oracle SQL Developer and Python
  • it works perfectly fine in Oracle, but not in Python

Solution

  • NEVER use TO_DATE on a column that is already a DATE.

    TO_DATE takes a string as the first argument and with you pass it a DATE then Oracle needs to convert it to a string so that TO_DATE can convert it back to a date - which either:

    1. is pointless (as it was already a date); or

    2. will cause errors (as is your case) because Oracle will convert your query to:

      ROUND(
        MONTHS_BETWEEN(
          sysdate,
          TO_DATE(
            TO_CHAR(
              c.registration_date,
              (SELECT value FROM NLS_SESSION_PARAMETERS WHERE parameter = 'NLS_DATE_FORMAT')
            ),
            'dd.mm.yy'
          )
        ) / 12,
        1
      )
      

      and if the NLS_DATE_FORMAT does not match dd.mm.yy then it will either raise an exception or you will get an erroneous value.

    In your case:

    • in SQL Developer, the NLS_DATE_FORMAT matches dd.mm.yy and the query works but the TO_DATE call is pointless.
    • in Python, the NLS_DATE_FORMAT is different and does not match dd.mm.yy so you are getting erroneous values.

    To solve it, remove the TO_DATE:

    ROUND(MONTHS_BETWEEN(sysdate, c.registration_date) / 12, 1) AS Years_Client
    

    For example, if you have the data:

    CREATE TABLE table_name (registration_date) AS
    SELECT SYSDATE FROM DUAL UNION ALL
    SELECT DATE '2024-04-01' FROM DUAL UNION ALL
    SELECT DATE '1901-04-24' FROM DUAL;
    

    and the NLS_DATE_FORMAT is YY-MM-DD then:

    SELECT ROUND(MONTHS_BETWEEN(sysdate, TO_DATE(c.registration_date, 'dd.mm.yy')) / 12, 1)
            AS Years_Client,
           ROUND(MONTHS_BETWEEN(sysdate, c.registration_date) / 12, 1)
            AS actual_Years_Client
    FROM   table_name c
    

    Outputs

    YEARS_CLIENT ACTUAL_YEARS_CLIENT
    4 0
    23 .1
    .1 123

    fiddle