Search code examples
sqlsnowflake-cloud-data-platforminner-joinself-join

Understanding INNER JOIN logic


I have the following table Exchange Rates Schema:

name type kind null? default primary key unique key
COUNTRY VARCHAR(10) COLUMN Y N N
RATETYPE VARCHAR(6) COLUMN Y N N
FROMCURRENCY VARCHAR(3) COLUMN Y N N
TOCURRENCY VARCHAR(3) COLUMN Y N N
STARTDATE VARCHAR(12) COLUMN Y N N
RATE NUMBER(15,7) COLUMN Y N N

Of which I only want the USD/MTHEND rows, i.e.:

SELECT FromCurrency, ToCurrency, Date(StartDate, 'YYYYMMDD') AS StartDate, Rate 
FROM EXCHANGERATES
WHERE DATE(StartDate, 'YYYYMMDD') > CURRENT_DATE - 15000 AND RATETYPE = 'MTHEND' AND ToCurrency = 'USD'
ORDER BY FromCurrency, ToCurrency, StartDate;
FROMCURRENCY TOCURRENCY STARTDATE RATE
JPY USD 2018-12-01 113.4700000
JPY USD 2019-03-30 0.0090342
JPY USD 2019-06-28 0.0092721
JPY USD 2019-08-02 0.0093388
JPY USD 2019-08-30 0.0093967
JPY USD 2019-09-27 0.0092729
JPY USD 2019-11-01 0.0092592
JPY USD 2019-11-29 0.0091315
JPY USD 2019-12-28 0.0091174
JPY USD 2020-02-01 0.0091675
JPY USD 2020-02-29 0.0091802
JPY USD 2020-03-28 0.0092157
JPY USD 2020-05-02 0.0093431
JPY USD 2020-05-30 0.0093266
JPY USD 2020-06-27 0.0093361
JPY USD 2020-08-01 0.0095812
JPY USD 2020-08-29 0.0094144
JPY USD 2020-09-26 0.0094966
JPY USD 2020-10-31 0.0095739
JPY USD 2020-11-27 0.0096061
JPY USD 2020-12-26 0.0096525
JPY USD 2021-01-30 0.0095693
JPY USD 2021-02-27 0.0094197
... ... ... ...
JPY USD 2022-02-26 0.0086700

But there is no End Date column, hence I have the following query using self INNER JOIN to set the end date:

    SELECT
    EX.FromCurrency,
    EX.ToCurrency,
    DATE(EX.StartDate,'YYYYMMDD') AS StartDate, DATE(EX2.EndDate,'YYYYMMDD') AS EndDate,
    EX.Rate    
FROM
    EXCHANGERATES EX
INNER JOIN(
    SELECT
        FromCurrency,
        ToCurrency,
        Max(StartDate) AS StartDate,
        20251231 AS EndDate
    FROM
        EXCHANGERATES
    WHERE
        RateType = 'MTHEND'
    GROUP BY
        Fromcurrency,
        ToCurrency
UNION
    SELECT
        E2.FromCurrency,
        E2.ToCurrency,
        Max(E.StartDate) AS StartDate,
        to_number(to_char(DateAdd(DAY,-1,To_Date(to_char(E2.StartDate),'YYYYMMDD')),'YYYYMMDD')) AS EndDate
    FROM
        EXCHANGERATES E
    INNER JOIN 
EXCHANGERATES E2 ON
        E.StartDate < E2.StartDate
        AND E.RateType = E2.RateType
    WHERE
        E.RateType = 'MTHEND'
    GROUP BY
        E2.FromCurrency,
        E2.ToCurrency,
        E2.StartDate) AS EX2 ON
    EX.FromCurrency = EX2.FromCurrency
    AND EX.ToCurrency = EX2.ToCurrency
    AND EX.StartDate = EX2.StartDate
    AND EX.RateType = 'MTHEND'
WHERE
    Ex.tocurrency = 'USD'
ORDER BY    1,  2,  3;
FROMCURRENCY TOCURRENCY STARTDATE ENDDATE RATE
JPY USD 2019-12-28 2020-01-31 0.0091174
JPY USD 2020-05-02 2020-05-29 0.0093431
JPY USD 2020-05-30 2020-06-26 0.0093266
JPY USD 2020-06-27 2020-07-31 0.0093361
JPY USD 2020-08-01 2020-08-28 0.0095812
JPY USD 2020-09-26 2020-10-30 0.0094966
JPY USD 2020-10-31 2020-11-26 0.0095739
JPY USD 2020-12-26 2021-01-29 0.0096525
JPY USD 2021-01-30 2021-02-26 0.0095693
JPY USD 2021-02-27 2021-03-26 0.0094197

Why is the INNER result different to tinazmu's query using LEAD below? The below captures all unique USD/MTHEND rows with proper End Date:

SELECT
        FromCurrency,
        ToCurrency,
        DATE(StartDate,'YYYYMMDD') AS StartDate,
        LEAD(DateAdd(DAY, -1, Date(StartDate, 'YYYYMMDD')),1,'2025-12-31') 
            OVER (PARTITION BY FromCurrency, ToCurrency, RateType 
                    ORDER BY StartDate) as EndDate, 
        Rate    
FROM
    EXCHANGERATES
WHERE RateType = 'MTHEND' AND ToCurrency = 'USD'
ORDER BY FromCurrency, ToCurrency, StartDate;
FROMCURRENCY TOCURRENCY STARTDATE ENDDATE RATE
JPY USD 2018-12-01 2019-03-29 113.4700000
JPY USD 2019-03-30 2019-06-27 0.0090342
JPY USD 2019-06-28 2019-08-01 0.0092721
JPY USD 2019-08-02 2019-08-29 0.0093388
JPY USD 2019-08-30 2019-09-26 0.0093967
JPY USD 2019-09-27 2019-10-31 0.0092729
JPY USD 2019-11-01 2019-11-28 0.0092592
JPY USD 2019-11-29 2019-12-27 0.0091315
JPY USD 2019-12-28 2020-01-31 0.0091174
JPY USD 2020-02-01 2020-02-28 0.0091675

Solution

  • You didn't show your EXCHANGERATES table, but it seems that it has only one date: StartDate (it should have been called EffectiveDate), and it keeps a row per currency pair and date for which a rate is available. In fact the exchange rates change everyday, except on public holidays, and not much is saved by not keeping the rates for the holidays (by copying the rates from the previous day). One would then run their rate conversion query for day-n by simply saying ON ... EXCHANGERATES.StartDate=DayN, and all of the above would be unnecessary.

    IF you don't have any control on the underlying EXCHANGERATE table's population regime then you have to find a way to get the rate for DayN, and if that is not available, DayN-1, and so on. If you know that the only missing rates are for the weekends, you could simply join to this table 3 times, all with LEFT JOIN, first with StartDate=DayN, the second with StartDate.DayN-1, etc.. , and picking up the latest one available.

    If, on the other hand, there are gaps of unpredictable duration, your problems becomes that of a gaps/island problem, and the query you posted is one way of solving it. There are other ways, not necessarily better, look for SQL gaps and Islands problems, consolidating islands/packing.

    I don't know the Snowflake platform, but in SQLServer (or Teradata) this could replace your query:

    SELECT
            FromCurrency,
            ToCurrency,
            RateType,
            Rate,
            StartDate,
            LEAD(DateAdd(day, -1, StartDate),1,'2025-12-31') 
                OVER (partition by FromCurrency, ToCurrency, RateType 
                        ORDER BY by StartDate) as EndDate
    FROM  EXCHANGERATES E
    

    Update 28-Feb-2022; based on my understanding of your data, this should work for you as a replacement for your query:

    SELECT
           FromCurrency,
           ToCurrency,
           DATE(StartDate, 'YYYYMMDD') as StartDate,
           LEAD(DateAdd(day, -1, DATE(StartDate, 'YYYYMMDD')),1,'2025-12-31')
                OVER (PARTITION by FromCurrency, ToCurrency, RateType
                      ORDER BY StartDate) as EndDate,
          Rate
    FROM  EXCHANGERATES E
    WHERE ToCurrency='USD'
      and RateType='MTHEND'
    ORDER BY    1,  2,  3;
    

    Can you please check?

    Update 1-Mar-2022:

    The union subquery EX2 simply finds all date intervals for 'Month End Rates': The 1st Part of the union (with SELECT ... Max(StartDate) AS StartDate, 20251231 AS EndDate) finds the latest StartDate for which a month end rate is available for each combination of From/ToCurrency and calls this valid from StartDate to 2025-12-31, a date in the future. This way, the most recent rate can be used for any date>=max(StartDate)

    It then combines (2nd part of UNION) the older records as follows: for each month end rate in the table (E2), it finds the previous rate in the table (E, E.StartDate<E2.StartDate would give all earlier records, but the MAX(E.StartDate) would give us the latest of them: the previous record. It then subtracts 1 day from the late record (E2.StartDate) and labels it the EndDate, because there is a new rate on E2.StartDate.

    The outer query (EX) then gets the rates themselves, combining them with the intervals derived in EX2.

    For this to work properly, the join condition in the second part of the UNION must specify the same currencies (otherwise we would find a rate for a different currency as the previous record):

            E.StartDate < E2.StartDate
            AND E.RateType = E2.RateType
            AND E.FromCurrency = E2.FromCurrency
            AND E.ToCurrency=E2.ToCurrency
    

    Maybe this explains the difference...