Search code examples
sqlhanahana-graphical-calculation-view

Fill in gap with prior record value having a populated quantity LIMIT: no analytics can be used


Assume data with structure like this: Demo

WITH CAL AS(
SELECT 2022 YR, '01' PERIOD UNION ALL
SELECT 2022 YR, '02' PERIOD UNION ALL
SELECT 2022 YR, '03' PERIOD UNION ALL
SELECT 2022 YR, '04' PERIOD UNION ALL
SELECT 2022 YR, '05' PERIOD UNION ALL
SELECT 2022 YR, '06' PERIOD UNION ALL
SELECT 2022 YR, '07' PERIOD UNION ALL
SELECT 2022 YR, '08' PERIOD UNION ALL
SELECT 2022 YR, '09' PERIOD UNION ALL
SELECT 2022 YR, '10' PERIOD UNION ALL
SELECT 2022 YR, '11' PERIOD UNION ALL
SELECT 2022 YR, '12' PERIOD ),
Data AS (
SELECT 2022 YR, '01' PERIOD, 10 qty UNION ALL
SELECT 2022 YR, '02' PERIOD, 5 qty UNION ALL
SELECT 2022 YR, '04' PERIOD, 10 qty UNION ALL
SELECT 2022 YR, '05' PERIOD, 7 qty UNION ALL
SELECT 2022 YR, '09' PERIOD, 1 qty)

SELECT * 
FROM CAL A
LEFT JOIN data B
 on A.YR = B.YR
 and A.Period = B.Period
WHERE A.Period <10 and A.YR = 2022
ORDER by A.period

Giving us:

+------+--------+------+--------+-----+
|  YR  | PERIOD |  YR  | PERIOD | qty |
+------+--------+------+--------+-----+
| 2022 |     01 | 2022 |     01 |  10 |
| 2022 |     02 | 2022 |     02 |   5 |
| 2022 |     03 |      |        |     |
| 2022 |     04 | 2022 |     04 |  10 |
| 2022 |     05 | 2022 |     05 |   7 |
| 2022 |     06 |      |        |     |
| 2022 |     07 |      |        |     |
| 2022 |     08 |      |        |     |
| 2022 |     09 | 2022 |     09 |   1 |
+------+--------+------+--------+-----+

With Expected result of:

+------+--------+------+--------+-----+
|  YR  | PERIOD |  YR  | PERIOD | qty |
+------+--------+------+--------+-----+
| 2022 |     01 | 2022 |     01 |  10 |
| 2022 |     02 | 2022 |     02 |   5 |
| 2022 |     03 | 2022 |     03 |   5 | -- SQL derives
| 2022 |     04 | 2022 |     04 |  10 |
| 2022 |     05 | 2022 |     05 |   7 | 
| 2022 |     06 | 2022 |     06 |   7 | -- SQL derives
| 2022 |     07 | 2022 |     07 |   7 | -- SQL derives
| 2022 |     08 | 2022 |     08 |   7 | -- SQL derives
| 2022 |     09 | 2022 |     09 |   1 |
+------+--------+------+--------+-----+

QUESTION: How would one go about filling in the gaps in period 03, 06, 07, 08 with a record quantity referencing the nearest earlier period/year. Note example is limited to a year, but gap could be on period 01 of 2022 and we would need to return 2021 period 12 quantity if populated or keep going back until quantity is found, or no such record exists.

LIMITS:

  • I am unable to use table value functions. (No lateral, no Cross Apply)
  • I'm unable to use analytics (no lead/lag)
  • correlated subqueries are iffy.

Why the limits? this must be done in a HANA graphical calculation view. Which supports neither of those concepts. I've not done enough to know how to do a correlated subquery at this time to know if it's possible.

  • I can create any number of inline views or materialized datasets needed.

STATISTICS:

  • this table has over a million rows and grows at a rate of productlocationperiodsyears. so if you have 100020126=1.4 mil+ in 6 years with just 20 locations and 1000 products...
  • each product inventory may be recorded at at the end of a month for a given location. (no activity for product/location, no record hence a gap. Silly mainframe save storage technique used in a RDBMS... I mean how do I know the system just didn't error on inserting the record for that material; or omit it for some reason... )
  • In the cases where it is not recorded, we need to fill in the gap. The example provided is broken down to the bear bones without location and material as I do not believe it is not salient to a solution.

ISSUE:

  • I'll need to convert the SQL to a "HANA Graphical calculation view"
  • Yes, I know I could create a SQL Script to do this. This is not allowed.
  • Yes, I know I could create a table function to do this. This is not allowed.
  • This must be accomplished though Graphical calculation view which supports basic SQL functions
  • BASIC Joins (INNER, OUTER, FULL OUTER, Cross), filters, aggregation, a basic rank at a significant performance impact if all records are evaluated. (few other things) but not window functions, not cross Join, lateral...
  • as to why it has to do with maintenance and staffing. The staffed area is a reporting area who uses tools to create views used in universes. The area wishes to keep all Scripts out of use to keep cost for employees lower as SQL knowledge wouldn’t be required for future staff positions, though it helps!

For those familiar this issue is sourced from MBEWH table in an ECC implementation


Solution

  • This can be done with graphical calculation views in SAP HANA.

    It's not pretty and probably not very efficient, though. Whether or not the persons that are supposedly able to maintain graphical calc. views but not SQL statement will be able to successfully maintain this is rather questionable.

    First, the approach in SQL, so that the approach becomes clear:

    create column table calendar
    ( yr integer
     , period nvarchar (2) 
     , primary key (yr, period))
     
     
     insert into calendar 
     ( select year (generated_period_start) as yr
            , ABAP_NUMC( month(generated_period_start), 2) as period 
       from series_generate_date ('INTERVAL 1 MONTH', '2022-01-01', '2023-01-01'));
     
    
    create column table data
    ( yr integer
     , period nvarchar (2) 
     , qty integer
     , primary key (yr, period));
     
    insert into data values (2022, '01', 10);
    insert into data values (2022, '02', 5);
    insert into data values (2022, '04', 10);
    insert into data values (2022, '05', 7);
    insert into data values (2022, '09', 1);
     
     
    SELECT * 
    FROM CALendar A
    LEFT JOIN data B
     on A.YR = B.YR
     and A.Period = B.Period
     WHERE A.Period <'10' and A.YR =2022
    ORDER BY  A.period;
    
    /*
    YR      PERIOD  YR      PERIOD  QTY
    2,022   01      2,022   01      10 
    2,022   02      2,022   02      5  
    2,022   03      ?       ?       ?  
    2,022   04      2,022   04      10 
    2,022   05      2,022   05      7  
    2,022   06      ?       ?       ?  
    2,022   07      ?       ?       ?  
    2,022   08      ?       ?       ?  
    2,022   09      2,022   09      1  
    */
    

    The NUMC() function creates ABAP NUMC strings (with leading zeroes) from integers. Other than this it's pretty much the tables from OP.

    The general approach is to use the CALENDAR table as the main driving table that establishes for which dates/periods there will be output rows.

    This is outer joined with the DATA table, leaving "missing" rows with NULL in the corresponding columns.

    Next, the DATA table is joined again, this time with YEAR||PERIOD combinations that are strictly smaller then the YEAR||PERIOD from the CALENDAR table. This gives us rows for all the previous records in DATA.

    Next, we need to pick which of the previous rows we want to look at. This is done via the ROWNUM() function and a filter to the first record. As graphical calculation views don't support ROWNUM() this can be exchanged with RANK() - this works as long as there are no two actual DATA records for the same YEAR||PERIOD combination.

    Finally, in the projection we use COALESCE to switch between the actual information available in DATA and - if that is NULL - the previous period information.

    /*
    CAL_YR  CAL_PER COALESCE(DAT_YR,PREV_YR)    COALESCE(DAT_PER,PREV_PER)  COALESCE(DAT_QTY,PREV_QTY)
    2,022   01      2,022                       01                          10                        
    2,022   02      2,022                       02                          5                         
    2,022   03      2,022                       02                          5                         
    2,022   04      2,022                       04                          10                        
    2,022   05      2,022                       05                          7                         
    2,022   06      2,022                       05                          7                         
    2,022   07      2,022                       05                          7                         
    2,022   08      2,022                       05                          7                         
    2,022   09      2,022                       09                          1                         
    */
    

    So far, so good.

    The graphical calc. view for that looks like this:

    graphical calc view FILLUP

    As it's cumbersome to screenshoot every single node, I will include the just most important ones:

    1. CAL_DAT_PREV

    CAL_DAT_PREV node

    Since only equality joins are supported in graphical calc. views we have to emulate the "larger than" join. For that, I created to calculated/constant columns join_const with the same value (integer 1 in this case) and joined on those.

    2. PREVS_ARE_OLDER

    PREVS_ARE_OLDER node

    This is the second part of the emulated "larger than" join: this projection simply filters out the records where cal_yr_per is larger or equal than prev_yr_per. Equal values must be allowed here, since we don't want to loose records for which there is no smaller YEAR||PERIOD combination. Alternatively, one could insert an intial record into the DATA table, that is guranteed to be smaller than all other entries, e.g. YEAR= 0001 and PERIOD=00 or something similar. If you're familiar with SAP application tables, then you've seen this approach.

    By the way - for convenience reasons, I created calculated columns that combine the YEAR and PERIOD for the different tables - cal_yr_per, dat_yr_per, and prev_yr_per.

    3. RANK_1

    RANK_1 node

    Here the rank is created for PREV_YR_PR, picking the first one only, and starting a new group for every new value fo cal_yr_per. This value is returned via Rank_Column.

    4. REDUCE_PREV

    REDUCE_PREV node

    The final piece of the puzzle: using a filter on Rank_Column = 1 we ensure to only get one "previous" row for every "calendar" row.

    Also: by means of IF(ISNULL(...), ... , ...) we emulate COALESCE(...) in three calculated columns, aptly named FILL....

    And that's the nuts and bolts of this solution.

    "It's works on my computer!" is probably the best I can say about it.

    SELECT  "CAL_YR", "CAL_PERIOD"
          , "DAT_YR", "DAT_PER", "DAT_QTY"
          , "FILL_YR", "FILL_QTY", "FILL_PER" 
    FROM "_SYS_BIC"."scratch/QTY_FILLUP"
    ORDER BY "CAL_YR" asc, "CAL_PERIOD" asc;
    
    /*
    CAL_YR  CAL_PERIOD  DAT_YR  DAT_PER DAT_QTY FILL_YR FILL_QTY    FILL_PER
    2,022   01          2,022   01      10      2,022   10          01      
    2,022   02          2,022   02      5       2,022   5           02      
    2,022   03          ?       ?       ?       2,022   5           02      
    2,022   04          2,022   04      10      2,022   10          04      
    2,022   05          2,022   05      7       2,022   7           05      
    2,022   06          ?       ?       ?       2,022   7           05      
    2,022   07          ?       ?       ?       2,022   7           05      
    2,022   08          ?       ?       ?       2,022   7           05      
    2,022   09          2,022   09      1       2,022   1           09      
    2,022   10          ?       ?       ?       2,022   1           09      
    2,022   11          ?       ?       ?       2,022   1           09      
    2,022   12          ?       ?       ?       2,022   1           09      
    */