Search code examples
performanceselectquery-optimizationabapsap-data-dictionary

Most efficient way to SELECT on a cluster table?


In my program, I need to perform a select on the CDPOS cluster table. However, this process is very inefficient and, depending on the amount of data, it can take several minutes or simply hit a timeout. In a test I conducted, retrieving 35 IDs took 1 minute and 15 seconds, and in a second test with 90 IDs, the program timed out.

Current process: First, I perform a select on the EKKO table with purchase orders filtered by the date provided by the user. Then, I filter the IDs obtained from the EKKO table in the CDHDR table. The remaining IDs from the CDHDR table are then used in the select on the CDPOS table. Finally, I use a check to filter non-key fields. Note: I filter by IDs because the CDPOS table does not have date fields.

The SELECT in which I am having performance issues:

IF lt_chunk_cdhdr is not INITIAL.


    SELECT OBJECTID
           CHANGENR
           FNAME
           CHNGIND
           VALUE_NEW
           VALUE_OLD
      FROM CDPOS
      INTO TABLE lt_temp_cdpos
      FOR ALL ENTRIES IN lt_chunk_cdhdr
      WHERE OBJECTID = lt_chunk_cdhdr-objectid.

    LOOP AT lt_temp_cdpos INTO ls_cdpos.
      CHECK ls_cdpos-fname = 'FRGKE' AND ls_cdpos-value_new = '2'
         OR ls_cdpos-fname = 'MWSKZ'
         OR ls_cdpos-fname = 'KEY' AND ls_cdpos-chngind = 'I'
         OR ls_cdpos-fname = 'BRTWR'
         OR ls_cdpos-fname = 'NETWR'
         OR ls_cdpos-fname = 'ZTERM'
         OR ls_cdpos-fname = 'INCO1'.

      APPEND ls_cdpos TO lt_cdpos.
    ENDLOOP.
ENDIF.

I tried to improve the efficiency of the select using tips I found on the internet. One of these tips was to use only key fields in the WHERE clause when working with cluster tables and then use the CHECK command to filter the retrieved data again. However, even after following this advice, the efficiency is still insufficient. If the number of IDs to be filtered is close to 80, the program does not complete execution.

"Cluster table: the exact opposite occurs: due to the storage method of these tables, the WHERE clause should contain only the key fields, and the other fields should be checked using the CHECK command. The database cannot process cluster tables as it processes transparent tables. Forcing the database to decompress and check fields (in the case of a select with non-key fields in the WHERE clause) is less efficient, in most cases, than qualifying with key fields only and leaving the CHECK for non-key fields after they have been returned." Reference (in Portuguese): https://marcolin.wordpress.com/2010/04/08/grandes-viloes-da-performance-sap/

Another attempt I made was to limit the search period to 31 days. However, depending on the month and the number of purchase orders, this can still cause a timeout.


Solution

  • I managed to resolve the performance issue by adding the OBJECTCLAS field to the filter in addition to OBJECTID. This change drastically reduced the processing time.

    After adding the OBJECTCLAS field, the select looked like this:

    IF lt_chunk_cdhdr is not INITIAL.
    
        SELECT OBJECTID
               CHANGENR
               FNAME
               CHNGIND
               VALUE_NEW
               VALUE_OLD
          FROM CDPOS
          INTO TABLE lt_temp_cdpos
          FOR ALL ENTRIES IN lt_chunk_cdhdr
          WHERE OBJECTID = lt_chunk_cdhdr-objectid
          AND   OBJECTCLAS = lt_chunk_cdhdr-OBJECTCLAS.
    
        LOOP AT lt_temp_cdpos INTO ls_cdpos.
          CHECK ls_cdpos-fname = 'FRGKE' AND ls_cdpos-value_new = '2'
             OR ls_cdpos-fname = 'MWSKZ'
             OR ls_cdpos-fname = 'KEY' AND ls_cdpos-chngind = 'I'
             OR ls_cdpos-fname = 'BRTWR'
             OR ls_cdpos-fname = 'NETWR'
             OR ls_cdpos-fname = 'ZTERM'
             OR ls_cdpos-fname = 'INCO1'.
    
          APPEND ls_cdpos TO lt_cdpos.
        ENDLOOP.
    ENDIF.
    

    In my original approach, I was using only the OBJECTID field to filter the SELECT statement on the CDPOS cluster table. However, by including the OBJECTCLAS field as well, the database was able to optimize the query much more effectively, leading to significant performance improvements.

    Before, when filtering 35 IDs, the program's processing time was around 1 minute and 15 seconds, but now with the change the time was reduced to 3 seconds.

    With this, we can conclude that for optimal performance of a SELECT statement on a cluster table, it is necessary to filter using only key fields in the WHERE clause and to include as many key fields as possible. Additionally, it is not recommended to filter non-key fields in the SELECT statement; instead, use the CHECK command to filter these fields afterwards.