Tags: oracle-database, etl, informatica, informatica-powercenter

Rebuild Informatica persistent cache only when data is updated in the table


I have a requirement to build a persistent cache in Informatica for a large table, due to a performance issue. The cache should be rebuilt only when there is some change in the table (the database is Oracle). In my case the table is updated weekly/monthly.

I am trying to figure out how Informatica will know that the table has changed and that it needs to rebuild the cache. Is there an elegant approach to achieve this?

One way I can think of is to create a flat file with the lookup table data every time the lookup table is updated, and then use the flat file as the lookup. This is possible in my case because the lookup table is loaded through Informatica. But what if it is updated outside Informatica?

Thank you


Solution


    In this example, I am using the source table emp_location in the queries as a stand-in for your table; the cache files are assumed to start with emp_location*.

    I created two dummy sessions before your main session.

    1. Create a run table, and save the last session run time in it.

       select * 
       from run_history
       where process_name='EMP_DIM'
      
      +--------------+----------------------+
      | PROCESS_NAME |    LAST_READ_TIME    |
      +--------------+----------------------+
      | EMP_DIM      | 15-MAY-2016 12:00:07 |
      +--------------+----------------------+
      

      In the pre-session task of your main session, use something like this:

      update run_history 
      set last_read_time = sysdate
      where process_name='EMP_DIM';
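
      For reference, here is a minimal sketch of the run table itself. The DDL is not part of the original answer; the column types and sizes are assumptions based on the queries above:

      -- Hypothetical DDL for the run table; adjust types and sizes as needed.
      create table run_history (
          process_name    varchar2(30) primary key,
          last_read_time  date not null
      );

      -- Seed one row per process so the pre-session UPDATE has a row to update.
      insert into run_history (process_name, last_read_time)
      values ('EMP_DIM', sysdate);
      commit;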
      

    Now find the time your table was last updated. If the table doesn't have an update-time column, use this query to get the latest update time: ora_rowscn holds the SCN of the last change to each row (block-level unless the table was created with ROWDEPENDENCIES), and scn_to_timestamp converts that SCN to a timestamp.

    select scn_to_timestamp(max(ora_rowscn)) 
    from emp_location;
    

    Now, in the first dummy session, use this query. It will return 1 row if something was updated in the source table after last_read_time; if not, it will return 0 rows.

    select 1 from dual
    where 
        (select scn_to_timestamp(max(ora_rowscn)) 
          from emp_location)
    >
        (select last_read_time 
           from run_history
           where process_name='EMP_DIM')
    
    2. In the link task from this session to the cache-clearing session, put the condition $s_check_last_update.TgtSuccessRows=1.

      So the next (second dummy) session will run only when there was a real change. In its post-session task, run a command to clear the cache files:

      Windows: del $PMCacheDir\emp_location*
      Unix:    rm -rf $PMCacheDir/emp_location*

    3. The other link task (bypassing the cache-clearing session) will have a condition like IIF($s_check_last_update.TgtSuccessRows=1,0,1), so it is taken only when nothing was changed.

    4. In the main session, open the General tab and set Treat Input Links as OR.

      Now the Integration Service will recreate fresh cache files if they have been deleted.

    Alternatively, you can achieve the same thing with a shell/bash script that connects to Oracle and checks whether the table was updated after the last read time; if so, it removes the cache files. A sketch follows below.
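
    A minimal sketch of such a script, assuming sqlplus is on the PATH; the connect string and cache directory below are placeholders, not values from the original answer:

    #!/bin/bash
    # Check whether emp_location changed after the last read time recorded
    # in run_history, and clear the persistent cache files if it did.
    ORA_CONN="user/password@orcl"   # placeholder connect string
    CACHE_DIR="/infa/cache"         # the value of $PMCacheDir

    SQL="set heading off feedback off pagesize 0
    select count(*) from dual
     where (select scn_to_timestamp(max(ora_rowscn)) from emp_location)
         > (select last_read_time from run_history
             where process_name = 'EMP_DIM');
    exit;"

    # tr strips whitespace so the comparison below sees a bare 0 or 1
    changed=$(echo "$SQL" | sqlplus -s "$ORA_CONN" | tr -d '[:space:]')

    if [ "$changed" = "1" ]; then
        # The Integration Service will rebuild the cache on the next run.
        rm -f "$CACHE_DIR"/emp_location*
    fi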