Good Day Team,
We have to extract data from the SQL database tables which are updated (upsert) on an hourly basis and takes approx 15 minutes to update. We have to extract all tables within 1 hour to get the same data as our target table (which uses these source tables). 1 table is too huge and is taking almost 5 hours to extract.
Now my question is suppose I have a table XYZ which takes 5 hours to extract and I have started extracting data at 12 PM, do I get static data of 12 PM even if XYZ has been updated again at 1 PM, 2 PM, 3 PM, 4 PM, 5 PM or we get updated data of 5 PM.
Note: I am using SSIS
Apologies if this is a very basic question, I am not that familiar with SQL know-how.
The data source scans over the table, you would get the record version at time of read.
The time of read could be anytime in that 5 hour window. You could reduce the impact by adding an order clause to your source component so that the latest data gets read and transferred last. This would ensure the data you expect to get the most updates during those 5 hours to catch the most upserts
If the upsert updates a record that has already been copied over in that 5 hour window then you will miss the update
Ultimately you would need to look at a incremental load design if you need to catch every single upsert