Tags: etl, data-warehouse, ssis-2012, warehouse

SQL Server Integration Services - Out of memory exception


I've created an ETL that has grown to populate about 250 tables (Staging Tables, Dimension Tables and Fact Tables).

I got the ETL design pattern from Stacia Meisner. Her pattern is based on creating one template package each for loading a staging table, loading a dimension table and loading a fact table. The idea is that each package exposes variables which, via expressions, call the appropriate stored procedures, create lineage and auditing data, populate the correct tables and so on. You then just copy and paste the template package into your solution and edit the variables; as long as you have the stored procedures in place to source the data, and the table names are correct, everything works perfectly.
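To give a feel for the pattern (these names are hypothetical, not from Meisner's material), each copy of the staging template just points its variables at a per-table procedure along these lines:

```sql
-- Hypothetical per-table load procedure of the kind each template
-- package copy would call; schema, table and column names are illustrative.
CREATE PROCEDURE etl.LoadStagingCustomer
    @LineageKey INT    -- auditing/lineage key passed in by the package
AS
BEGIN
    SET NOCOUNT ON;

    -- Full reload of the staging table, stamped with the lineage key
    TRUNCATE TABLE stg.Customer;

    INSERT INTO stg.Customer (CustomerId, CustomerName, LineageKey)
    SELECT CustomerId, CustomerName, @LineageKey
    FROM   src.Customer;
END;
```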

That is... up until I reached 250 tables. When I run the ETL in BIDS, it consumes RAM like crazy; when I deploy the ETL and execute it in SQL Server, it doesn't. One ETL run on my laptop consumes about 3 to 4 gigabytes of RAM as the parent package opens every one of my child packages. There are now 250 packages in my solution.

I can up the RAM in my laptop (currently sitting at 8GB of RAM), but there are definitely warning alarms sounding in my head making me think that perhaps 250 data flow tasks would have been a better choice.

Understanding the flaw in this design pattern now, I suppose my questions are as follows:

  1. Was BIDS ever meant to have so many packages executing within an ETL?
  2. Is there any way I can reduce the consumption of RAM when I run the ETL within the IDE?
  3. Is the consumption of RAM to be expected, and if so, how do developers normally deal with it? I could easily get around it by never running the entire ETL within my IDE, but testing it in its parts and then deploying it in its entirety.
  4. Should I step away from the one-package-per-table design pattern and implement the data flow tasks in 3 packages instead (1 for loading the staging tables, 1 for loading the dimensions and 1 for loading my fact tables)?

Thanks for your time, I'd appreciate any input you have.

Regards,

Jesse


Solution

  • Stepping away from the RAM question for a while, I would keep the design pattern. It's incredibly valuable when you run into a situation where you need to run just one table's ETL post-deployment. With 2012+ it also gives you much more useful logging information: because you're using parent packages, everything is considered part of one run, allowing you to build useful monitoring/reporting from the data held in SSISDB (see the example query below). All of the other reasons you've listed for adopting this pattern in the first place are also valid; the pattern does promote reuse & standardisation, and reduces development time. It also makes it very easy for another developer to pick up the solution, find their way around it, support it, make changes to it, etc.
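    As a rough sketch of the kind of monitoring that data enables (the catalog views are standard in 2012+; adapt the columns and filters to taste):

    ```sql
    -- Most recent executions recorded in the SSIS catalog, with status.
    SELECT TOP (20)
        e.execution_id,
        e.folder_name,
        e.project_name,
        e.package_name,
        e.status,      -- 1=created, 2=running, 3=canceled, 4=failed, 7=succeeded
        e.start_time,
        e.end_time
    FROM SSISDB.catalog.executions AS e
    ORDER BY e.start_time DESC;
    ```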

    Going back to the RAM: 8 gigs genuinely isn't a huge amount for an SSIS development machine in an environment where your ETL solution is that big; if you have the option to upgrade, I think that would be a good idea. I've not run into the issue you're describing despite working with some pretty big ETL solutions, but then my previous two development machines have had 32GB and 16GB of RAM. The SSIS IDE is far from perfect, though, and I can entirely believe that what you're describing is an issue.

    It must also be noted that I don't often run an entire ETL solution within the IDE. I more often run independent parts as I'm working on them; as soon as I know a part is working, I deploy to a development environment (whether that's a local installation or a server) and do my full run via the agent. Given the differences between how things run via the agent and inside the IDE, I find it useful to deploy early. I also appreciate the logging information that running via the agent gives me; it helps me track whether changes I'm making have affected performance. With the 2012+ deployment model it also isn't time-consuming to work this way.
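    Incidentally, you don't even need an agent job to run a deployed package outside the IDE; a minimal sketch using the SSISDB catalog procedures (the folder, project and package names here are placeholders):

    ```sql
    -- Create and start an execution of a deployed package.
    DECLARE @execution_id BIGINT;

    EXEC SSISDB.catalog.create_execution
        @folder_name  = N'MyWarehouse',   -- placeholder folder
        @project_name = N'WarehouseETL',  -- placeholder project
        @package_name = N'Master.dtsx',   -- placeholder parent package
        @execution_id = @execution_id OUTPUT;

    EXEC SSISDB.catalog.start_execution @execution_id = @execution_id;
    ```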

    Ultimately I think it would be a mistake to move away from a solid pattern with many benefits rather than changing the way you work slightly in order to cope with the IDE's imperfections. I think you may have largely answered your own question in point 3.

    A final note: If you have a local development DB on your laptop (rather than only running SSIS locally, and the DB being on a server), make sure you have a fairly low maximum RAM setting for your local SQL Server instance. If you don't, it will use all of your RAM to cache things, and then SQL Server and SSIS will end up fighting for RAM. I have seen that cause memory errors in the SSIS IDE. I don't think this is what you're describing here, but I'm mentioning it just in case this helps you, or someone else who finds this Q&A.
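    For example, to cap a local instance via sp_configure (4096 MB is purely illustrative; pick a value that leaves headroom for SSIS and everything else on the laptop):

    ```sql
    -- Cap SQL Server's memory so it can't crowd out the SSIS IDE.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)', 4096;  -- illustrative value
    RECONFIGURE;
    ```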