
APT_BadAlloc from Join Stage in DataStage


There is an ETL job that processes over 43,000,000 rows, and it often fails with APT_BadAlloc when it reaches a Join stage. Here is the log.

Join_Stage,0: terminate called after throwing an instance of 'APT_BadAlloc'
Issuing abort after 1 warnings logged.
Join_Stage,3: Caught exception from runLocally(): APT_Operator::UnControlledTermination: From: UnControlledTermination via exception...
Join_Stage,3: Caught exception from runLocally(): APT_Operator::UnControlledTermination: From: UnControlledTermination via exception...
Join_Stage,3: The runLocally() of the operator failed.
Join_Stage,3: Operator terminated abnormally: runLocally() did not return APT_StatusOk
Join_Stage,0: Internal Error: (shbuf): iomgr/iomgr.C: 2670 

My question is about the first warning. The event type is Warning and the message ID is IIS-DSEE-USBP-00002.

Join_Stage,0: terminate called after throwing an instance of 'APT_BadAlloc'

After this warning the job fails, and it happens often. However, I couldn't figure out how to fix it. Our team's only workaround for this error is to wait 10 - 15 minutes and then restart the ETL job. That usually resolves the issue, but it is not a permanent solution, so I've been searching every day without finding what my first step toward resolving the error should be, or how to take it.

I checked APT_DUMP_SCORE in the Administrator client; it is currently set to False. By the way, if I set the option to True, where and how do I read the dump score report? Our server is a Linux server, and the ETL developers are not system admins on it. Is there an option in the DataStage Designer client to see the dump score report? I read about the report on the IBM website: https://www.ibm.com/docs/en/iis/11.5?topic=flow-apt-dump-score-report But I couldn't find the location of the report. Is it provided in the job log area?

1. Log/View (screenshot)

2. APT_DUMP_SCORE options (screenshot)

I also saw some buffer size options for the system. They all have their default values. These are very important settings, so I haven't touched any option here. Please let me know how I can figure out the root cause.
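For reference, my understanding is that these settings correspond to parallel engine environment variables such as APT_BUFFER_MAXIMUM_MEMORY (the per-buffer-operator memory, 3145728 bytes by default). If such a variable has been added as a job parameter in Designer, one experiment would be to override it for a single run from the command line. A hypothetical sketch; PROJECT and JOB_NAME are placeholders for the real project and job names:

    # Hypothetical single-run override of the buffer memory (16 MB here),
    # assuming $APT_BUFFER_MAXIMUM_MEMORY was added as a job parameter
    # in Designer beforehand.
    dsjob -run -param '$APT_BUFFER_MAXIMUM_MEMORY=16777216' -jobstatus PROJECT JOB_NAME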

I'm not a system admin, so I have to contact someone else who can look into a detailed log file about the biggest rows in the data flow.

3. System Buffer Size settings (screenshot)
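Since APT_BadAlloc means the parallel engine failed to allocate memory, I plan to ask whoever has shell access to watch memory on the Linux engine host while the job runs. A sketch of the standard commands I have in mind (osh should be the name of the parallel engine processes):

    # Overall memory and swap headroom on the engine host
    free -m

    # Per-process limits for the user that runs DataStage jobs
    ulimit -a

    # Resident set size of the parallel engine (osh) processes mid-run
    ps -eo rss,comm | grep osh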

FYI, I clicked the Resource Estimation menu against our testing server, but performing the estimation required too many resources, so I couldn't get an estimate through that menu.

4. Resource Estimation menu in DataStage Designer (screenshot)


Solution

  • The dump score will be logged to your job log, as shown in the link you already mentioned. You need to check the log details: double-click the entry starting with something like "main_program", usually within the first 5 entries of the job run. This of course means the job needs to run again after you set APT_DUMP_SCORE to True.
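    If you cannot see it in Designer, the same entries can be pulled from the command line with the dsjob client. A minimal sketch, assuming dsjob is on the PATH and PROJECT/JOB_NAME stand in for your real project and job names:

        # List the first log entries of the latest run; the dump score is an
        # INFO-type entry whose text starts with "main_program".
        dsjob -logsum -type INFO -max 20 PROJECT JOB_NAME

        # Print the full text of a single entry by its event id
        # (the first column of the -logsum output).
        dsjob -logdetail PROJECT JOB_NAME 4

    The score describes how many processes and partitions the engine actually created for the job, which is the first thing to check when memory runs out.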

    Your main problem seems to be a lack of memory when the job is executed. Add more memory, or ensure that fewer jobs run in parallel when this job is started.
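    Until then, the team's wait-and-retry workaround can at least be automated. A rough sketch, again with PROJECT/JOB_NAME as placeholders; it assumes a recent procps whose free output includes an "available" column:

        #!/bin/sh
        # Wait until roughly 2 GB is available on the engine host before
        # starting the job, then run it and block until it finishes
        # (-jobstatus makes dsjob return the job's exit status).
        while [ "$(free -m | awk '/^Mem:/ {print $7}')" -lt 2048 ]; do
            sleep 60
        done
        # An aborted job may first need a reset run: dsjob -run -mode RESET ...
        dsjob -run -jobstatus PROJECT JOB_NAME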