Tags: python, apache-spark, garbage-collection, jvm, azure-databricks

Steps to reduce time delay due to GC allocation failure in Azure Databricks


I'm running a simple print("Hello World") job in an Azure Databricks Python notebook on a Spark cluster. Every run takes more than 12 seconds, which seems excessive for the simplest Python code imaginable. When I check the logs, I see GC allocation failures like the following:

2019-02-15T15:47:27.551+0000: [GC (Allocation Failure) [PSYoungGen: 312512K->57563K(390144K)] 498744K->243803K(1409024K), 0.0153696 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] 
2019-02-15T15:47:28.703+0000: [GC (Metadata GC Threshold) [PSYoungGen: 206668K->65267K(385024K)] 392909K->251515K(1403904K), 0.0187692 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] 
2019-02-15T15:47:28.722+0000: [Full GC (Metadata GC Threshold) [PSYoungGen: 65267K->0K(385024K)] [ParOldGen: 186248K->244119K(1018880K)] 251515K->244119K(1403904K), [Metaspace: 110436K->110307K(1144832K)], 0.3198827 secs] [Times: user=0.64 sys=0.04, real=0.32 secs] 

Is the job delay (> 12 seconds) caused by the GC allocation failure? If so, how can I reduce it? If not, what else could explain the delay, and how do I fix it?


Solution

  • There is an overhead to starting a Spark job on a cluster. When processing petabytes of data that overhead is negligible, but for a trivial job like this it is noticeable. GC is not the issue here: "GC (Allocation Failure)" is simply the normal reason the JVM records for a minor collection, and the pauses in your log (roughly 0.015 s, 0.019 s and 0.32 s) add up to well under a second, so they cannot account for a 12-second delay. The time is going into job scheduling and cluster startup, not garbage collection.
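
One way to see this for yourself is to time a plain Python statement against a trivial Spark action in the same notebook. The sketch below is a minimal illustration, assuming it runs in a Databricks notebook (where a SparkSession already exists and getOrCreate() simply reuses it) or any environment with PySpark available; the specific timings will vary, but the pattern should show that the seconds are spent on Spark job scheduling rather than on the print itself.

import time
from pyspark.sql import SparkSession

# In a Databricks notebook `spark` is already defined; getOrCreate() reuses it.
spark = SparkSession.builder.getOrCreate()

# 1. Pure Python: effectively instantaneous.
start = time.time()
print("Hello World")
print(f"plain print:          {time.time() - start:.4f} s")

# 2. A trivial Spark action: this pays the job-scheduling overhead
#    (task serialization, executor dispatch, result collection).
start = time.time()
spark.range(1).count()
print(f"first Spark action:   {time.time() - start:.4f} s")

# 3. The same action again: usually much faster, because the session
#    and executors are already warm.
start = time.time()
spark.range(1).count()
print(f"second Spark action:  {time.time() - start:.4f} s")

If the first Spark action dominates the total time while the plain print is near zero, the delay is cluster and job startup overhead, which you cannot eliminate by tuning GC.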