Tags: amazon-web-services, amazon-s3, hive, amazon-emr, presto

Why are we seeing spikes in our Presto query run times?


We're trying to debug why our Presto query run times vary significantly over the day. We see several large spikes, some during working hours and some outside them. We're using EMR version 5.14 and Presto version 0.194. Our data is stored in S3 as Parquet files created by Hive. The graph below shows the run times for the same query over time, run through the Presto CLI. Any ideas or suggestions on what we should focus on, or what could be causing these spikes, would be much appreciated. Thanks!

[Graph: run times of the same query over time]


Solution

  • Posting this in case anyone else runs into the same issue. We ended up disabling Hive statistics in hive.properties, and that improved performance.
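
For anyone looking for the concrete setting: the answer does not name the exact property, so the line below is an assumption rather than the poster's confirmed change. In the Presto Hive connector, table statistics are normally controlled by `hive.table-statistics-enabled` in the catalog file (on EMR this is typically `/etc/presto/conf/catalog/hive.properties`, but check your own cluster). A minimal sketch:

```properties
# Assumed property name; the original answer only says "disabling hive statistics".
# Stops the Hive connector from using table statistics when planning queries.
hive.table-statistics-enabled=false
```

Catalog property changes usually only take effect after the Presto server is restarted on the coordinator and the workers.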