Tags: apache-spark, kubernetes, nfs

Deleting Spark History Server logs more than 7 days old from an NFS location?


  api.name: spark-history-server
  file.upload.path: x
  gcp.server.property.file.path: x
  git.files.update.path: x
  onprem.server.property.file.path: x
  preferred.id.deployment.file.path: x
  preferred.id.file.path: x
  server.error.whitelabel.enabled: "false"
  server.port: "18080"
  server.property.file.path: x
  server.servlet.context-path: /
  spark.history.fs.cleaner.enabled: "true"
  spark.history.fs.cleaner.interval: "1h"
  spark.history.fs.cleaner.maxAge: "12h"
  spring.thymeleaf.prefix: classpath:/templates/dev/
  spring.thymeleaf.view-names: index,devForm,error
  temp.repo.location: x

I am trying to clean up the logs of a Spark History Server that I have deployed in Kubernetes, using the three spark.history.fs.cleaner.* parameters shown above. I found the answer here: Cleaning up Spark history logs.

It works when I restart the pods manually and deletes logs older than 12 hours, but over time it starts picking up old logs again, and the Spark History Server takes 1-2 hours to restart. Is there another way to do this so that I don't have to keep manually restarting the pods (my current restart step is shown below)?
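For reference, the manual restart I run is a rollout restart of the Deployment; the name spark-history-server is assumed here from the api.name above:

  kubectl rollout restart deployment/spark-history-server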

I asked around and was told it may be because I am using shared storage such as NFS.


Solution

  • The problem was that I was trying to add these parameters in the Configmap.yaml file instead of the Deployment.yaml file. Just add these parameters to SPARK_HISTORY_OPTS, as in the example below.

    Example

      - name: SPARK_HISTORY_OPTS
        value: "-Dspark.history.fs.logDirectory=/FS/YOU/CREATED/ABOVE -Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=1d -Dspark.history.fs.cleaner.maxAge=7d"

    This article helped me: https://wbassler23.medium.com/spark-history-server-on-dc-os-516fb71523a5
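    For context, here is a minimal sketch of where that env entry sits in a Deployment.yaml. The image, labels, command, and the NFS server/path below are assumptions for illustration, not from the original post; only the SPARK_HISTORY_OPTS entry comes from the answer above, and /FS/YOU/CREATED/ABOVE is the placeholder path from that answer:

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: spark-history-server
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: spark-history-server
        template:
          metadata:
            labels:
              app: spark-history-server
          spec:
            containers:
              - name: spark-history-server
                image: apache/spark:3.5.0          # assumed image
                command: ["/opt/spark/sbin/start-history-server.sh"]
                env:
                  - name: SPARK_NO_DAEMONIZE       # keep the process in the foreground for the container
                    value: "true"
                  - name: SPARK_HISTORY_OPTS       # cleaner settings from the answer above
                    value: "-Dspark.history.fs.logDirectory=/FS/YOU/CREATED/ABOVE -Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=1d -Dspark.history.fs.cleaner.maxAge=7d"
                ports:
                  - containerPort: 18080           # matches server.port in the ConfigMap above
                volumeMounts:
                  - name: spark-events
                    mountPath: /FS/YOU/CREATED/ABOVE
            volumes:
              - name: spark-events
                nfs:                               # assumed NFS export, replace with your own
                  server: nfs.example.com
                  path: /exports/spark-events

    With the cleaner options set in the container environment rather than the ConfigMap, they take effect on every pod start, so no separate manual restart is needed for the cleaner to run.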