api.name: spark-history-server
file.upload.path: x
gcp.server.property.file.path: x
git.files.update.path: x
onprem.server.property.file.path: x
preferred.id.deployment.file.path: x
preferred.id.file.path: x
server.error.whitelabel.enabled: "false"
server.port: "18080"
server.property.file.path: x
server.servlet.context-path: /
spark.history.fs.cleaner.enabled: "true"
spark.history.fs.cleaner.interval: "1h"
spark.history.fs.cleaner.maxAge: "12h"
spring.thymeleaf.prefix: classpath:/templates/dev/
spring.thymeleaf.view-names: index,devForm,error
temp.repo.location: x
I am trying to clean up the logs of a Spark History Server I have deployed in Kubernetes, using the three cleaner parameters shown above. I found them in this answer: Cleaning up Spark history logs.
It works when I restart the pods manually: logs older than 12 hours are deleted. But over time it starts picking up old logs again, and the Spark History Server takes 1-2 hours to restart. Is there another way to do this so I don't have to keep restarting the pods manually?
I asked around and was told it may be because I am using shared storage such as NFS.
The problem was that I was adding these parameters to the ConfigMap (Configmap.yaml) instead of the Deployment (Deployment.yaml). Just add these parameters to SPARK_HISTORY_OPTS:
- name: SPARK_HISTORY_OPTS
  value: "-Dspark.history.fs.logDirectory=/FS/YOU/CREATED/ABOVE -Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=1d -Dspark.history.fs.cleaner.maxAge=7d"
This article helped me: https://wbassler23.medium.com/spark-history-server-on-dc-os-516fb71523a5
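One practical consequence, assuming a standard Deployment: because SPARK_HISTORY_OPTS is part of the pod template, changing it and re-applying the manifest should trigger a rolling restart on its own, so the pods don't need to be deleted by hand:

kubectl apply -f deployment.yaml
# or, to force a fresh rollout without changing the manifest:
kubectl rollout restart deployment/spark-history-server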