
Kubernetes pod running out of memory when writing to a volume mount


I have a Kubernetes cluster that takes jobs for processing. These jobs are defined as follows:

apiVersion: batch/v1
kind: Job
metadata:
  name: process-item-014
  labels:
    jobgroup: JOB_XXX
spec:
  template:
    metadata:
      name: JOB_XXX
      labels:
        jobgroup: JOB_XXX
    spec:
      restartPolicy: OnFailure
      containers:
      - name: worker
        image: gcr.io/.../worker
        volumeMounts: 
        - mountPath: /workspace
          name: workspace
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 500m
            memory: 512Mi  
      volumes: 
        - name: workspace 
          hostPath:
            path: /tmp/client-workspace

Note that I'm mounting a folder from the host into the container (workspace), and note the memory limits defined. In my container, I download a number of files into workspace, some of them pretty large (they are downloaded with gsutil from GCS, but I don't think that's important).

When the files I download exceed the memory limit, my code breaks with a "device out of space" error. This doesn't make sense to me, because I'm storing the files in a mount that is backed by the host's storage, which has more than enough space. The docs also say that the memory limit caps the amount of RAM available to the container, not storage. Still, when I set the limit to XGi, it breaks pretty consistently after downloading XGi.
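One way to see what actually backs a mount is to check its filesystem type from inside the container. A minimal sketch (shown here against /tmp; substitute the mount path from the spec, /workspace, when running inside the container):

```shell
# Print the filesystem type backing a given path. If this prints
# "tmpfs", the files written there live in RAM, not on disk, and
# count against the container's memory limit.
df -T /tmp | awk 'NR==2 {print $2}'
```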

My container is based on ubuntu:14.04, running a shell script with a line like this:

gsutil -m cp -r gs://some/cloud/location/*  /workspace/files

What am I doing wrong? I will definitely need some limits on my containers, so I can't just drop them.


Solution

  • The /tmp filesystem is often backed by tmpfs, which stores files in memory rather than on disk. My guess is that this is the case on your nodes (you can check with `df -T /tmp` on the node), and the memory is being correctly charged to the container. Can you use an emptyDir volume instead?
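As a sketch, the hostPath volume in the Job spec above could be swapped for an emptyDir, which by default is allocated on the node's disk rather than in RAM (the rest of the spec stays unchanged):

```yaml
# Replaces the hostPath volume in the Job spec above. A default
# emptyDir is backed by the node's disk, so downloaded files are
# not charged against the container's memory limit.
volumes:
  - name: workspace
    emptyDir: {}
```

Note that setting `medium: Memory` on an emptyDir would put it back on tmpfs, so the default (disk-backed) medium is what you want here. The trade-off versus hostPath is that the directory's contents are removed when the pod terminates.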