
ReadWriteMany volumes on kubernetes with terabytes of data


We want to deploy a k8s cluster which will run ~100 IO-heavy pods at the same time. They should all be able to access the same volume.
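In Kubernetes terms, that requirement is a PersistentVolumeClaim with the `ReadWriteMany` access mode, mounted by every pod. A minimal sketch of what that could look like follows; the claim name, deployment name, image, mount path, and storage class are all placeholders (assumptions, not part of our actual setup), and the storage class has to be backed by a provisioner that really supports `ReadWriteMany`.

```yaml
# Sketch only: all names, the image, and the storage class are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data              # hypothetical claim name
spec:
  accessModes:
    - ReadWriteMany              # pods on different nodes may mount read-write
  storageClassName: nfs          # assumption: a class with an RWX-capable provisioner
  resources:
    requests:
      storage: 2Ti
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: io-worker                # hypothetical workload name
spec:
  replicas: 100
  selector:
    matchLabels:
      app: io-worker
  template:
    metadata:
      labels:
        app: io-worker
    spec:
      containers:
        - name: worker
          image: example.com/io-worker:latest   # placeholder image
          volumeMounts:
            - name: data
              mountPath: /data                  # every replica sees the same files here
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: shared-data              # all replicas share the one claim
```

The manifest itself stays the same whichever backend is used; only `storageClassName` (or a statically created PersistentVolume) changes between NFS, CephFS, and Filestore.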

What we tried so far:

  • CephFS
    • was very complicated to set up. Hard to troubleshoot. In the end, it crashed a lot and the cause was not entirely clear.
  • Helm NFS Server Provisioner
    • runs pretty well, but when IO peaks a single replica is not enough. We could not get multiple replicas to work at all.
  • MinIO
    • is a great tool to create storage buckets in k8s. But our operations require fs mounting. That is theoretically possible with s3fs, but since we run ~100 pods, we would also need to run 100 s3fs sidecars. That seems like a bad idea.

There has to be some way to get 2TB of data mounted in a GKE cluster with relatively high availability?

Filestore seems to work, but it's an order of magnitude more expensive than other solutions, and with a lot of IO operations it quickly becomes infeasible.


I contemplated asking this question on Server Fault, but its k8s community is a lot smaller than SO's.


Solution

  • I think I have a definitive answer as of Jan 2020, at least for our use case:

    | Solution        | Complexity | Performance | Cost           |
    |-----------------|------------|-------------|----------------|
    | NFS             | Low        | Low         | Low            |
    | Cloud Filestore | Low        | Mediocre?   | Per Read/Write |
    | CephFS          | High*      | High        | Low            |
    
    * CephFS needs an additional step on GKE: change the node base image to Ubuntu.
    

    I haven't benchmarked Filestore myself, so I'll just go with stringy05's response: others have trouble getting really good throughput from it.

    Ceph could be a lot easier to set up if it were supported by Helm.