spring, spring-cloud-dataflow, spring-cloud-task

How to implement a multi-tenant database for Spring Cloud Data Flow


We would like to implement a multi-tenant solution for SCDF in which each tenant may have its own unique task definitions, etc. Ideally we want only a single SCDF server (as opposed to setting up an SCDF server for each tenant), as pictured (diagram: multi-tenant SCDF).

Is this possible or is the only way to achieve isolation of the data between tenants to have separate data flow server instances?


Solution

  • What you're attempting here is not possible today; you'd have to provision a separate SCDF instance for each tenant. On cloud platforms such as Kubernetes or Cloud Foundry, this is the recommended approach because you can access-control the tenants through "namespace" and "org/space" isolation, respectively. On top of that foundation, the platforms provide more robust separation through RBAC assignments for each user in the tenant.

    A little more background on why we do this today: SCDF and the Task/Job repositories are coupled, in the sense that the Dashboard and the other client tools interact with the same datasource to provide a consistent UX for monitoring and managing the data pipelines centrally. Even with the recent multi-platform backend support for Tasks, you're still expected to use a common datasource in the current design.

    All that said, we are looking into allowing users to have a database whose schemas are prefixed with an identifier [see: spring-cloud/spring-cloud-dataflow#2048]. With that in place, it would be possible to filter task/job executions by that identifier and likewise track them as isolated units of operation within a single SCDF instance.

    However, this may not scale for cloud deployments. Each tenant isolation boundary (for instance, a "namespace" in Kubernetes) needs enough resources (CPU/memory/disk) to handle the task/batch app deployments of "multiple" tenants. If you don't autoscale the resource capacity, you'd see deployment failures.

    Could you describe your requirements in some more detail, so we can understand why this would still be useful? Please also share how you're going to design the resource allocations in the underlying deployment platform. Feel free to comment in #2048.
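
To make the identifier-prefixed schema idea from the answer a little more concrete, here is a minimal plain-Java sketch. It does not use any SCDF or Spring Cloud Task API, and it is not how #2048 is (or will be) implemented. The class `TenantPrefixSketch`, the type `ExecutionRow`, the helpers `prefixedTable` and `executionsFor`, and the tenant names are all hypothetical, chosen only to illustrate deriving a tenant-specific table name and filtering executions by tenant identifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TenantPrefixSketch {

    /** Hypothetical, simplified stand-in for one row of a task-execution table. */
    static final class ExecutionRow {
        final String tenantId;
        final String taskName;
        final long executionId;

        ExecutionRow(String tenantId, String taskName, long executionId) {
            this.tenantId = tenantId;
            this.taskName = taskName;
            this.executionId = executionId;
        }
    }

    /** Builds a tenant-specific table name, e.g. "ACME_TASK_EXECUTION" (assumed naming scheme). */
    static String prefixedTable(String tenantId, String baseTable) {
        // Reject anything that is not plain alphanumeric, since the value ends up in a table name.
        if (!tenantId.matches("[A-Za-z0-9]+")) {
            throw new IllegalArgumentException("tenant id must be alphanumeric: " + tenantId);
        }
        return tenantId.toUpperCase(Locale.ROOT) + "_" + baseTable;
    }

    /** Returns only the executions that belong to the given tenant. */
    static List<ExecutionRow> executionsFor(String tenantId, List<ExecutionRow> all) {
        List<ExecutionRow> result = new ArrayList<>();
        for (ExecutionRow row : all) {
            if (row.tenantId.equals(tenantId)) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<ExecutionRow> all = List.of(
                new ExecutionRow("acme", "nightly-import", 1L),
                new ExecutionRow("globex", "report", 2L),
                new ExecutionRow("acme", "cleanup", 3L));

        System.out.println(prefixedTable("acme", "TASK_EXECUTION"));
        for (ExecutionRow row : executionsFor("acme", all)) {
            System.out.println(row.executionId + " " + row.taskName);
        }
    }
}
```

Note that a sketch like this only separates the data. It does nothing about the resource-capacity concern raised in the answer, which still has to be handled on the deployment platform.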