continuous-integration cloud cloud-foundry continuous-deployment concourse

Best practices for continuous deployment with Concourse and Cloud Foundry

So I've been investigating ci/cd pipelines using concourse and cloud foundry lately, and I've been confused about what the best way to do this is. So I've been thinking about how the overall flow would go from development to release. There are a lot of talks and videos that discuss this at a very high level, but often they abstract away too much of the actual implementation details for it to be useful. Like how do people actually roll this out in actual companies? I have a lot of questions, so I will try to list a few of them here in the hope that someone could enlighten me a little.

What does the overall process and pipeline look like conceptually from development to prod? So far I have something along the lines of :
- During development each product team is under their own org, with each developer possibly having their own development "space" that they could manually cf push to and just develop against. There will be development spaces that devs can just directly push to as well as spaces that can only be used by the automated pipeline to deploy artifacts for functional tests.
- Once devs finish a feature they would make a pull request, which would trigger a smaller pipeline with some tests using something like the git-multibranch-resource or the git-pullrequest-resource, which would hook into the github required status check hooks and report back if any particular PRs are able to be merged into master or not
- Once all checks pass and the pull request is merged into master the below pipeline is kicked off, which validates the master branch before releasing the artifact to prod.

code repo [master] -> build -> snapshot artifact repo -> deploy to test space -> run functional tests -> deploy to staging space -> run smoke tests and maybe other regression tests -> deploy artifact to prod -> monitoring/rollbacks (?)

What other things could/should be added to this pipeline or any part of this process?
Once you automate deployment how do you do also automate things like canary releases or rollbacks once something happens? Should this be part of the pipeline or something completely separate?
I've been playing with the idea of creating spaces temporarily and then tearing them down for the functional testing phase, would there be any benefit to doing that? The idea is that the apps being deployed would have their own clean environments to use, but this could also potentially be slow, and it is difficult to know what services are required inside of each space. You would have to read the manifest, which only specifies service-names, which seems to necessitate some sort of canonical way of naming service instances within the same space? The alternative is managing a pool of spaces which also seems complicated...
Should the pipeline generate the manifest files? Or should that be completely up to the developers? Only the developers know which services the app needs, but also it seems like things like instance count, memory etc should be something that the performances tests/pipeline should be able to determine/automate. You could generate a manifest inside the pipeline, but then you would not know which services the app needs without reading a manifest....chicken and egg problem?

I have many more burning questions, but I will cut it off here for now. I know the subjects have kind of bounced back and forth between Concourse and Cloud Foundry, but it seems when discussing CI/CD concepts the nitty gritty implementation details are often the actual tricky bits which tangle the two rather tightly together. I am also aware that the specific implementation details are often very specific to each company, but it would be great if people could talk about how they have implemented these pipelines / automated pipelines using Concourse and Cloud Foundry at their companies (if you can spare the details of course). Thanks everyone!

Solution

During development each product team is under their own org, with each developer possibly having their own development "space" that they could manually cf push to and just develop against. There will be development spaces that devs can just directly push to as well as spaces that can only be used by the automated pipeline to deploy artifacts for functional tests.

Honestly it doesn't matter if you create multiple orgs in your CloudFoundry. If your CI/CD system runs on the same director that is (ab)used by other developers you going to have a hard time probably (i was there).

Once devs finish a feature they would make a pull request, which would trigger a smaller pipeline with some tests using something like the git-multibranch-resource or the git-pullrequest-resource, which would hook into the github required status check hooks and report back if any particular PRs are able to be merged into master or not

We are doing almost the exact thing. For PR's checkout jtarchie's PR resource here https://github.com/jtarchie/github-pullrequest-resource. The only difference is that we are not using Github checks. The problem with them is, that you have to select a set of checks fixed for a branch.

But in case i just changed manifest xyz in the PR, i don't want to run all tests. You can overcome that problem by using the Github Status API only with the pending and successful status.

Once all checks pass and the pull request is merged into master the below pipeline is kicked off, which validates the master branch before releasing the artifact to prod.

We make PR's into the develop branch and following the Git Flow system. Our releases are merged into master manually.

You want to check first which updates you want to carry out before you merge every PR into master and trigger an update of the production system. Your test cases might be good, but you can always miss something.

What other things could/should be added to this pipeline or any part of this process?

You can have a pipeline which updates releases/stemcells in your manifests automatically.

Once you automate deployment how do you do also automate things like canary releases or rollbacks once something happens? Should this be part of the pipeline or something completely separate?

Test your stuff on a staging system before you go to production. Otherwise you a) won't know if the update is happening at zero downtime and b) to prevent a potential problem in production is always better than doing rollbacks.

Ofc you can also create a rollback pipeline, but if you come to that point something else might be wrong with your setup.

Should the pipeline generate the manifest files? Or should that be completely up to the developers? Only the developers know which services the app needs, but also it seems like things like instance count, memory etc should be something that the performances tests/pipeline should be able to determine/automate. You could generate a manifest inside the pipeline, but then you would not know which services the app needs without reading a manifest....chicken and egg problem?

We write our manifests by ourselves and use the CI/CD system to update/deploy/test them.

But if you find a valid case and a concept which lets you apply your manifest generating pipeline for many cases, i would just try it out.

At the very end you have to decide if a certain atomisation holds a business value for your company.

cheers, Dennis