This article represents my current thoughts on building and deploying software (circa 2023). It is a general straw man. As usual in software development, the caveat applies that there is no one-size-fits-all, silver-bullet pattern. This is where I start as a “given no further constraints” approach. It is highly likely to be too complex for a single-dev piece of software and equally likely to be unsuitable for large monolithic applications. However, for software somewhere in between those two it may fit your needs as it does mine.


My definition of CICD is continuous static analysis, build, test, publish, deploy, verification and releasing to customers, for every change. While CI stands for Continuous Integration, CD ambiguously stands for either Continuous Delivery or Continuous Deployment, where Delivery and Deployment are equally ambiguous and have debatable definitions. So, while I will use the acronym here, see my definition above for what that means to me.

The crux of CICD is: always shipping, always automatically.

Counterintuitively, this implies having automated gates that can halt these deployments, and that somewhere right at the beginning the automation is kicked off by an initial interaction or contribution (such as a file change in a code repository). Also, do have a rollback plan handy.

The benefits we are trying to achieve are faster feedback loops and a change in mindset that helps find and fix issues sooner, much closer to when the associated change was thought through and implemented. This is often referred to as Shift Left. It also reduces the time between a production issue being discovered and the last time a human worked on that component (which, thanks to CICD, could be the difference between months and hours or minutes).

A brief digression into testing

Testing could be its own series of blog posts, so we will gloss over it slightly here. But it is necessary to say that automated testing is a large part of doing CICD well. And the ability to perform enough manual testing when required needs to be accounted for. This is how we achieve our safety and trust to enable straight-shot pipelines all the way to being in front of customers. We can look at a high-level view of these briefly.

Automated Testing

Automated testing should cover as much of the application as possible. This can range across unit tests, property-based tests, composition tests, mutation tests, boundary tests, snapshot tests, integration tests, acceptance tests, specification tests, regression tests, end-to-end tests, UI tests, smoke tests and any other kinds of tests you can think of to have run automated before code can go to production.

A new project might aim for a minimal subset of the above (such as unit, integration, and smoke) while a more established piece of software may have most or all of these in place.

The main goal in terms of CICD is to ensure most of your testing runs before merging in changes to a release candidate, and enough validation testing is run during and after deployment to ensure fidelity.

Manual Testing

Manual testing requires enough real dependencies to be meaningful, or else you may as well just automate any task that you are testing using fakes or mocks. Largely this is exploratory testing or deep testing of something complex requiring a human eye for validation.

The two tools I would lean on here are having a feature flagging and dark release pattern in your development process and having a secondary manual test environment to deploy pre-release changes to.

Feature flagging allows you to deploy to preproduction and production with inert functionality (sometimes called a shadow release). You can then enable this functionality either isolated to a canary server instance, per environment (e.g. preproduction only) or per user in a way that enables testers or beta users to interact with the flagged changes before a wider release.
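As a minimal sketch of the three targeting options just described (canary instance, environment, or user), a flag check might look like the following. The `Flag` class, its field names and the example values are all hypothetical, and a real system would more likely read flags from a flagging service or configuration store than hard-code them.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Flag:
    """A feature flag that is inert unless explicitly targeted."""
    environments: Set[str] = field(default_factory=set)  # e.g. {"preproduction"}
    users: Set[str] = field(default_factory=set)         # testers or beta users
    instances: Set[str] = field(default_factory=set)     # canary server instances


def is_enabled(flag: Flag, environment: str, user: str, instance: str) -> bool:
    """Return True when any one of the targeting rules matches."""
    return (
        environment in flag.environments
        or user in flag.users
        or instance in flag.instances
    )


# Hypothetical flag: on everywhere in preproduction, and for one tester in production.
new_checkout = Flag(environments={"preproduction"}, users={"tester@example.com"})
```

Because an empty `Flag()` matches nothing, code shipped behind it stays dark in every environment until someone opts in.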

Having an additional pre-production environment you can deploy changes to will enable Pull Requests or feature branches to be deployed there and manually tested before they are merged into a production candidate release.

Reliability Engineering

One aspect that makes this all come together is the developer thinking through production, and how they will know and trust that it is working. Having ongoing monitoring and alerting in place is a key part of this strategy. Without monitoring and alerting to detect and escalate issues, you are effectively in chaos.

Automated tests catch the known failure scenarios, making sure they do not occur or regress. But there are unknown failure scenarios as well. We can detect these using signals such as error log count, 4xx or 5xx HTTP status codes, or request/response time spikes. These don’t tell us what the problem is but are great at signalling that we do have a problem.
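To make those signals concrete, here is a rough sketch that inspects a window of recent requests and reports which signals have tripped. The thresholds (5% errors, 500 ms at the 95th percentile) and the `RequestSample` shape are assumptions for illustration only. In practice this logic usually lives in your monitoring tool rather than in application code.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class RequestSample:
    status: int         # HTTP status code returned
    duration_ms: float  # request/response time in milliseconds


def detect_signals(
    window: List[RequestSample],
    max_error_rate: float = 0.05,  # assumed threshold
    max_p95_ms: float = 500.0,     # assumed threshold
) -> List[str]:
    """Return the names of the signals that exceed their thresholds."""
    if not window:
        return []
    signals = []
    # Count HTTP statuses of 400 and above as errors.
    errors = sum(1 for sample in window if sample.status >= 400)
    if errors / len(window) > max_error_rate:
        signals.append("error_rate")
    # Nearest-rank 95th percentile of the response times.
    durations = sorted(sample.duration_ms for sample in window)
    p95 = durations[math.ceil(0.95 * len(durations)) - 1]
    if p95 > max_p95_ms:
        signals.append("latency")
    return signals
```

Note that the result only says that something is wrong, not what. It is a trigger for a human (or a pipeline gate) to go and look.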

The feedback loop of monitoring tools alerting the developer who wrote the code (or their on-call proxy, who can reach out to them) is what gives the confidence to release changes straight to production. It complements the other testing guards: those catch the known issues before they reach customers, while monitoring catches the unknown issues after they reach customers.

I’ve left automated rollback out of this pipeline for now, as it adds the challenge of understanding when a rollback is safe and when it isn’t. However, this is the next step to consider as you delve further into the CICD space. Having a manual plan, though, is essential. (I often use roll-forward to roll back. Either with a re-deploy of an older version, or with a patch+merge+release - often both together.)
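For the "re-deploy an older version" half of that plan, the only real decision is which version to go back to. A small sketch, assuming you keep a deployment history of `(version, verified)` pairs ordered oldest first (a hypothetical record format, not something any particular tool gives you):

```python
from typing import List, Optional, Tuple


def rollback_target(history: List[Tuple[str, bool]], current: str) -> Optional[str]:
    """Return the most recent verified version deployed before `current`.

    Returns None when `current` is unknown or nothing earlier was verified.
    """
    versions = [version for version, _ in history]
    if current not in versions:
        return None
    index = versions.index(current)
    # Walk backwards from the deployment just before the current one.
    for version, verified in reversed(history[:index]):
        if verified:
            return version
    return None


# Hypothetical history: 1.2.0 never passed production verification.
history = [("1.0.0", True), ("1.1.0", True), ("1.2.0", False), ("1.3.0", True)]
```

The chosen version then goes back through the same deployment steps as any other release, which is what makes it a roll-forward.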

A side note on the shape of software

What is it that we are testing and deploying? My philosophy is to include everything that makes up this component. By which I mean the application, any infrastructure (using configuration-as-code) and any side-car utilities (such as queues, caches, stores etc) not shared with other components.

It follows that all our static verification and testing covers these infra and sidecar deployments as well.

If you do have queues, stores, or caches that are shared, you will want to extract these as their own components and deploy them separately. But where these are coupled and self-contained to a particular application, I’ve found that having them co-located in a repository and deployed in the same pipeline is a simplification worth having.

Also make sure that anything your configuration-as-code manages in a permanent storage capacity has appropriate backups, as well as the right checks and balances to restrict the pipeline from doing anything destructive automatically.
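One way to picture such a check is a gate that inspects the planned infrastructure changes and refuses to continue if a storage resource would be deleted or replaced. The plan format below (a list of dictionaries with `type`, `name` and `action` keys) and the resource type names are invented for this sketch. Real infrastructure tools each have their own plan output that you would need to translate from.

```python
from typing import Dict, List, Set

# Actions that would lose data if applied to a storage resource (assumed names).
DESTRUCTIVE_ACTIONS = {"delete", "replace"}


def destructive_changes(
    plan: List[Dict[str, str]], protected_types: Set[str]
) -> List[Dict[str, str]]:
    """Return planned changes that would destroy a protected resource."""
    return [
        change
        for change in plan
        if change["action"] in DESTRUCTIVE_ACTIONS
        and change["type"] in protected_types
    ]


# Hypothetical plan: updating a queue is fine, replacing the database is not.
plan = [
    {"type": "queue", "name": "orders-queue", "action": "update"},
    {"type": "database", "name": "orders-db", "action": "replace"},
]
blocked = destructive_changes(plan, protected_types={"database", "object_store"})
```

If `blocked` is non-empty the pipeline should stop and hand the decision to a human.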

If feasible, you might also leverage configuration as code for your monitoring and alerting tools in the same way through this pipeline.

The pipeline

CICD is a pipeline, and these are the key phases and steps in that pipeline in my ideal model:

Pull Request -> Merge -> Trunk build -> Pre-Prod Deployment -> Pre-Prod Verify -> Prod Deployment -> Prod Verify -> Release Notification

We will break this into a few chunks for a deeper discussion.
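Before going through the chunks, here is a toy sketch of the overall shape: an ordered list of stages, each acting as a gate, where the first failure halts everything after it. The `Stage` type and the lambda checks are placeholders of my own. In a real pipeline each stage would call out to your build, test and deployment tooling.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True when the gate passes


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failing gate.

    Returns whether the whole pipeline passed, plus the stages that completed.
    """
    completed: List[str] = []
    for stage in stages:
        if not stage.run():
            return False, completed
        completed.append(stage.name)
    return True, completed


# Placeholder checks; the pre-prod verify failure is simulated.
stages = [
    Stage("pull request checks", lambda: True),
    Stage("trunk build", lambda: True),
    Stage("pre-prod deployment", lambda: True),
    Stage("pre-prod verify", lambda: False),
    Stage("prod deployment", lambda: True),
    Stage("prod verify", lambda: True),
    Stage("release notification", lambda: True),
]

passed, completed = run_pipeline(stages)
```

With the simulated failure in place, nothing from production deployment onwards runs, which is exactly the behaviour we want from a gate.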

Pre merge

Ideally, I want trunk (e.g. main branch) to always be production capable. Achieving this means ensuring the quality and fidelity of any changes before they are merged in. This may look like a Pull Request with automated pre-build checks (e.g. GitHub Actions) and code reviews. It may also look like git commit hooks that are run and confirmed ahead of a merge. Exactly what you implement here is dependent on the software in question, and the engineers working on it.

For me, this encompasses linting and other static analysis, security scanning, compiling, executing all tests, and producing candidate artifacts that can also be verified and validated here.

Ideally, for the fastest feedback, you want this to take as little time as possible. If this takes more than 10 minutes (ideally 5 or less) you will want to consider reducing scope, performance tuning, or potentially splitting something monolithic up somehow.

Trunk Build

At all times, you want to be able to validate trunk against the same lint, build and test as you would for any candidate commit. Ensuring you can run the same checks in both places maintains the validity of your trunk.

In addition, we would also take the generated artifacts and make them available to the further deployment processes coming up next in the pipeline. So our trunk build also publishes artifacts somewhere.

Preproduction deploy and verify

It is important to have an environment that is production-like but isn’t serving customers in which to first test your artifacts. Despite your best testing efforts, some bugs and misconfigurations can’t be spotted until you run a real deployment and watch the software interact with real dependencies (or at least their preproduction equivalents).

Before deploying to production proper you will want to run through deployment and post-deployment verification testing in such an environment to provide extra automated confidence before reaching real customers.

This post-deployment verification is also your last chance to have automated checks catch issues before they get released out to real customers.
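A post-deployment check can be as simple as requesting a handful of key paths and insisting that each one eventually answers with HTTP 200. The sketch below takes the HTTP call as a function argument (`fetch_status`, a name I made up) so it stays independent of any particular client library. The paths, retry count and delay are likewise assumptions.

```python
import time
from typing import Callable, Iterable, List


def verify_deployment(
    fetch_status: Callable[[str], int],
    paths: Iterable[str],
    attempts: int = 3,
    delay_s: float = 0.0,
) -> List[str]:
    """Return the paths that never answered with HTTP 200."""
    failed: List[str] = []
    for path in paths:
        for attempt in range(attempts):
            try:
                if fetch_status(path) == 200:
                    break  # this path is healthy, move on to the next one
            except OSError:
                pass  # treat connection errors like a failed attempt
            if attempt < attempts - 1:
                time.sleep(delay_s)  # give a freshly deployed service time to warm up
        else:
            # The inner loop finished without a break: every attempt failed.
            failed.append(path)
    return failed
```

An empty result lets the pipeline continue, while anything else halts it before the change gets any further.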

In some situations, this may be a canary or an early-adopters group, though ideally you want something without real customers to do this testing on, and instead use feature flags for beta user test groups, and any A/B testing.

Production Deploy and Verify

It is now time to deploy to production. You’ve satisfied your tests and verified that the change works once deployed to a real environment, which also mitigates issues in the production deployment process itself (assuming your pre-prod and prod deployment processes are near-identical).

We repeat our deployment and verify steps once more only this time against our customer-facing production environment. Our change has now reached production.

If you do find production issues, you will want to go back and add regression checks to the testing at the appropriate point in the pipeline to make sure the same issue halts deployments in the future. This is one of the feedback cycles you need to adhere to when taking on CICD in this way, to strengthen the reliability and trust in the process.


Release Notification

Once you have a deployed and verified production deployment, you’ll want to make sure anyone (or any system) that wants to know about it can do so.

Specifically, you want to have release notes published, any APM systems notified of the release, internal comms posted on something like Slack or Discord, and any documentation related to the changes updated and published.

Ideally, all of this is automated and automatic. For documentation and release notes, that means having a way to derive them from the source itself, or from the source control revision history, commit messages, Pull Request titles and so on. This saves a bunch of manual busy work which would otherwise distract a human on every deployment.
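As one illustration of deriving release notes from commit messages, the sketch below groups commit subject lines by a `feat:` or `fix:` prefix. That prefix convention is an assumption on my part (it is borrowed from the Conventional Commits style). Anything that doesn't match, including scoped prefixes like `feat(api):`, simply lands under "Other".

```python
from typing import Dict, List

# Assumed commit prefixes and the section each one maps to.
HEADINGS = {"feat": "Features", "fix": "Fixes"}


def release_notes(version: str, subjects: List[str]) -> str:
    """Build plain-text release notes from commit subject lines."""
    grouped: Dict[str, List[str]] = {"Features": [], "Fixes": [], "Other": []}
    for subject in subjects:
        prefix, separator, rest = subject.partition(":")
        heading = HEADINGS.get(prefix.strip()) if separator else None
        if heading:
            grouped[heading].append(rest.strip())
        else:
            grouped["Other"].append(subject.strip())
    lines = [f"Release {version}"]
    for heading, items in grouped.items():
        if items:  # skip empty sections
            lines.append(f"{heading}:")
            lines.extend(f"  * {item}" for item in items)
    return "\n".join(lines)


notes = release_notes(
    "1.4.0",
    ["feat: add export endpoint", "fix: handle empty cart", "Bump dependencies"],
)
```

The subject lines themselves would come from your source control history for the range of commits being released.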

The same goes for using APIs to post to Slack or other chat systems to notify other internal or external stakeholders of the change. A great way to identify the cause of a system outage can be to know which of your dependencies have made releases around the time an outage was introduced. Be nice to your API consumers.
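For the chat notification, Slack's incoming webhooks accept a JSON body with a `text` field sent as an HTTP POST, so a notifier needs very little code. The sketch below only builds the request so it can be inspected without network access. The function name, message wording and webhook URL are placeholders.

```python
import json
import urllib.request


def build_release_notification(
    webhook_url: str, service: str, version: str, summary: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request announcing a release."""
    payload = {"text": f"{service} {version} released to production: {summary}"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; a real webhook URL is a secret and belongs in pipeline config.
request = build_release_notification(
    "https://example.invalid/webhook", "orders-api", "1.4.0", "adds export endpoint"
)
# Sending it is a single call once the URL is real:
# urllib.request.urlopen(request, timeout=10)
```

Including the service name and version in the message is what lets a downstream team line up their outage with your release.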

Wrap Up

The above largely outlines a straw man of my MVP. Ideally, I would go through and set up at least the skeleton of this for any new service that I expect to live in production for any period of time. (If I don’t think this code will be productionised and is merely a toy or demo, such overhead isn’t worth investing in.) This would usually happen before any functionality is implemented (e.g. deploying the new empty project).

Having the Pipeline in place, we continue to extend the testing to improve confidence in the deployment process, and this goes hand-in-hand with building out the functionality.

Iteratively and continuously build and ship the software to meet the requirements of the project of work.