Serializing CircleCI

Hunter Fernandes

Software Engineer


We extensively use CircleCI as our CI provider. We run our entire test suite on CircleCI machines for all new commits in all PRs across our microservice repositories.

Like most other CI providers, CircleCI integrates with GitHub’s webhooks, so it gets pinged to start running CI tests whenever new code is pushed. This is 80% of our CI workload, and this 80% works great. Let’s talk about the other 20%.

Job Serialization

Our other major use case is running our entire end-to-end test suite against our pre-prod staging cluster. This happens after any microservice merges a PR.

We only want one deployment to run against our shadow cluster at a time. We need deployments to be serialized because:

  1. We can't deploy new versions of apps on top of each other. Some changes migrate state, and running duplicate migrations (or different migrations at the same time) is a bad idea.
  2. Deploying one thing at a time makes it very easy to tell which change caused breakage.

The problem is that CircleCI has no native method of serializing runs. Instead, their recommendation is to use this script to spin while waiting for prior builds to complete.[1]

The net effect of this is that you have many builds running and spending money, and only the first one to start is actually doing anything meaningful. The others are just waiting their turn.
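To make the spin-wait pattern concrete, here is a minimal sketch of the idea in Python. It is not the recommended script itself: `wait_for_turn` and the `list_running_builds` callable are hypothetical names, and the callable stands in for whatever CircleCI API query returns the build numbers currently running for the project.

```python
import time
from typing import Callable, Iterable


def wait_for_turn(
    my_build_num: int,
    list_running_builds: Callable[[], Iterable[int]],  # hypothetical API wrapper
    poll_seconds: float = 10.0,
    max_wait_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Block until no older build is still running; return the seconds waited."""
    waited = 0.0
    while True:
        # Any running build with a lower number started before us.
        older = [n for n in list_running_builds() if n < my_build_num]
        if not older:
            return waited
        if waited >= max_wait_seconds:
            raise TimeoutError(f"still blocked by builds {older} after {waited:.0f}s")
        # This sleep happens on a paid CI machine, which is the wasted time.
        sleep(poll_seconds)
        waited += poll_seconds
```

Every second spent in that loop is billed like real work, even though the job is only polling.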

For example, here’s a particularly bad case where we wasted 48 minutes waiting for previous jobs to be done:

You can see that, just like the recommended script linked earlier, we just have to spin and wait.

This is one of my biggest complaints about CircleCI. All projects that need to serialize deployments (which is… almost all of them?) also need to wastefully spend resources in wait loops.

If I were a little more conspiratorial, I might think that CircleCI is incentivized not to implement serialization properly because all the credits spent on spin-locking would go away.

The Damage

I dug through the last 30 days of pre-prod test pipeline data to figure out how much time and money we waste on this.

74% of deployment time and 5% of total pipeline time is spent waiting for serialization. This is money being thrown down the drain because CircleCI does not have proper serialization support.
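For anyone who wants to run the same analysis, here is a minimal sketch of how such shares could be computed from per-run timings. The `PipelineRun` fields are hypothetical, and the numbers in the usage lines are made up for illustration; they are not the data behind the figures above.

```python
from dataclasses import dataclass


@dataclass
class PipelineRun:
    wait_seconds: float    # time spent spinning before the deployment starts
    deploy_seconds: float  # whole deployment job, including the wait
    total_seconds: float   # all jobs in the pipeline combined


def wait_shares(runs: list[PipelineRun]) -> tuple[float, float]:
    """Return (wait / deployment time, wait / total pipeline time)."""
    wait = sum(r.wait_seconds for r in runs)
    deploy = sum(r.deploy_seconds for r in runs)
    total = sum(r.total_seconds for r in runs)
    if deploy == 0 or total == 0:
        raise ValueError("need at least one run with non-zero duration")
    return wait / deploy, wait / total


# Illustrative numbers only: one run that waited 300s and one that did not wait.
example_runs = [PipelineRun(300, 400, 6000), PipelineRun(0, 100, 4000)]
deploy_share, total_share = wait_shares(example_runs)  # about 60% and 3%
```

If both percentages above count the same waiting time, dividing one by the other suggests the deployment job accounts for roughly 5 / 74 ≈ 7% of total pipeline time.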

You could probably hack together a solution with API Gateway, webhooks, and CircleCI API triggers. But then you would lose pipeline progress visibility. The last thing you want is for your engineers to wonder why their build isn’t on the CircleCI dashboard.
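As a rough illustration of what that hack might look like, here is a minimal sketch of the queueing logic that would sit behind the webhook endpoint. Everything in it is an assumption: `DeploySerializer`, `on_merge`, and `on_pipeline_finished` are hypothetical names, the `trigger` callable stands in for a call to the CircleCI API, and the state lives in memory where a real version would need durable storage.

```python
from collections import deque
from typing import Callable, Deque, Optional


class DeploySerializer:
    """Starts at most one deployment pipeline at a time, in arrival order."""

    def __init__(self, trigger: Callable[[str], None]) -> None:
        self._trigger = trigger            # hypothetical wrapper around an API trigger
        self._pending: Deque[str] = deque()
        self._running: Optional[str] = None

    def on_merge(self, commit: str) -> None:
        # Called by the webhook handler when a PR merges.
        self._pending.append(commit)
        self._start_next()

    def on_pipeline_finished(self, commit: str) -> None:
        # Called when the running pipeline reports completion.
        if commit == self._running:
            self._running = None
            self._start_next()

    def _start_next(self) -> None:
        if self._running is None and self._pending:
            self._running = self._pending.popleft()
            self._trigger(self._running)
```

The visibility problem shows up directly in this design: a commit sitting in `_pending` has no pipeline yet, so nothing appears on the CircleCI dashboard until its turn comes.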

CircleCI’s backend is the best place to start jobs serially. My top CircleCI wish is for serialization support in .circleci/config.yml configuration files. It’s been over a decade since CircleCI launched. How much longer will it be until we get this basic functionality?

Footnotes

  1. Note how they locked that thread to prevent discussion.