Less frequent deployments make it more likely you’ll have a critical problem in your deployment

If your company had problems with failed website deployments – due to critical problems in the new version – and decided to “fix” this by having less frequent deployments, then they’ve just increased the chance that future deployments will fail.

How the problem starts

At many companies, the programming team is separate from the team that deploys the application and maintains the network and databases.

Separating these groups adds a layer of security, by limiting the number of people who have access to live data – which is good. However, this often leads to an “us-versus-them” mentality.

The network team’s objective is to keep the production environment stable, and the development team’s objective is to keep changing things.

Eventually, after a few failed deployments, the network team starts to hate updates. So, they make them more difficult to schedule. They add more steps and often require more lead-time for deployments.

How the problem gets worse

So, since it takes longer and is more difficult to get an update out, the development team starts to release updates less frequently.

However, the stream of change requests is generally constant. If management or users want ten changes per month, and you deploy new versions monthly, then each new version has ten changes.

Now that the networking team has added a four- or six-week lead-time requirement for each deployment, the logical thing to do is to switch from monthly updates to quarterly updates.

But you still get ten change requests per month. So now, each update has thirty changes.

This is where the problem is born.

How math proves that less frequent deployments lead to more rollbacks

Let’s say you have a 1% chance for each change having a problem that requires a rollback.

That means, if you make one change to your application, there’s a 99% chance it can stay deployed, and a 1% chance you need to revert to the previous version.

Now, let’s say you made two changes.

To calculate the probability that this update will be successful, you need to multiply the chance that the first change was successful (.99) by the chance that the second change was successful (also .99). This assumes the two changes fail independently of each other.

The result is .9801 – that’s a 98.01% chance of a successful update.

So, by having two changes in the update, you decrease the likelihood that the deployment has no critical problems.

If you wait until you’ve made ten changes, you drop to only a 90.44% chance that everything was good, and the update didn’t have a critical failure that required a rollback.

Twenty changes gives you a successful deployment chance of 81.79%.

With thirty changes, you’re down to a 73.97% chance of success.
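The figures above can be reproduced with a few lines of Python. This is a minimal sketch, not a model of any real deployment pipeline: the function name success_probability is made up for illustration, and it assumes every change has the same 1% chance of forcing a rollback, independently of the others. Under those assumptions, the chance that an update with n changes succeeds is 0.99 multiplied by itself n times.

```python
def success_probability(changes: int, failure_rate: float = 0.01) -> float:
    """Chance that none of `changes` independent changes forces a rollback.

    Assumes each change fails with the same probability (`failure_rate`)
    and that failures are independent of one another.
    """
    return (1.0 - failure_rate) ** changes


# Print the chance of a clean deployment for the update sizes discussed above.
for n in (1, 2, 10, 20, 30):
    print(f"{n:>2} changes: {success_probability(n):.2%}")
```

Changing failure_rate shows how quickly the odds worsen when individual changes are riskier than 1%.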

How can you reduce rollbacks due to critical problems with deployments?

If infrequent deployments make rollbacks more likely, then the obvious solution is to have deployments more frequently.

But that’s easier said than done.

There are three things to focus on:

Improve automated testing for your applications

To deploy more frequently, you need to test faster.

Part of the delay in deployments can come from the development team – in particular, the testers. If it takes a week to manually test your application, then the most often you can deploy updates is once a week. That assumes the programmers need no time to actually write the change, and that QA finds no bugs that have to be fixed and put through another week of testing.

Shorten the deployment authorization process

You do need some amount of processing/recording/auditing for each deployment.

However, the odds are good that your existing process has some steps you can eliminate or automate.

This is often the toughest step to do. Each new requirement for deploying updates was probably put into place because a problem happened at one point – and the requirement will “prevent this from ever happening again”.

For each step, count the number of times it has actually prevented the problem from happening again. If the problem never occurred again, the step may not be needed. It could be that the development team has put, or can put, something in place in their automated tests to ensure the problem does not recur.

Get the development and networking teams focused on the same goal

The goal is to keep your users satisfied.

This almost always requires that you have an application that is both stable and updated.

Users want more features and they want a reliable system. Without both, you’re leaving a huge opening for your competitors.

So get your development and networking/operations teams communicating with each other and working together. When they do encounter a problem with a deployment, both of them should get together to find the root cause and decide on the best way to prevent the same problem from making it to production again.

Use the “5 Whys” to discover the root problem.

And if you have any incentive programs that contradict each other (developers get bonuses for deploying updates, while network admins get bonuses for stability), change them. Everyone should focus on the same goals – customer satisfaction, profit, etc.