I’ve seen things floating around the internet recently suggesting that DevOps makes engineers, QA and others redundant through automation.  “It’s all about automation” is one of the mantras frequently chanted about DevOps, and as an Operations Engineer who’s been using automation to improve the way he works for the last eight years, it’s also a mantra that I fundamentally disagree with.  Automation is *part* of DevOps, but it’s certainly not the whole, and it shouldn’t be used as a magic wand that allows managers to cite “DevOps” as a means of cutting staffing levels.

This led me to tweet this:

and last night, I received a reply to that tweet from Stratoscale

“Challenge Accepted” was my first thought.  I work for a DevOps Consultancy, and a quote brought back to the office by a colleague recently was fresh in my mind:

“If you have a room full of ten people and ask them to define DevOps, you’ll get at least eleven different answers!”

So, let’s take a look at the seven “requirements” listed on the webpage linked in the tweet and see how they match up to what I believe DevOps means.

Before I get started, I have three confessions to make:

  1. I didn’t download the white-paper, so this is purely based on the list on that page.  Once I’ve had a chance to read the white-paper, I’ll write some more on this if I feel it’s needed.
  2. I don’t understand Baseball.  I understand Cricket (and England isn’t doing too well in that as I type this!), and I am led to believe that the rules are just as complex for both games.  However, if I get confused between Hitters (is that the right name?!) and Batsmen, or Pitchers (I’m sure that’s right!) and Bowlers, you’ll have to forgive me.  Two nations divided by a common language and all that…
  3. I am an Operations Engineer who has also worked as a developer.  I will always defend Ops Teams and the need to have them; that is my bias.  I also believe that it’s perfectly possible for Operations Teams to work in an Agile manner, and I have no idea why it takes some Ops Teams three weeks to spin up and hand over a Virtual Machine.  That is an issue with the way a particular team works and uses its tools, not with the entire Ops movement, but that’s a rant for another time!

The Seven Requirements

The article starts by talking about “IT Infrastructure Environments” vs. DevOps.  Reading ahead, I’m fairly sure that Stratoscale mean “not in the cloud” or “physical, on-premise IT Infrastructure” vs a solution such as OpenStack, AWS or Azure, however, the paragraph seems to suggest that “maintenance skilled technicians” are less important in the DevOps world than they are in a more “traditional” IT setting.  I happen to disagree with this.

I’ve seen clients who have designed their own networks in AWS Virtual Private Clouds or on other “cloud” platforms and, because they didn’t have someone who understood networking, found it very difficult to link multiple subnets within the same (or indeed different!) VPCs because they all have the same IP range.  I also find myself asking “so what about patching all these VMs that you’re running your Docker containers on?”, to which the answer is often “we just redeploy the Docker container when we need to roll out an update”.  That is usually followed by a discussion about the difference between keeping your platform up to date and keeping your application up to date, and by installing system package update monitoring to improve visibility.
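This is exactly the kind of thing a network engineer catches at design time.  As a quick illustration (the CIDR ranges below are made up for the example), Python’s standard ipaddress module is enough to show why two VPCs built on the same default range can never be cleanly routed or peered together:

```python
import ipaddress

# Two hypothetical VPCs both created with the provider's default range --
# a common result of "click-through" network design.
vpc_a = ipaddress.ip_network("10.0.0.0/16")
vpc_b = ipaddress.ip_network("10.0.0.0/16")

# Overlapping ranges make routing between the two ambiguous, so peering
# or linking the subnets is effectively impossible.
print(vpc_a.overlaps(vpc_b))   # True

# Planning non-overlapping ranges up front keeps peering on the table.
vpc_a = ipaddress.ip_network("10.0.0.0/16")
vpc_b = ipaddress.ip_network("10.1.0.0/16")
print(vpc_a.overlaps(vpc_b))   # False
```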

The page then moves on to one of my least favourite arguments in the DevOps world, Generalism vs. Specialism.

One of the main “sells” of generalist vs. specialist is that it means you can get rid of your datacentre team because all of your developers know enough about operations to run the entire platform.  This is never true. 

It is the job of your operations team to provide the same skills in a virtual environment that they would provide in a physical one.  Servers/Instances need to be patched, monitoring needs to be configured by someone who knows what they are doing and when you need that network engineer at 3am to help you discover the source of a UDP traffic outage between two asynchronous components of your architecture, you’ll be incredibly grateful that you kept him on the team.

I wouldn’t ask an Operations Engineer with 15 years’ experience of working across a wide range of operating systems and platforms to write an application for processing Credit Cards or carrying out real-time trades on the stock exchange any more than I would ask a developer with the same amount of experience across multiple languages to design the infrastructure for those platforms.

Dev and Ops require different skill sets and, whilst I accept that you do find the rare unicorn that can do all things, DevOps is about collaboration between developers and operations, not about Developers taking over the Operations tasks because they feel Operations are “too slow” or “don’t understand”.

You can achieve “value and credibility, and the business benefits from agility, efficiency and reduced operational expense” without disposing of your operations team.

Now we get to the “seven ways” (and if you’ve made it this far past the ranting, well done, get yourself an espresso and strap in for the ride!), so I’ll take these in turn:

Enable a Developer-Friendly Environment

An ideal infrastructure environment allows developers to focus on application code itself, rather than figuring out how to allocate storage and balance workloads.

Your environments should be friendly for both Dev and Ops.  With the rapid growth of infrastructure as code, and with tools such as Terraform, Ansible and Chef supporting automated creation of infrastructure as well as the testing frameworks around them, there is absolutely no reason why your Operations Teams shouldn’t follow the same best practices as your developers and work towards Test Driven Development for Infrastructure.

If your engineers are committing tests and writing code for the infrastructure, and your developers have access to both the code and the test results, not only do you get developers and operations teams working together to improve both the app and the configuration, you also get the added bonus of friendly rivalry over who has the healthiest CI status screen!
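To make that a little more concrete, here’s a rough sketch of what infrastructure tests can look like, using pytest with the testinfra plugin; the host, package, service and port names are purely illustrative:

```python
# test_webserver.py
# Requires the pytest-testinfra plugin.  Run against a host with something like:
#   pytest --hosts='ssh://web01.example.internal' test_webserver.py
# (the hostname here is a made-up example)

def test_nginx_is_installed(host):
    assert host.package("nginx").is_installed

def test_nginx_is_running_and_enabled(host):
    nginx = host.service("nginx")
    assert nginx.is_running
    assert nginx.is_enabled

def test_app_port_is_listening(host):
    # The application should be listening where the load balancer expects it.
    assert host.socket("tcp://0.0.0.0:8080").is_listening
```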

Virtualize Workloads

A virtualized environment provides more efficient use of resources, while also reducing hardware, licensing and other operational expenses.

I cannot agree with this enough!  Before Vagrant came on the scene, I tried to write my own platform for providing self-service environments to developers.  Vagrant was a game-changer, and the work that followed it, along with the rapid increase of features in OpenStack, AWS and Azure, has led to CI pipelines that can spin up a replica of production, provision it using the same Ansible/Chef code, deploy the application exactly as it would be deployed in Production, run tests (including monitoring checks) against the environment and then report back to the CI platform before throwing the entire environment away.  The result is that everyone, Dev and Ops alike, is able to test their code before it reaches the customer (and they can even do it in parallel!).
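As a very rough sketch (the directory layout, playbook names and test command are hypothetical, not from any particular project), that kind of pipeline stage can be as simple as a script that drives the same CLIs a human would:

```python
import subprocess

def run(*cmd, cwd=None):
    """Run a command and fail the pipeline stage if it fails."""
    subprocess.run(cmd, cwd=cwd, check=True)

def ephemeral_environment_stage():
    infra_dir = "environments/ci-replica"   # hypothetical Terraform config for the replica

    try:
        # Spin up a throwaway replica of production.
        run("terraform", "init", cwd=infra_dir)
        run("terraform", "apply", "-auto-approve", cwd=infra_dir)

        # Provision it with the same configuration code used in production.
        run("ansible-playbook", "-i", "inventories/ci-replica", "site.yml")

        # Deploy and test exactly as production would be deployed and tested,
        # including the monitoring checks (test command assumes pytest-testinfra).
        run("ansible-playbook", "-i", "inventories/ci-replica", "deploy.yml")
        run("pytest", "tests/acceptance", "--hosts=ssh://ci-replica-web01")
    finally:
        # Throw the entire environment away, pass or fail.
        run("terraform", "destroy", "-auto-approve", cwd=infra_dir)

if __name__ == "__main__":
    ephemeral_environment_stage()
```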

Architect for Hyper-Convergence

True hyper-convergence provides seamless integration and management of the compute, storage and network functions across all servers in the data center.

I still don’t know what “hyper-convergence” really means (I’m not convinced it’s a word used anywhere other than Enterprise Strategy Meetings and Marketing Departments – see this post for more of my thoughts on language in IT).  However, I do agree that it makes sense to have a platform that provides integrated management across all elements of a datacentre – after all, that’s what the Operations community has been doing for the past twenty years.  It’s not a new thing; it’s just common sense!

Automate Resource Management

With DevOps, resource management is done dynamically, in real time and fully software controlled.

You should be doing this whether you’re “doing DevOps” or not.  If you’re not doing this then you’re about eight years behind the curve and need to catch up.  This applies equally to Waterfall/Traditional IT environments as it does to the new ways of working.

Terraform, Ansible, Chef, Puppet, Jenkins, TeamCity and all the other fantastic software we have at our disposal these days to automate our infrastructure means that there really is no excuse any more to keep hold of those bash scripts that have “worked” for the past 16 years.
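As one small example of what “fully software controlled” resource management can look like in practice (the Owner tag convention and the decision to stop rather than terminate instances are assumptions made for the sake of the example), a scheduled sweep for unclaimed instances is a handful of lines against a proper API rather than a page of bash:

```python
import boto3

def stop_unowned_instances(region="eu-west-1"):
    """Stop any running EC2 instance that is missing an 'Owner' tag.

    A small example of resource management as software: the policy lives in
    version control and runs on a schedule, not in someone's head.
    """
    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_instances")
    unowned = []

    for page in paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    ):
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
                if "Owner" not in tags:
                    unowned.append(instance["InstanceId"])

    if unowned:
        ec2.stop_instances(InstanceIds=unowned)
    return unowned

if __name__ == "__main__":
    print(stop_unowned_instances())
```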

Deploy Invisible Infrastructure

Changing workloads and storage requirements are scalable eliminating the constant monitoring and reaction by IT staff resources.

If your infrastructure is invisible to your development team, they cannot work with your Operations Engineers or DBAs to tune Database indexes, increase performance on the network cards, route traffic efficiently or improve anything other than the code base.

Invisible infrastructure != scalable infrastructure

In order to truly scale an application to cope with the “slashdot effect” or similar, you need people in your team who understand that “just throw RAM and CPU at the problem until it goes away” is merely treating the symptoms, not the cause.

Leverage OpenStack and Other Open-Source Communities

By implementing OpenStack or an OpenStack-compatible solution, a DevOps approach takes advantage of a variety of tools or products available for specific functionality.

I’d go further than this.  Design for “the cloud”. Any cloud.  Your application should be designed to deal with failure, to have nodes in the infrastructure disappear and come back again without warning.  The tooling isn’t really that important.  If your code works in OpenStack, then it should also work in AWS, Azure and any other true cloud provider with a decent set of APIs.
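Designing to deal with failure is mostly unglamorous code.  Here’s a minimal, provider-agnostic sketch of the retry-with-backoff pattern (the operation, timings and exception types are placeholders) that lets a component ride out a node disappearing and coming back:

```python
import random
import time

def call_with_retries(operation, attempts=5, base_delay=0.5):
    """Retry a flaky operation with exponential backoff and jitter.

    Works the same whether the node behind `operation` lives in OpenStack,
    AWS, Azure or anywhere else -- the failure handling sits in the
    application, not the provider.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except (ConnectionError, TimeoutError) as exc:
            if attempt == attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, base_delay)
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay:.1f}s")
            time.sleep(delay)

if __name__ == "__main__":
    # Placeholder operation that fails intermittently, standing in for a
    # call to a node that may disappear and come back.
    def flaky():
        if random.random() < 0.5:
            raise ConnectionError("node went away")
        return "ok"

    print(call_with_retries(flaky))
```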

You might have to change some of your Terraform to work with another provider, but the rest of your platform and the whole of your architecture should be completely provider-independent, which brings us nicely on to the final point…

Avoid Hardware Vendor Lock-In

A true DevOps approach provides an environment where the hardware does not restrict efficient resource management.

I’ve been fighting Vendor Lock-In and trying to evangelise the benefits of designing a vendor-neutral solution for the better part of twenty years, so obviously I’m not going to argue with this statement.

AWS is clearly the world leader in Hosted Cloud Platforms, with Azure now a very close second in regard to feature parity and API support in the major infrastructure automation tooling.

OpenStack is an amazing platform and solution for hosting your own cloud, and with some providers now increasing the public availability of true, feature-rich OpenStack solutions, we have an open-source competitor.

Just remember, if you deploy OpenStack in your own datacentre, who’s going to run the physical servers that it’s deployed to for you? I hope it’s not the Ops team that you made redundant… 😉