In our hectic cloud-based world, devops (the mixing of infrastructure operations with software development) has become the standard way we build and run high-scale sites from IaaS to SaaS. There are lessons to be learned from how we got here, especially because devops isn’t very security friendly.
Here’s how we got to this sorry state, from the perspective of someone who started working on cloud infrastructure in 1998. I’ve run both dev and ops functions in multiple cloud environments and launched two early cloud computing services. I also ran the Web & Internet Engineering program for 5 years for the University of California, where I designed and taught courses to teach coders the basics of infrastructure operations and to teach infrastructure people about software architecture.
When I studied for my BS in Computer Information Systems, we had these nicely structured roles where software architects designed code that was implemented by developers, run through QA, and handed off to operations (aka sysadmins). If the code wasn’t stable, operations rejected it. They had the same level of influence as a product manager who could also reject code that didn’t meet functional requirements.
But in the real world, there is a hierarchy of geek cred. As the cloud was forming in the days of ASP, MSP, and dotcom, if you were a CCIE or maybe an Oracle DBA, you could name your price for an operational role. You ruled the operational roost. On the other hand, if you were a Java programmer, you could also name your price and you were the top dog of software developers. A skeptic would be tempted to think that Cisco’s CLI-only interface and Java’s needless complexity were both designed to fuel employment for highly paid geeks, who would in turn recommend buying insanely overpriced Cisco network gear and giant Sun servers to run inefficient Java code. But soon, the big salaries meant there were enough CCIEs to go around, and Java developers far outnumbered operations experts to the extent that it was hard to sort Java developers from baristas.
As the Internet drove demand for very short software release cycles, coders took shortcuts in their code and operations people rejected it. After all, it’s the operations people who wake up at 2 AM when the site is down, while the developers roll in at 10 AM to write code between slices of company-supplied pizza. Both dev and ops felt pressure from executives desperate to launch new features quickly.
I can’t tell you how many meetings I sat through where operations people yelled at developers for writing inefficient, latency-intolerant, generally sloppy code that would take down servers if it was deployed at scale. We began to insist that developers write code on the same platform it would be deployed on. We made them use WAN simulators so they could see how much slower the code ran over the Internet than over Ethernet. Soon, the surplus of Java developers came to the conclusion that it was simpler to fix their own bugs on the fly than it was to write tight code.
Having developers run and fix their own code in production was a seemingly elegant solution because it met the business need for short software release cycles. It removed the pesky operations release decision that was formerly in place to filter out flaky code. It also held developers accountable by making them carry a pager and wake up if their code was flaky. We started talking about graceful degradation and fixing on the fly.
There’s one little problem. I think it’s genetic. It’s politically incorrect. Software developers generally make poor operations people. Sysadmin (ops) people generally make poor developers. Developers build stuff and like shiny new things. Sysadmins like stable systems that stay up because they don’t change all the time. Sure, there’s a lot in common between dev and ops: a loathing of GUI, a love for t-shirts, curious hygiene habits, Red Bull, etc., but they are wired differently. (I will not say which of these traits I share with dev and ops, but only mention that I studied computer science and computer information systems…)
You could make the argument that developers are good enough at ops (or vice versa) that the system still works. It does when site availability is your foremost goal. It’s easy to fix a little coding mistake that lowered system performance for an hour. It’s simply not possible to repair the damage done by a similar little coding mistake that allowed your customer database to be stolen. You can’t fix security on the fly. You have to do it right the first time and you have to follow tight operational procedures if you want to remain compliant with regulations.
The problem with dev and ops is that both of them will take security shortcuts if necessary to meet their goals. Dev will get the new feature out the door on time before anything else. Ops will prioritize keeping the site up and running. That’s why most IT departments today still have a security function separate and apart from development and operations. It’s also why cloud providers who blindly subscribe to the devops philosophy will likely have less secure environments. Well-run, highly secure environments will, by nature, have slower release cycles than fly-by-the-seat-of-your-pants, fix-it-as-you-go environments. And that’s OK.
The good thing about the cloud is that you can have fewer ops people because your cloud provider will have its own team of infrastructure operations. But it doesn’t mean you should put your developers in charge of operations. By all means, your developers and your operations lead should be sitting in the same room and have drinks together regularly, but they should not be the same person and they should have equal power in the organization. It’s fine if they both don’t like the security architect, but he also needs an equal seat at the table.
To do otherwise is to put your company data at risk. All the high-end security software in the world won’t save you from a combination of poorly written code and sloppy operations. It’s time to get software development, security, and operations right the first time.
(Thanks to Ted Dziuba, co-founder of Milo.com, whose awesome blog entry, “Devops is a Poorly Executed Scam,” prompted me to think this way.)
[Ed. note: Trend Micro would like to know what you think about this. We enthusiastically invite your comments and we will read every one of them. For very detailed information about Trend Micro and Security Built for Enterprise Virtualization and Cloud Environments, please visit our website.]
This is so true – and I would like to add as someone who has spent more than 15 years in this industry, the source of most project failures can be traced back to the dichotomy exposed by this article.
@John – you make some great points about QA cycles being short in the cloud because you can just roll back an unstable change that degrades but doesn’t take down the site. But what if the change opens a hole that isn’t readily identified? Once the data is out the door, you can’t roll that back like you do with stability problems. That’s the whole gist of the post!
@Wes – ROFL with your new name for BizQualiSecDevOps.
Also good points. But if you look at what a typical product manager is going to ask for, it’s shipping dates and features. They won’t likely be asking for number of bugs or security incidents. Sure, DevOps will track those, but when the deadline is looming the business guys are going to focus on shipping more than security. If you have perfect synergy in your company, great, this won’t be a problem. In the real world, the odds of making that “we deploy now” decision before you’re ready go up when there’s a business requirement, just like they do everywhere else. Thanks for your thoughts.
@Dave If someone’s metric is “Did you ship code on time?” then I think we could all agree that they’re Doing It Wrong. If you’re monitoring everything important, what precludes an org from monitoring things like the number of security bugs caught live, the number of security incidents, etc.?
DevOps doesn’t exclude security any more than it excludes quality assurance. I would argue that the focus on communication, instead of throwing things over walls and praying, is the exact sort of thing required in order to build in security from the start (and testability and operational concerns).
It’s not called SecDevOps because it could also be called BizQualiSecDevOps.
@James – You’re right, I did over-focus on dev vs ops. It’s also part of the point that devops is more focused on getting it out the door and keeping it up and running than it is on keeping it secure. Not that keeping it secure isn’t a priority, it’s just not the *absolute highest* priority for the dev role or the ops role.
It’s psychology – when your metric for whether you’re doing your job well is “Did you ship code on time?” you end up subconsciously prioritizing shipping code over security. Sure, you *know* that security is critical and you try to build it in, but the guy calling you every 5 minutes is asking, “is it ready yet?” not, “is it secure yet?”
After all, isn’t that why most commercially shipping code, from Windows to OSX, ships with security holes? It’s because the need to ship outweighed the need to be 100% secure.
I’m actually a huge fan of devops – that’s why, while I worked at a cloud provider, I also taught for 5 years at the Univ of California, making developers learn about ops and vice versa!
To the extent devops increases communication between security architects, developers, and sysadmins, it’s going to improve the situation. But there’s a reason we didn’t call it “SecDevOps.”
Dave
Fundamentally though the link between speed of deployment and security isn’t linear to me. Security is provided by implementing measured and cost-effective controls appropriate for your risk appetite. Speed of deployment is part of your risk appetite but so is choice of security architecture, AAA, encryption, etc, etc. There is perhaps some inherent risk in that deploying fast can result in more vulnerabilities but deploying slower does not magically mitigate this risk. If you don’t care about security then it won’t matter how fast or slow you deploy.
Dave,
I’ll try not to parrot too much of what I said in various tweets. Suffice to say, I personally don’t correlate rapidly shipped code with insecure code. Maybe shoddy code, but not inherently insecure.
I also want to make sure we don’t equate traditional desktop software with web applications or infrastructure-level code (API endpoints and such).
The time to market for desktop software is extended considerably by the extended QA processes it requires. Should there always be code smell checks, code audits and regression testing for desktop software (especially for known security issues)? Absolutely, but it’s simply not as easy as it is for “web” software.
The QA cycle for most web applications is measured in minutes as opposed to weeks. And this isn’t a case of no testing. Look at Wealthfront (a financial services company that practices continuous deployment). Their tests number in the low thousands and finish within minutes. The test vectors are simply narrower – does it work right on these browsers, do the unit tests pass, do the integration tests pass? All of it runs automatically when new code is checked in.
It would seem the case being presented here is that code, because it was immediately deployed (and automatically rolled back should it fail any of the startup tests) is somehow less secure than the code that took 2 weeks to QA and roll out? I would say no. People make mistakes. People (who we’ve determined make mistakes) also make mistakes trying to detect other mistakes. Computers are notoriously good at automating things. Testing is one of those areas where they do it pretty well (provided we input the right parameters). One of the nice things about agile development is that when you find a bug, you write a test to catch it and it shouldn’t go to production ever again. Rinse repeat. Meanwhile Bobby is into his 11th hour of the day and decides to check off the last few tests so he can go home.
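As a rough illustration of the "find a bug, write a test" loop described above, here is a minimal Python sketch. The bug, the `safe_join` helper and the `/srv/uploads` directory are all hypothetical and chosen only for illustration: assume a path traversal hole was once found in an upload handler, and the fix shipped together with a test that pins it shut.

```python
import posixpath


def safe_join(base: str, user_path: str) -> str:
    """Join user_path under base, refusing anything that escapes base.

    Hypothetical helper: stands in for whatever code had the original bug.
    Assumes base is already a normalized absolute POSIX path.
    """
    candidate = posixpath.normpath(posixpath.join(base, user_path))
    prefix = base.rstrip("/") + "/"
    if candidate != base and not candidate.startswith(prefix):
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate


def test_regression_path_traversal() -> None:
    """Written when the (hypothetical) bug was found; runs on every check-in."""
    for attack in ("../etc/passwd", "a/../../etc/passwd", "/etc/passwd"):
        try:
            safe_join("/srv/uploads", attack)
        except ValueError:
            continue  # blocked, as intended
        raise AssertionError(f"traversal not blocked: {attack}")


def test_normal_paths_still_work() -> None:
    """The fix must not break legitimate requests."""
    assert safe_join("/srv/uploads", "img/cat.png") == "/srv/uploads/img/cat.png"


if __name__ == "__main__":
    test_regression_path_traversal()
    test_normal_paths_still_work()
    print("all regression tests passed")
```

A test like this only guards against the holes someone has already found and encoded, which is where the two sides of this thread disagree.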
Those processes are what “devops” is partly about. You know how I know that I didn’t misconfigure apache or my system mailer and open us up to risk? Because its configuration isn’t a manual process. I don’t have to look at my notes. I did it once, codified it as a puppet manifest or chef cookbook and moved on to the next thing. Upgrades are less risky because I can easily roll back, so I’m actually applying vendor security fixes in a more timely fashion.
The points you bring up have come up before and been addressed as well. Since you suggested I do so, here’s a response to a similar article from back in January – http://goo.gl/4StSn
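The idea of codifying configuration once can be sketched in a few lines of Python. This is not Puppet or Chef and does not use either tool's API; it is a hypothetical `ensure_settings` function that shows the underlying principle: declare the desired state, apply it idempotently, and report whether anything drifted. The three Apache directives are real, but the particular hardening set and the file layout are assumptions for the example.

```python
from pathlib import Path

# Desired state, declared once and kept in version control.
# (Illustrative hardening set; choose your own.)
DESIRED = {
    "ServerTokens": "Prod",
    "ServerSignature": "Off",
    "TraceEnable": "Off",
}


def ensure_settings(path: Path, desired: dict[str, str]) -> bool:
    """Make `path` hold exactly one `Key Value` line per desired key.

    Unrelated lines are left alone. Returns True if the file was changed,
    False if it already matched, so running it twice is harmless.
    """
    old_lines = path.read_text().splitlines() if path.exists() else []
    new_lines: list[str] = []
    seen: set[str] = set()
    for line in old_lines:
        key = line.split(maxsplit=1)[0] if line.strip() else ""
        if key in desired:
            if key not in seen:  # rewrite first occurrence, drop duplicates
                new_lines.append(f"{key} {desired[key]}")
                seen.add(key)
        else:
            new_lines.append(line)
    for key, value in desired.items():
        if key not in seen:  # setting was missing entirely
            new_lines.append(f"{key} {value}")
    if new_lines == old_lines:
        return False
    path.write_text("\n".join(new_lines) + "\n")
    return True
```

Calling `ensure_settings(Path("security.conf"), DESIRED)` on every host gives the same result each time, and the return value doubles as a drift report: `True` means someone or something had changed the file since the last run.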
It’s interesting that you’ve chosen to make the leap you have to limit the scope of DevOps to the use of Dev concepts in Ops and about deployment (though I am conscious you ARE a security vendor – *cough* snake oil *cough*). DevOps is much broader than that as a concept and a set of ideas and tools.
Let me run another scenario past you – one I’ve seen regularly during my last twenty years as a security guy:
“Dev produces an application and bats it back and forth a few times with Ops. Eventually everyone in Ops is happy and they go to deploy. Until Security says ‘Did you code review/pen test/risk assess this app?’ Everyone in dev and ops looks at each other and sighs ‘Bloody Security – always a pain in the a*** – always saying No.’ The cycle starts again – this time with back and forth between Dev, Ops and Security…”
If you’ve worked in any big enterprise that’s going to be a pretty familiar scenario. Non-functional requirements are always looked at last. Security requirements are even usually behind operational ones in this world.
One of the key tenets of DevOps is communication, collaboration and cooperation. The model is designed to try to not just re-use some Dev ideas in Ops. That’s the superficial view of DevOps. DevOps tries to encourage people to inject the right ideas at the right time – for example inject operational reality into the SDLC. And if you can treat operational non-functional requirements as first class citizens then why not treat security requirements in the same way? If DevOps allows Security people to inject themselves earlier into the process and ensure apps are developed securely and with the organisation’s security architecture in mind then that’s a win.
To me DevOps is very security friendly – it’s about building collaboration, shared workflow, iterating on requirements early and often and changing the culture in organisations to one of cooperative effort towards building and running apps rather than conflict-driven process. All things that deliver value to the business, save money, enhance availability and in my view offer the potential to enhance security.