Wednesday, 29. October 2014

Shared Responsibility Examples: The Re:Boot

In last week’s post, we explored the shared responsibility model for security in the AWS cloud. Over the next couple of weeks, we’re going to dive into specific examples that show how the model works for those of us working in this environment.


Re:Boot

Near the end of September, a critical bug was discovered in the Xen hypervisor. This bug, XSA-108, was initially under embargo, which means that only a very limited audience was made aware of the issue.

An embargo gives critical deployments (such as AWS) the opportunity to deploy a patch before the vulnerability is widely known and attempts to exploit it increase.

The deployment of the patch required a reboot of the hypervisor, which means all of the instances running on that particular node would also need to be rebooted.

Whose Responsibility?

With a bug that affects only the hypervisor, we can consult the model for security and see that the virtualization layer is entirely the responsibility of AWS.

However, it’s not as simple as that. Yes, AWS will need to deploy and verify the patch, but because it required a hypervisor reboot, there is a very real impact on you, the AWS user.

Communications

Once aware of the issue, AWS started to notify EC2 users that some of their instances might need to be rebooted. If an instance in your account was going to be affected, you would have received an email with some details.

That was in addition to a blog post and notifications in the EC2 Management Console, in AWS Trusted Advisor, and via the EC2 API.

TL;DR: AWS nailed the communications side of this issue. It was hard not to be aware that an instance of yours was affected.
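To make the API route concrete, here is a minimal sketch of how you might list pending reboots yourself. It assumes the response shape of the EC2 DescribeInstanceStatus call (an "InstanceStatuses" list whose entries carry an "Events" list) and uses the boto3 library to fetch it. The helper names, the instance IDs, and the dates in the sample are made up for illustration, and pagination is left out for brevity.

```python
from datetime import datetime, timezone

# Scheduled-event codes that mean the instance will be restarted.
REBOOT_CODES = {"instance-reboot", "system-reboot"}


def pending_reboots(response):
    """Return (instance_id, zone, code, not_before) for each open reboot event.

    `response` is assumed to look like the output of DescribeInstanceStatus.
    """
    found = []
    for status in response.get("InstanceStatuses", []):
        for event in status.get("Events", []):
            # Assumption: finished events have a description that starts
            # with "[Completed]", so they are skipped here.
            if event.get("Description", "").startswith("[Completed]"):
                continue
            if event.get("Code") in REBOOT_CODES:
                found.append((
                    status["InstanceId"],
                    status.get("AvailabilityZone"),
                    event["Code"],
                    event.get("NotBefore"),
                ))
    return found


def fetch_statuses():
    """Fetch live data; needs boto3 installed and AWS credentials configured."""
    import boto3  # third-party, imported lazily so the parser runs without it

    ec2 = boto3.client("ec2")
    return ec2.describe_instance_status(IncludeAllInstances=True)


# Hypothetical sample response, for illustration only.
SAMPLE = {
    "InstanceStatuses": [
        {
            "InstanceId": "i-11111111",
            "AvailabilityZone": "us-east-1a",
            "Events": [{
                "Code": "system-reboot",
                "Description": "scheduled reboot",
                "NotBefore": datetime(2014, 9, 27, 2, 0, tzinfo=timezone.utc),
            }],
        },
        {
            "InstanceId": "i-22222222",
            "AvailabilityZone": "us-east-1b",
            "Events": [{
                "Code": "system-reboot",
                "Description": "[Completed] scheduled reboot",
                "NotBefore": datetime(2014, 9, 26, 2, 0, tzinfo=timezone.utc),
            }],
        },
        {"InstanceId": "i-33333333", "AvailabilityZone": "us-east-1b"},
    ]
}

if __name__ == "__main__":
    for instance_id, zone, code, not_before in pending_reboots(SAMPLE):
        print(instance_id, zone, code, not_before)
```

Against a real account you would call pending_reboots(fetch_statuses()) instead of using the sample. Keeping the parsing separate from the API call means the logic can be checked without credentials.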

Your Responsibility

With some instances being rebooted, you had responsibilities as well. If your architecture is “cloud-native,” its auto-scaling, multi-AZ/region design for high availability means that you only had to sit back and confirm that your design worked as intended… but of course, you already knew it would since you test it regularly, right?

For a practical example of how this can work in a cloud-native design, read Bruce Wong & Christos Kalantzis’ account of how Netflix handled the issue.

With more traditional workloads, you would have had to step in and ensure that your applications continued to be available during the reboot. The easiest method would be to create a snapshot and re-instantiate the instance in a new availability zone that had already completed the maintenance.
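As a rough sketch of that relaunch, the function below takes an EC2 client (for example, one created with boto3.client("ec2")) and moves a workload to another zone. Note that it swaps in an AMI via create_image rather than a bare volume snapshot, since an AMI can be launched directly. The function name and the naming scheme for the image are hypothetical, and details such as subnets, security groups, Elastic IPs, and cleaning up the old instance are deliberately left out.

```python
def relaunch_in_zone(ec2, instance_id, instance_type, target_zone):
    """Image an instance and start a copy in `target_zone`.

    `ec2` is assumed to behave like a boto3 EC2 client.
    Returns the ID of the newly launched instance.
    """
    # NoReboot=True avoids restarting the source instance, at the cost of
    # weaker guarantees about file system consistency in the image.
    image = ec2.create_image(
        InstanceId=instance_id,
        Name="pre-maintenance-" + instance_id,  # hypothetical naming scheme
        NoReboot=True,
    )
    image_id = image["ImageId"]

    # Block until the image is ready to launch from.
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])

    # Launch one copy in the zone that has already finished maintenance.
    result = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": target_zone},
    )
    return result["Instances"][0]["InstanceId"]
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before pointing it at a real account, which is worth doing given that it creates billable resources.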

More To Come

This is a great example of how an issue in AWS’ area of responsibility can have an impact on your responsibilities as well. The key to the shared responsibility model is communication.

With this issue, AWS demonstrated how well they communicate. They made the information readily available in a number of formats, including the API.

Next week, we’ll look at an issue that highlights another aspect of the model in action.

Planning your calendar for #reInvent? Remember to add SEC313, “Update Security Operations For The Cloud.” In this talk, I’ll be showing how you can improve your security practice by leveraging features of the AWS Cloud.