AWS Outage: How Are You Handling the Fallout?

Johann Stande
December 7, 2021

Today’s outage at Amazon Web Services' us-east-1 cloud region is impacting customers globally, resulting in lost revenue. Incidents like this are not isolated: they can happen at any time and with any public cloud provider. How do you deal with these outages? Do you maintain application-level high availability, wait for services to return to normal, or take manual action by failing over or migrating your workloads to a healthy region to maintain uptime?


AWS has responded to the outage, citing the impairment of several network devices in the us-east-1 region. The problem is also affecting AWS's own monitoring and incident response tooling, delaying its ability to provide proper service status updates. There is currently no ETA for full recovery.

Fylamynt is a cloud workflow automation company built for DevOps teams and SREs. With Fylamynt, you can easily build automated workflows that take the necessary actions to respond to incidents 24x7.


For example, Fylamynt integrates closely with the AWS Health API, so you can remediate problems automatically; an example workflow is available here.


However, as AWS stated in its response, its own monitoring and incident response tools are also affected and cannot provide timely status updates.

This meant that AWS service issues could not easily be detected early, and it shows why relying solely on your cloud provider's native tools for monitoring and incident response is not a good idea. Third-party APM tools are essential for monitoring your applications and services and for mitigating similar situations in the future.

Again, Fylamynt has you covered with easy-to-use, predefined integrations for building time-saving workflows, from everyday tasks to enterprise infrastructure.


These connectors let you automatically run Fylamynt workflows that are triggered by predefined APM tool alerts or incidents. In a situation like this one, that means early notification of an application outage, as well as low MTTA and MTTR.

Don’t get caught off guard: sign up today and try Fylamynt for free.

Fylamynt has created the world’s first enterprise-ready, low-code platform for building, running, and analyzing SRE cloud workflows. With Fylamynt, SREs can automate the most time-consuming parts of their runbooks, freeing them to make decisions where their expertise is needed.


Ready to get started?

With Fylamynt, you can build, run, and analyze cloud workflows securely for any cloud with little to no code.