Fylamynt + Splunk On-Call: Automated Incident Response

Streamline application traffic monitoring and eliminate the need for constant human attention.

Prasen Shelar
April 26, 2021

SREs spend a good chunk of their days identifying performance issues and maintaining service availability. To help them streamline their incident response workflows, tools like Splunk On-Call (formerly VictorOps) ingest application and infrastructure monitoring system alerts and send them to the right person based on various on-call schedules. 

The integration between Splunk On-Call and Fylamynt takes this one step further by automating incident response workflows so that dynamic changes such as unpredictable application traffic are handled instantly, taking the burden off the shoulders of the SRE. 

Your Challenge

Let’s take a look at how we can automate dynamic application traffic management on an AWS DynamoDB table. If your application performs reads or writes at a higher rate than the table was provisioned to support, DynamoDB begins to throttle those requests. When DynamoDB throttles a read or write, it returns a ProvisionedThroughputExceededException to the caller.

When an alert for this throttling triggers, you need to switch the existing AWS DynamoDB table to On-Demand capacity mode, a more flexible billing option capable of serving thousands of requests per second without read or write capacity planning.
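To make the throttling signal concrete, here is a minimal Python sketch of how a caller might recognize a throttled request. It assumes the error arrives as a dictionary in the shape boto3 exposes on ClientError.response; the helper name is_throttled is hypothetical and is not part of Fylamynt or Splunk On-Call.

```python
# Error code DynamoDB returns when a read or write is throttled.
THROTTLE_CODE = "ProvisionedThroughputExceededException"


def is_throttled(error_response: dict) -> bool:
    """Return True if an AWS error response indicates DynamoDB throttling.

    `error_response` is assumed to look like {"Error": {"Code": "..."}}.
    """
    return error_response.get("Error", {}).get("Code") == THROTTLE_CODE
```

In an application that uses boto3, a check like this would typically sit inside an except block for botocore.exceptions.ClientError, and a positive result is the kind of event a monitoring system could forward to Splunk On-Call.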

Our Solution

Our integration with Splunk On-Call and our Fylamynt workflows make it simple to automate AWS DynamoDB table capacity changes based on a Splunk On-Call incident trigger. Here’s how you do it in a few easy steps:

- Set up the Splunk On-Call connector in Fylamynt.

- To start receiving Splunk On-Call incidents, set up your Splunk On-Call instance within the Fylamynt Integrations page.

- Create a remediation workflow using the visual workflow editor.

  • Select the “Splunk On Call Alert” action from the list of Trigger Actions.
  • Select “team_name” and “escalation_policy” from the dropdown as the first node of your workflow.
  • Add an AWS Action node with “Service” set to DynamoDB and “Operation” set to UpdateTable, then provide “TableName” and set “BillingMode” to PAY_PER_REQUEST to switch the table to On-Demand capacity.
  • Add a “Wait_For_Resource” node as the last node of your workflow to wait until the AWS DynamoDB table state is ACTIVE, then save the workflow.

- Once the Splunk On-Call instance has been set up and the workflow created, it’s time to see it in action! As soon as an incident gets generated in Splunk On-Call matching team_name and escalation_policy, the attached workflow DynamoDB-capacity-remediation will be triggered automatically.
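For readers who want to see what the two AWS-facing workflow nodes amount to, here is a rough Python sketch of the same sequence: an UpdateTable call with BillingMode set to PAY_PER_REQUEST, followed by polling until the table is ACTIVE. This is not how Fylamynt implements its nodes. The function name switch_to_on_demand, the polling interval, and the timeout are assumptions made for illustration. The client argument is expected to behave like a boto3 DynamoDB client.

```python
import time


def switch_to_on_demand(client, table_name, poll_seconds=5.0, timeout_seconds=300.0):
    """Switch a DynamoDB table to on-demand billing and wait until it is ACTIVE.

    `client` is assumed to expose update_table and describe_table the way a
    boto3 DynamoDB client does. Polling stands in for the Wait_For_Resource node.
    """
    # Equivalent of the AWS Action node: UpdateTable with PAY_PER_REQUEST.
    client.update_table(TableName=table_name, BillingMode="PAY_PER_REQUEST")

    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.describe_table(TableName=table_name)["Table"]["TableStatus"]
        if status == "ACTIVE":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Table {table_name!r} still {status} after {timeout_seconds} seconds"
            )
        time.sleep(poll_seconds)
```

With boto3 installed and AWS credentials configured, a call might look like switch_to_on_demand(boto3.client("dynamodb"), "my-table"), where the table name is a placeholder.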


Benefit to You

Automated workflows such as this can help you streamline application traffic monitoring and eliminate the need for constant human attention. Assuming you get 10 alerts each day, and each alert currently takes 10 minutes to resolve, that is 100 minutes, nearly two hours out of your day, freed up for more critical tasks. Multiply this over weeks and months, and you can realize significant efficiency gains across your team.


Try out our free trial to discover more ways to streamline your processes!