From Ad-hoc Scripting to Workflow as Code: The Evolution of Runbooks

Scott Lasica
November 10, 2021

Unfortunately, the word workflow has been used for many years to represent some very specific things in the business world (the most common being BPMN, Business Process Model and Notation). At a general level, however, it simply describes a set of steps done in a specific order to achieve a desired end result.

Workflow as code simply means that we’re using code to orchestrate and execute a workflow, very likely in a distributed environment. In the site reliability engineering (SRE) or cloud engineering space, these workflows tend to deal with things like cost savings and incident resolution (and are often called runbooks).

In the early days of SRE (when it was still called DevOps), chaining together specified actions with code was a much more daunting task. Let’s take what seems like a simple example: a database instance is out of storage. Assuming the engineer had the appropriate monitoring in place, they would be alerted. At that point they would need to verify that it’s not a false alarm, spin up a new, larger instance, copy the data over to it, verify the data integrity, redirect all the services using the old database to the new one, verify that those services are operating normally, and then destroy the old instance. Engineers realized that situations like this happen often enough to be worth automating, at least in part, by writing code to perform steps such as the verifications automatically.
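As a rough illustration of what chaining these steps with code can look like, here is a minimal Python sketch. Every name in it (`Context`, `run_workflow`, and the step functions) is hypothetical, and each step is a placeholder that returns `True` where a real runbook would call monitoring, cloud, and deployment APIs. The point is only the shape: ordered steps, shared state, and a hard stop at the first failure so a destructive step never runs after a failed check.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Context:
    """State shared between steps; every field here is illustrative."""
    old_db: str
    new_db: str = ""
    log: List[str] = field(default_factory=list)


class StepFailed(Exception):
    """Raised when a step returns False, so later steps never run."""


# Placeholder steps: a real runbook would call external services here.
# Each step returns True on success and False on failure.
def confirm_alert(ctx: Context) -> bool:
    return True


def create_larger_instance(ctx: Context) -> bool:
    ctx.new_db = ctx.old_db + "-resized"
    return True


def copy_data(ctx: Context) -> bool:
    # Copying only makes sense once a target instance exists.
    return bool(ctx.new_db)


def verify_integrity(ctx: Context) -> bool:
    return True


def redirect_services(ctx: Context) -> bool:
    return True


def verify_services(ctx: Context) -> bool:
    return True


def destroy_old_instance(ctx: Context) -> bool:
    return True


# The runbook is just the ordered list of steps from the example above.
RUNBOOK: List[Callable[[Context], bool]] = [
    confirm_alert,
    create_larger_instance,
    copy_data,
    verify_integrity,
    redirect_services,
    verify_services,
    destroy_old_instance,
]


def run_workflow(steps: List[Callable[[Context], bool]], ctx: Context) -> Context:
    """Run steps in order, record each result, and stop at the first failure."""
    for step in steps:
        ok = step(ctx)
        ctx.log.append(f"{step.__name__}: {'ok' if ok else 'failed'}")
        if not ok:
            raise StepFailed(step.__name__)
    return ctx
```

Calling `run_workflow(RUNBOOK, Context(old_db="orders-db"))` runs all seven steps in order and leaves a per-step record in `ctx.log`, which is the kind of audit trail an engineer would otherwise keep by hand.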

Moving forward to the modern day, there are tools that can help with many of these steps. As an example, you could have PagerDuty collect data from CloudWatch and generate an incident, then use code to modify the database instance’s storage capacity. With services like Amazon RDS, the create, copy, and destroy steps aren’t needed because storage can be resized on the fly. Even so, the code you write to connect these services together is custom, needs to be maintained, and could contain bugs. Ideally, another tool builds the workflows, connects the services together for you, and handles the orchestrated execution once the workflow is in production.
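To make the "custom glue code" concrete, here is a hedged Python sketch of the resize step alone. The function names and the 1.5x growth factor are assumptions made for illustration. The client is passed in rather than created, on the assumption that in practice it would be a boto3 RDS client (`boto3.client("rds")`), whose `describe_db_instances` and `modify_db_instance` calls accept the parameters used here. How the incident reaches this code from PagerDuty or CloudWatch is deliberately left out.

```python
import math
from typing import Any, Dict


def plan_storage_increase(current_gib: int, growth_factor: float = 1.5) -> int:
    """Return a new allocation in GiB that is always larger than the current one."""
    if current_gib <= 0 or growth_factor <= 1.0:
        raise ValueError("current_gib must be positive and growth_factor above 1.0")
    return max(current_gib + 1, math.ceil(current_gib * growth_factor))


def resize_instance(rds_client: Any, instance_id: str,
                    growth_factor: float = 1.5) -> Dict[str, Any]:
    """Look up the current allocation and request a larger one in place.

    rds_client is assumed to behave like a boto3 RDS client.
    """
    described = rds_client.describe_db_instances(DBInstanceIdentifier=instance_id)
    current_gib = described["DBInstances"][0]["AllocatedStorage"]
    request = {
        "DBInstanceIdentifier": instance_id,
        "AllocatedStorage": plan_storage_increase(current_gib, growth_factor),
        "ApplyImmediately": True,  # apply now rather than waiting for a maintenance window
    }
    rds_client.modify_db_instance(**request)
    return request
```

Even this small function leaves real concerns unhandled, such as retries, permissions, the service’s own limits on storage changes, and confirming that the resize actually completed, which is the maintenance burden the paragraph above describes.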

Fylamynt has created the world’s first enterprise-ready, low-code platform for building, running and analyzing SRE cloud workflows. With Fylamynt, an SRE can automate the most time-consuming parts of a runbook, freeing them to make decisions where their expertise is needed. With over 40 prebuilt integrations and more than 60 sample workflows covering common SRE needs, getting up and running is quick. You can create incident runbooks, cost-savings runbooks and many more.
