Fylamynt Ensures Restaurant Deliveries Make It Home Seamlessly


Scott Lasica
November 22, 2021

A leading national restaurant delivery company provides an application that allows end users to order food from thousands of restaurants. The order is sent to the restaurant, prepared, picked up and delivered to the customer. As you can imagine, there are many steps in this process, and every one of them must complete reliably. The restaurant needs to acknowledge the order and provide an estimate of when the food will be ready. A driver must be selected from the available pool, taking into account timing, routes, distance and many other complicated factors. All of these services need to run continuously with no downtime. If the application isn’t working when a customer is hungry, they’ll switch to a competitor in seconds.


Challenge

Running these complex services in the cloud requires handling scale while performing well at all times. No large organization believes it can maintain perfect 24/7 service availability. Instead, they plan for incidents to occur and put procedures and processes in place ahead of time, so that preventive measures can be taken where possible and, where they can't, the response is as fast as possible. These procedures are called runbooks, and they accompany Incident Management and Incident Response policies. The people responsible for this work are called Site Reliability Engineers (SREs), and they’re the ones who get “paged” in the middle of the night when things break.


When an incident does occur, SREs pull out the correct runbook, which details the steps required to remediate the issue. Level 1 support can sometimes handle simpler incidents, escalating to an SRE only when necessary. When a less experienced level 1 engineer, or a sleepy SRE woken in the middle of the night and not at their best, is scrambling to fix the issue as quickly as possible, mistakes can happen. Possibilities include:

  • Using a runbook for another environment
  • Using a runbook intended for another issue
  • Taking a very long time to determine the actual source of the issue
  • Mistiming steps that require waiting for things to “warm up”
  • Many others…


Why Fylamynt

The restaurant delivery company was handling its cloud incident response with a combination of alerting tools, incident management tools, runbooks in a wiki and custom code that automated tasks where possible. The existing system worked, but it presented many challenges around consistency of response, speed of resolution, and visibility into both the system and the remediation.


Primary reasons Fylamynt was chosen:

  • Intuitive no-code GUI to build incident runbooks (workflows)
  • Support for over 40 of the most widely used SRE tools
  • Dashboard that shows all runbook executions and the results
  • Ability to automate some or all of the runbook steps
  • Ability to put a “human in the loop” to decide critical steps
  • Automation to reduce MTTR
  • Ability to ensure runbooks execute consistently every time
  • Ability to send Slack updates automatically and start a Zoom meeting for the response


As a bonus, they also gained the ability to schedule runbook executions for tasks such as cost management (for example, reducing an instance's size when its usage falls below a threshold).


Future Vision

The restaurant delivery company plans to keep building out runbooks wherever it can, incorporating complex conditional logic and additional data points to automate as much as possible and ensure a smooth app experience for all its users. As the user base continues to grow, this automation frees the SREs to focus their expertise on the performance, scale, efficiency and optimization of production services instead of firefighting.


Fylamynt has created the world’s first enterprise-ready, low-code platform for building, running and analyzing SRE cloud workflows. With Fylamynt, an SRE can automate the most time-consuming parts of a runbook and reserve their own decisions for the steps where their expertise is needed.
