Short Degraded Performance of Project Management, Editors, API and File Processing Components Between 05:10 and 05:18 PM CEST
Incident Report for Phrase (formerly Memsource)
Postmortem

Introduction

We would like to share more details about the events that occurred with Memsource between 17:10 CEST and 17:18 CEST on September 26th, 2021, which led to a partial performance degradation of the Project Management, Editors, API and File Processing services. We also want to explain what Memsource engineers are doing to prevent these issues from happening again.

Timeline

17:11 CEST: Automated monitoring reports long request response times. Memsource engineers start working on identifying the problem.

17:14 CEST: High database load is identified as the cause of long response times.

17:18 CEST: Database load drops back to a normal level.

17:47 CEST: The database queries that caused the high database load are identified.

18:23 CEST: Problematic requests are temporarily blocked to prevent further incidents.

Root Cause

The number of slow queries on the database server increased within a short period of time. This caused a significant drop in database performance and resulted in general Memsource unavailability for a few minutes. The requests that led to the long-running queries were temporarily blocked.

Actions to Prevent Recurrence

  • Enhance monitoring of the database to trigger earlier warnings.
  • Optimize long-running database queries to improve their performance.
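
As an illustration of the first action, an earlier warning could be derived from the rate of slow queries rather than from overall response times. The following is a minimal sketch only, not a description of Memsource's actual monitoring: the class name SlowQueryMonitor, the thresholds and the sample data are all hypothetical.

```python
from collections import deque


class SlowQueryMonitor:
    """Counts slow queries in a sliding time window and flags a spike.

    Hypothetical helper for illustration; all thresholds are assumptions.
    """

    def __init__(self, slow_threshold_s=1.0, window_s=60.0, max_slow_in_window=5):
        self.slow_threshold_s = slow_threshold_s      # a query at or above this is "slow"
        self.window_s = window_s                      # length of the sliding window
        self.max_slow_in_window = max_slow_in_window  # warn at this many slow queries
        self._slow_timestamps = deque()

    def record(self, finished_at, duration_s):
        """Record one finished query; return True if a warning should fire."""
        # Drop slow-query timestamps that have fallen out of the window.
        while (self._slow_timestamps
               and finished_at - self._slow_timestamps[0] > self.window_s):
            self._slow_timestamps.popleft()
        if duration_s >= self.slow_threshold_s:
            self._slow_timestamps.append(finished_at)
        return len(self._slow_timestamps) >= self.max_slow_in_window


monitor = SlowQueryMonitor(slow_threshold_s=1.0, window_s=60.0, max_slow_in_window=3)
# (timestamp in seconds, query duration in seconds); all values are made up.
samples = [(0, 0.2), (10, 1.5), (20, 2.0), (25, 0.1), (30, 3.0), (120, 0.3)]
warnings = [monitor.record(t, d) for t, d in samples]
```

In this sketch the warning fires on the third slow query within 60 seconds and clears once those queries age out of the window. In practice the threshold and window would need tuning against normal traffic.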

Conclusion

Finally, we want to apologize. We know how critical our services are to your business. Memsource as a whole will do everything to learn from this incident and use it to drive improvements across our services. As with any significant operational issue, Memsource engineers will be working tirelessly over the coming days and weeks to improve their understanding of the incident and to determine what changes will improve our services and processes.

Posted Oct 07, 2021 - 13:02 CEST

Resolved
A significant drop in the database performance resulted in general Memsource unavailability for a few minutes.
Posted Sep 26, 2021 - 17:00 CEST