Gradually Degraded Performance of the Project Management Component Between 02:20 PM CEST and 02:42 PM CEST
Incident Report for Memsource
Postmortem

Introduction

We would like to share more details about the events that occurred with Memsource between 14:20 CEST and 14:42 CEST on September 30, 2021, which led to a gradual outage of the Project Management component, and about what Memsource engineers are doing to prevent these issues from happening again.

Timeline

14:32 CEST: A gradual deployment of a new version of Project Management is in progress. Memsource engineers are notified that some parts of the user interface are not visible to some users.

14:35 CEST: The new version of the system is identified as the cause of the problem. A roll-back to the previous version starts immediately.

14:42 CEST: All servers running the new version of the system are disabled, resolving the problem for all users. The roll-back to the previous version continues.

15:16 CEST: All servers are rolled back to the previous version of the system.

Root Cause

An unnoticed programming error, combined with an error in the build process, led to the deployment of an incorrectly built user interface to some production servers. As a result, some parts of the user interface were invisible to some users.

Actions to Prevent Recurrence

  • Modify the build process to fail if the user interface is not built successfully, so that broken code cannot reach the production environment.
  • Extend pre-deployment tasks with a check of the CI pipeline status.
  • Enhance the automated production monitoring by extending user interface checks to identify similar problems faster.

Conclusion

Finally, we want to apologize. We know how critical our services are to your business. Memsource as a whole will do everything to learn from this incident and use it to drive improvements across our services. As with any significant operational issue, Memsource engineers will be working tirelessly over the coming days and weeks to improve their understanding of the incident and determine how to make changes that improve our services and processes.

Posted Oct 08, 2021 - 10:50 CEST

Resolved
Between 02:20 PM CEST and 02:42 PM CEST, users experienced missing features and non-working pages due to degraded performance of the Project Management component. This has been fixed.
Posted Sep 30, 2021 - 14:42 CEST
This incident affected: Memsource (SLA) (Editor for Web).