Projects and Jobs not Displayed for Linguist Users
Incident Report for Memsource
Postmortem

Introduction

We would like to share more details about the events that occurred with Memsource between 10:42 AM and 12:07 PM CET on the 28th of January, 2020, which led to the disappearance of jobs and projects for linguist users, and about what Memsource engineers are doing to prevent these sorts of issues from happening again.

Timeline

10:42 AM CET: Deployment of a new version of Project Management started.

11:13 AM CET: While the deployment was still in progress and the new version was already running on several servers, the engineering team noticed that projects and jobs were not being displayed for some linguists.

11:32 AM CET: The engineering team identified that the new version was causing the problem and decided to revert to the previous version.

12:07 PM CET: The previous version was deployed and linguists were able to access their jobs again.

Root cause

The new version of Project Management requires an update of existing project and job data to make it compatible with newly introduced database changes. The update process took some time, and DB queries in the new version were not robust enough to support the partially updated database. The combination of these two issues led to the data not being fully visible in the new version of the application.
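To illustrate this class of failure, here is a minimal sketch in Python with SQLite. It is not the actual Memsource schema or query; the table `job` and the columns `legacy_assignee` and `assignee_id` are hypothetical names chosen for the example. It assumes a data update that copies values into a new column row by row, and compares a query that reads only the new column with one that falls back to the old column for rows not yet updated.

```python
import sqlite3


def make_partially_migrated_db():
    """Build an in-memory database in which only some rows have been updated.

    Hypothetical schema: `legacy_assignee` is the old column and
    `assignee_id` is the new column being filled in by the data update.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job ("
        "id INTEGER PRIMARY KEY, "
        "legacy_assignee INTEGER, "
        "assignee_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO job VALUES (?, ?, ?)",
        [
            (1, 7, 7),     # already updated
            (2, 7, None),  # not yet updated
            (3, 8, None),  # not yet updated
        ],
    )
    conn.commit()
    return conn


def naive_visible_jobs(conn, linguist_id):
    """Read only the new column; rows not yet updated are silently skipped."""
    rows = conn.execute(
        "SELECT id FROM job WHERE assignee_id = ? ORDER BY id",
        (linguist_id,),
    ).fetchall()
    return [row[0] for row in rows]


def visible_jobs(conn, linguist_id):
    """Prefer the new column, but fall back to the old one while it is NULL."""
    rows = conn.execute(
        "SELECT id FROM job "
        "WHERE COALESCE(assignee_id, legacy_assignee) = ? "
        "ORDER BY id",
        (linguist_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

In this sketch, the naive query returns only job 1 for linguist 7 while the update is in progress, whereas the tolerant query returns jobs 1 and 2. The same fixture can serve as a test case for a partially updated database.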

Actions to Prevent Recurrence

As a reaction to the problems:

  • The problematic DB query has been fixed and made more robust to support a partially updated database.
  • Our testing environment will be configured to allow testing of a partially updated database.
  • Analysis of deployment procedures will be performed to aid rollout to a subset of servers.
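As an illustration of what rollout to a subset of servers can look like, the following Python sketch deploys in small batches and rolls back as soon as a batch fails its health check. It is an assumption-laden outline, not a description of Memsource's deployment tooling: the function `staged_rollout` and the `deploy`, `healthy`, and `rollback` callbacks are hypothetical.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    servers: Sequence[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    batch_size: int = 1,
) -> bool:
    """Deploy batch by batch; undo everything if any batch is unhealthy.

    Returns True when all servers were updated, False when the rollout
    was stopped and the already-updated servers were rolled back.
    """
    updated: List[str] = []
    for start in range(0, len(servers), batch_size):
        batch = servers[start:start + batch_size]
        for server in batch:
            deploy(server)
            updated.append(server)
        # Check the freshly updated batch before touching more servers.
        if not all(healthy(server) for server in batch):
            for server in reversed(updated):
                rollback(server)
            return False
    return True
```

With a batch size of one, a faulty version reaches a single server before the health check stops the rollout, which limits how many users can be affected.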

Conclusion

Finally, we want to apologize. We know how critical our services are to your business. Memsource as a whole will do everything to learn from this incident and use it to drive improvements across our services. As with any significant operational issue, Memsource engineers will be working tirelessly over the coming days and weeks to improve their understanding of the incident and to determine which changes will improve our services and processes.

Posted Jan 30, 2020 - 17:15 CET

Resolved
This incident has been resolved.
Posted Jan 28, 2020 - 12:41 CET
Monitoring
The issue has been identified and fixed. Right now, we continue to monitor the performance. A postmortem with more details will be added later.
Posted Jan 28, 2020 - 12:14 CET
Investigating
Our engineering team is investigating an issue affecting the display of projects and jobs for linguist users.
Posted Jan 28, 2020 - 11:39 CET
This incident affected: Memsource (SLA) (Project Management).