We would like to share more details about the events that occurred at Memsource between 12:10 PM and 3:10 PM CEST on 29 May 2020. During this period, pre-translations slowed down significantly (most of them were stuck for some time), and pre-translation with the Globalese 3 MT engine then returned errors. We also want to explain what Memsource engineers are doing to prevent these sorts of issues from happening again.
12:10 PM CEST - We received alerts about a higher number of unprocessed asynchronous operations and began investigating the cause.
12:37 PM CEST - We identified the cause: many calls to a slow Globalese 3 MT server. We began implementing measures to unblock other waiting operations.
12:55 PM CEST - We received the first user complaints about slower operations.
1:17 PM CEST - We temporarily disabled the Globalese 3 engine type in our MT connector, unblocking all waiting operations. These were then processed very quickly, but translation with Globalese 3 started to return errors.
1:31 PM CEST - We partially enabled Globalese 3; approximately 50% of translations were successful and 50% returned errors.
3:10 PM CEST - After reconfiguring the MT connector service, we enabled Globalese 3 again.
Because the Globalese 3 MT server was responding slowly, the many pre-translations using that engine occupied the capacity of the MT integration service and blocked other waiting operations.
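As a simplified illustration of this kind of blocking (a toy model, not Memsource's actual code), the sketch below shows how requests to one slow engine can fill every slot of a shared worker pool, and how capping the slots any single engine may hold leaves room for others. The function name `admit`, the engine labels, and the pool sizes are all hypothetical; a per-engine cap is one common mitigation and is shown here only as an assumption.

```python
from collections import Counter
from typing import List, Optional


def admit(jobs: List[str], pool_size: int,
          per_engine_limit: Optional[int] = None) -> List[str]:
    """Return the queued jobs that get a worker slot, in queue order.

    jobs: engine name for each queued request (hypothetical labels).
    pool_size: total worker slots in the shared service.
    per_engine_limit: max slots one engine may hold (None means no cap).
    """
    running: List[str] = []
    in_use: Counter = Counter()
    for engine in jobs:
        if len(running) >= pool_size:
            break  # pool is full; everything still queued has to wait
        if per_engine_limit is not None and in_use[engine] >= per_engine_limit:
            continue  # this engine is at its cap; leave the slot for others
        running.append(engine)
        in_use[engine] += 1
    return running


# Six requests to a slow engine are queued ahead of three to a fast one.
queue = ["slow"] * 6 + ["fast"] * 3

# Without a cap, the slow engine takes all four slots.
print(admit(queue, pool_size=4))

# With a cap of two slots per engine, the fast engine still gets served.
print(admit(queue, pool_size=4, per_engine_limit=2))
```

In this model the slots held by the slow engine stay occupied for as long as its calls take, so without a cap every other operation waits behind them.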
As a reaction to the problems:
Finally, we want to apologize. We know how critical our services are to your business. Memsource as a whole will do everything to learn from this incident and use it to drive improvements across our services. As with any significant operational issue, Memsource engineers will be working tirelessly over the coming days and weeks to improve their understanding of the incident and to determine which changes will improve our services and processes.