On May 1st, we deployed a standard maintenance release that included a change to unify the source our backend services pull data from.
On May 2nd, during peak usage hours, we received reports of certain pages loading too slowly. At the same time, we noticed abnormal load on one of our services, which caused it to respond slowly to requests.
We began troubleshooting, determined that the slowdown was related to the unification change we had deployed, and decided to revert it.
To reduce the risk of a recurrence, we're adding monitoring and alerting, and we're adding steps to our design and test processes to better understand and forecast performance impacts.