The API that powers our Dashboard experienced high latency on a few endpoints, which caused the corresponding pages to load slowly.
Why did it happen?
Over the past few weeks, an influx of notifications from the agents slowed down the service that processes these notifications. The Dashboard depends on this service to communicate with the agents, which is why the Dashboard slowed down as well.
How did we fix it?
We have since pushed a few fixes that decouple Dashboard page loads from the notification service, so the Dashboard no longer relies on it and loads faster. We also pushed an optimization to notification processing that offloads the heavy work to a separate, more scalable, load-balanced service, so notifications can be processed more efficiently.
What are our long-term plans to make sure it doesn't happen again?
We've added monitoring and alerting to critical components so that we are notified when performance degrades. We're continuing to improve communication between our agent and our cloud services, including optimizing database queries and queuing heavy workloads so they are processed promptly and efficiently. We're also improving our incident response process so that we respond more quickly during incidents like this one.
Posted Oct 04, 2024 - 12:50 PDT
Resolved
Continued monitoring shows that the Accounts-related pages have returned to normal operation, so this incident is resolved.
We are aware of slowness on the Agents page, and our team will continue to release fixes to address this in the coming days.
If you continue to see error banners, please clear the cache and cookies for the Dashboard site, relaunch your browser, and log in again. A number of users have told us that this resolved the error.
A postmortem for this incident will be posted tomorrow.
Please reply to this email if you are still experiencing slow responses on the Dashboard.
Posted Oct 03, 2024 - 17:03 PDT
Update
The CyberQP team worked overnight to implement key changes to the Dashboard loading process. Together with additional changes that we will implement over the next 24 hours, these should return the Dashboard to normal functionality.
A postmortem will be posted with the final resolution of this incident.
If the Accounts lists for your customers still take longer than 30 seconds to load, please let us know via a support ticket (replying to this email automatically creates one).
Posted Oct 02, 2024 - 14:24 PDT
Update
After a weekend of good results, the issue resurfaced on Tuesday. Our team is working on it as quickly as possible, and we'll continue to update everyone as we have more to share.
Response times have been much better in the early morning (6 AM to noon EST); however, as the day progresses, the slowness returns.
Posted Oct 01, 2024 - 16:18 PDT
Update
We have made significant changes to resolve this. We will be making additional changes and will continue to monitor over the weekend.
We will post further updates on Monday or Tuesday as we continue to improve the Dashboard's responsiveness.
Posted Sep 27, 2024 - 14:19 PDT
Update
The Dashboard slowness returned this afternoon. We are continuing to monitor it while working on a permanent resolution, and we will update this page as we have more details.
Posted Sep 26, 2024 - 15:43 PDT
Monitoring
Our team has investigated the causes and will be monitoring the Dashboard today.
Posted Sep 26, 2024 - 06:45 PDT
Investigating
Our team is seeing intermittent slowness in the web dashboard (admin.getquickpass.com). We are actively investigating and will provide further updates as more information is made available.
Posted Sep 25, 2024 - 07:31 PDT
This incident affected: EU (Web Dashboard) and US (Web Dashboard).