Resolved -
This incident has been resolved. At no point did we lose any customer data that reached our load balancers. Querying has been stable for hours, and all previously degraded features are functional.
Oct 21, 08:41 PDT
Update -
Honeycomb core functionality is operational. Service Maps are up and running, and errors have resolved. Querying, triggers, and SLOs are continuing to recover; customers may see delays in the meantime. This is our final update for the night unless the situation degrades. We will continue to monitor the situation.
Oct 20, 17:11 PDT
Update -
Service Maps are up and running, and errors have resolved. Querying, triggers, and SLOs are continuing to recover; customers may see delays in the meantime. We are continuing to monitor the situation.
Oct 20, 16:08 PDT
Update -
AWS is starting to show signs of recovery. Querying remains partially impacted and may take longer to return results. We are continuing to monitor the situation.
Oct 20, 13:06 PDT
Update -
Querying is showing signs of recovery. We are continuing to monitor the situation.
Oct 20, 10:39 PDT
Update -
The networking issues in us-east-1 are affecting our query engine; as a result, customers may see errors or delays when running queries.
Oct 20, 08:30 PDT
Update -
SLO processing has recovered. Service Maps continue to be degraded, but historical data can still be queried.
Oct 20, 08:14 PDT
Monitoring -
Due to constrained EC2 instance capacity in the AWS us-east-1 region, we are choosing to allocate the capacity we have to incoming events and telemetry. As a result, customers may see delays of over 5 minutes in our processing of:
- SLOs
- Service Maps
We do not expect a degradation of our core ingest -> query flow, and we do not expect triggers to be impacted.
Oct 20, 07:16 PDT