Honeycomb queries and triggers were impaired from 19:00 to 20:28 UTC on June 13, 2023 because of an outage at our upstream service provider. The following is an overview of what happened during this incident.
During this incident, certain queries (described below) failed, usually after a long delay. This applied to queries on https://ui.honeycomb.io/, Triggers, and queries sent through the Query Data API. Data ingestion remained unaffected, and no user data was lost during this time.
Our data storage has two tiers, “hot” and “cold.” Queries against cold storage are performed using AWS Lambda (more detail in this article). Degradation of the Lambda service meant that any query against cold data failed. Queries against hot data were unaffected.
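The failure rule above can be sketched in a few lines of Python. This is only an illustration of the behavior described in this report, not Honeycomb's actual query planner; the function names, the `cold_boundary` timestamp, and the `lambda_healthy` flag are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def touches_cold(query_start: datetime, cold_boundary: datetime) -> bool:
    """Return True if the query reaches data older than the hot/cold boundary.

    `cold_boundary` is a hypothetical per-dataset timestamp: data older than
    it lives in cold storage, data newer than it lives in hot storage.
    """
    return query_start < cold_boundary


def query_outcome(query_start: datetime, cold_boundary: datetime,
                  lambda_healthy: bool) -> str:
    """Model the incident: only queries that need cold data depend on Lambda."""
    if touches_cold(query_start, cold_boundary) and not lambda_healthy:
        return "failed"
    return "ok"


# Example values (assumed): a dataset whose last 6 hours are still hot.
now = datetime(2023, 6, 13, 19, 30, tzinfo=timezone.utc)
boundary = now - timedelta(hours=6)

last_2_hours = query_outcome(now - timedelta(hours=2), boundary, lambda_healthy=False)
last_24_hours = query_outcome(now - timedelta(hours=24), boundary, lambda_healthy=False)
```

Under these assumed values, the two-hour query stays entirely within hot storage and succeeds, while the 24-hour query reaches cold storage and fails.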
We transition data from hot to cold storage within 24 hours of receiving it. However, the exact time of this transition depends on the rate at which any given customer sends us data. For that reason, it is not possible to give a specific age cutoff at which queries began to fail during this incident.
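To see why the cutoff varies between customers, consider the toy model below. It assumes, purely for illustration, that data moves to cold storage once a fixed-size segment fills up or once it is 24 hours old, whichever comes first. The segment capacity, the rollover rule, and the function name are assumptions made for this sketch and are not a description of our actual storage engine.

```python
from datetime import timedelta


def hot_window(events_per_hour: float,
               segment_capacity: int = 1_000_000,          # assumed value
               max_age: timedelta = timedelta(hours=24)) -> timedelta:
    """Estimate how long data stays in hot storage under the toy model.

    A segment is assumed to roll over to cold storage when it holds
    `segment_capacity` events or reaches `max_age`, whichever is sooner.
    """
    if events_per_hour <= 0:
        # No incoming data: only the age limit applies.
        return max_age
    fill_time = timedelta(hours=segment_capacity / events_per_hour)
    return min(fill_time, max_age)


low_volume = hot_window(10_000)     # segment never fills; capped by max_age
high_volume = hot_window(500_000)   # segment fills quickly
```

In this model a low-volume dataset keeps a full 24 hours of hot data, while a high-volume dataset keeps only about two hours, so the same "last 4 hours" query could succeed for one customer and fail for another.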
Most triggers query recent data, and our systems continued to evaluate and alert on these triggers as usual. Any trigger that queried cold data failed during this incident. SLOs continued to function for the duration of this incident, and burn alerts were sent as usual.