Honeycomb Ingest Outage
Incident Report for Honeycomb
Postmortem

We deployed a code change to our storage management system which overwhelmed our production MySQL database. This impacted our data ingestion service, which was unable to receive events for 22 minutes from 19:08 UTC to 19:30 UTC. The data ingestion service began recovering at 19:30 UTC, but continued to experience errors at a decreasing rate until 19:57 UTC.

Stabilization steps were initiated at 19:18 UTC with a deployment rollback, and a fix was merged at 20:03 and deployed at 20:19 UTC. We continued to monitor until 21:00 UTC when the incident was declared as fully resolved.

Posted May 13, 2020 - 15:35 PDT

Resolved
This incident has been resolved.
Posted May 13, 2020 - 14:00 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted May 13, 2020 - 13:33 PDT
Identified
The issue has been identified and a fix is being implemented.
Posted May 13, 2020 - 12:58 PDT
Investigating
We are currently investigating this issue.
Posted May 13, 2020 - 12:41 PDT
This incident affected: api.honeycomb.io - US1 Event Ingest, ui.honeycomb.io - US1 Querying, and ui.honeycomb.io - US1 App Interface.