Fundamentally, this incident stemmed from a failure in our error monitoring. Letterhead typically recovers from small, isolated failures automatically. When it doesn’t, someone is always around to intervene.
In this case, a monitoring failure prevented a small bug, related to email reputation isolation, from registering with an internal system we call the “dead letter office.” Instead of raising an alarm or starting recovery procedures, our system kept retrying the process affected by the bug. Retrying is by design, but nothing intervened to stop it from retrying indefinitely. Eventually one of our services reached a memory-usage threshold that drew our attention.
Consequently, a relatively small issue led to aggressive throttling of our newsletter transmission pipeline. When a transmission fails for any reason, it retries after a delay. This delay is intentional and designed to prevent congestion, but in this case it was triggered repeatedly, which created a backlog and extended delivery delays.
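To illustrate the failure mode described above, here is a minimal sketch of a retry loop that gives up after a fixed number of attempts and hands the job to a dead letter office instead of retrying forever. It is not Letterhead's actual code: the names DeadLetterOffice and send_with_bounded_retry, the parameters, and the exponential delay are all assumptions made for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DeadLetterOffice:
    """Hypothetical stand-in for a system that records jobs that gave up."""
    letters: List[str] = field(default_factory=list)

    def register(self, job_id: str, reason: str) -> None:
        self.letters.append(f"{job_id}: {reason}")


def send_with_bounded_retry(
    job_id: str,
    send: Callable[[], None],
    office: DeadLetterOffice,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try `send` up to `max_attempts` times, then register a dead letter.

    Returns True on success and False once the job has been handed to the
    dead letter office. The delay doubles after each failure, capped at
    `max_delay` (an assumed policy, chosen only for illustration).
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            return True
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                # Wait before the next attempt to avoid congesting the pipeline.
                sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    # The retry budget is exhausted: surface the failure instead of looping.
    office.register(job_id, repr(last_error))
    return False
```

The key design choice in this sketch is that the retry budget is finite, so a persistent failure always ends up somewhere a person or an alerting rule can see it.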
As of this writing, all affected newsletters across all pools have been delivered successfully.
We sincerely apologize for the inconvenience.