severusdd 7 days ago

The 92% stat looks really interesting! It’s rarely the spectacular crash that knocks a cluster over. Instead, the “harmless” retry leaks state until everything breaks at 2 a.m. on some fateful Friday. If that holds, we should budget more engineering hours for mediocre, silent failures than for outright disasters. That’s where the bodies are buried.

smallnix 7 days ago

Or it’s survivorship bias: the major issues that have been addressed don’t cause problems because they were addressed. Meanwhile, some of the minor issues that were never addressed do, more or less at random, end up causing major incidents.