This paper from 11 years ago had the exact same finding (Finding 10)! https://www.usenix.org/system/files/conference/osdi14/osdi14...
Same paper; they're just referencing it:
> In 2014, Yuan et al. found that 92% of catastrophic failures in tested distributed systems were triggered by incorrect handling of nonfatal errors.
Oh, thank you for pointing that out. Appreciate it.