It would also be nice to list some "best practices" on how to handle non-fatal errors. I would definitely be interested to know of any sources.
One of the nice things about "errors as values" is that it is generally easier to shim in an error rather than shim in an exception. Not that it's impossible to do the latter, but it's just generally easier because you can have that error serving as a value in your test code.
I have a lot of Go testing shims that look like:
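    // UserGetter is the interface the production code depends on.
    // (User and ErrUserNotFound are defined elsewhere.)
    type UserGetter interface {
        GetUser(userID string) (User, error)
    }

    // TestUsers is an in-memory shim. If Error is set, every lookup fails with it.
    type TestUsers struct {
        Users map[string]User
        Error error
    }

    func (t TestUsers) GetUser(userID string) (User, error) {
        // A forced error takes priority over any stored users.
        if t.Error != nil {
            return User{}, t.Error
        }
        user, have := t.Users[userID]
        if !have {
            return User{}, ErrUserNotFound
        }
        return user, nil
    }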
This makes it easy to test errors when retrieving users and to ensure the correct thing happens. I'm not a maniacal "get 100% test coverage in everything" kind of guy, but on the flip side, if your test coverage only lights up the "happy path", your testing is not good enough, and as you scale up, the probability that your system is going to do something very wrong when an error occurs very rapidly approaches 1.
It's more complicated when you have something like a byte stream where you want to simulate a failure at arbitrary points in the stream, but similar techniques can get you as close as you need.
From there, in terms of "how do you handle non-fatal errors", there really isn't a hard-and-fast rule to give. Quite often you just propagate because there isn't anything else to do. Sometimes you retry some bounded number of times, maybe with backoff. Sometimes you log things and move on. Sometimes you have a fallback you may try. It just depends on your needs. I write a lot of network code, and I find that once my systems mature, rather a lot of the errors in the system get some sort of handling beyond "just propagate it up", but it's hard for me to ever guess in advance what they will be. It's a lot easier to mentally design all the happy paths than it is to figure out all the ways the perverse external universe will screw them up and how we can try to mitigate the issues.
The same way you handle fatal errors, by specifying the exceptional circumstances and how to handle them (retry, alternative actions, or signaling to another handler up the call/request tree). Something’s correct output may not be our thing’s correct input.
I think the best practice is to handle them with the same attention as the happy path. In my experience, error handling is usually an afterthought.
What is the system state when it does error?
What is the best possible recovery from each error state?
What can the user/caller expect for an error?
One mistake I see a lot is not being careful to use the correct error type / status code.
E.g. if you're in Python and raise a ValueError when an API is rate limited, someone downstream from you is going to have a bad time.