Handle Errors Once
Today at $work, an interesting quirk of humans and error handling came up.
The code I was asked to review included something like
try:
some_operation()
except Exception as e:
log.exception("Oh noes!")
raise SomeOtherException() from e
Which on the surface makes sense. We try to do a thing, if it fails, log the failure and report the error up to the caller. Great, everyone is satisfied, we get a good log right when a thing happens, and our caller gets the exception too!
Lets give it a try: ./errors_once.py
Oh Noes!
Traceback (most recent call last):
File "errors_once.py", line 22, in some_larger_method
some_op()
~~~~~~~^^
File "errors_once.py", line 17, in some_op
raise ValueError("Something went wrong!")
ValueError: Something went wrong!
Was unable to do the thing! universe exploded!
Traceback (most recent call last):
File "errors_once.py", line 22, in some_larger_method
some_op()
~~~~~~~^^
File "errors_once.py", line 17, in some_op
raise ValueError("Something went wrong!")
ValueError: Something went wrong!
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "errors_once.py", line 30, in main
some_larger_method()
~~~~~~~~~~~~~~~~~~^^
File "errors_once.py", line 25, in some_larger_method
raise UniverseExploded("Could not do the thing!") from e
UniverseExploded: Could not do the thing!
Huh. Well that seems to work, but its a little bit difficult to read. Of course
stack traces are always weird. But why do we have Something Went Wrong twice?
Did it error twice?
Nope. Here's the code:
import logging
logger = logging.getLogger("testing")
logger.setLevel(logging.DEBUG)
class UniverseExploded(ValueError): ...
def some_op():
raise ValueError("Something went wrong!")
def some_larger_method():
try:
some_op()
except Exception as e:
logger.exception("Oh Noes!")
raise UniverseExploded("Could not do the thing!") from e
def main():
try:
some_larger_method()
except UniverseExploded:
logger.exception("Was unable to do the thing! universe exploded!")
main()
So the exception only happens once. What Gives?
Handle Errors Once
We broke the rule, we handled the error more than once. In this case, logging counts as handling. We chose to use the exception as a decision driver for outputting data.
It sure seems like a good idea: something went wrong, we should log it! And that's absolutely the right thing to do. However, we didn't just do that. We also threw the error up to our caller. Who, following that same good idea, saw something went wrong, and logged it.
At $work this is a pattern I see quite a lot, and it results in exception
soup1 that upsets monitoring and means that we are very accustomed to huge
logs, with massive stack traces that are entirely unhelpful2.
The root cause here is entirely in the idea that something that went wrong has to be logged now and has to have an error condition propagate upwards, where it again has to be logged and propagated, and so on. The end result is a chain of stack traces, each getting longer as the error propagates up the call stack. No new information is added by these other traces, in fact, they increase noise and easily become red herrings.
Thus we hit what I try to keep as a best-practice:
Log, or raise, don't do both
Or better worded, as Dave Cheney Put It 3:
Handle Errors Once
In the simple case above, the main function should log it before continuing with
whatever processing needs happening, or nothing should log it, and the
exception hits the top-level, and everything gets torn down (i.e., Let It Fail)
Of course, adding information with exception chaining (raise ... from ...) is
very much a good idea and nice pattern here. With chaining you add contextual
information and can still make the problem your callers problem instead.
When doesn't this apply?
Never.
Okay kidding. This doesn't apply when its useful to throw around actual
information. But then don't dump the stack unless you really have to. Rather
add context. Like every piece of programming advice, this comes with Use Your Brain.
Nothing applies everywhere; but some things apply in more places than
people think.
What this does not mean
This isn't meant to say that contextual logging around exceptional cases is bad, just that repeated logging around that is bad. If I have to work out how many of the stack traces I'm seeing are the same occurrence of the same problem, something has gone wrong.
Best had with a side of syntax salad↩
Some of those massive stack traces come from huge reflection based Java frameworks, but that's a whole other can of worms I don't want to open here. In Short: Don't Do That.↩
Yeah, it's a GoLang blog, but go's error handling makes you start to really think about stuff, and one of the things I took from my time with Go is that errors really need not be exceptional in many cases. Sometimes its useful to just punt them to your caller and not care, but always add context, so that your caller can make decisions.↩