Tracing: structured logging, but better in every way

bahmanm · 1 year ago

@[email protected] · 1 year ago

Some thoughts from my side (coming from another domain - more embedded):

Whether you use a message string or a named bool does not change anything. It’s still logging.
Of course it’s nice to just trace everything and filter / search afterwards, but on an embedded target, for example, your machine may simply crash if you try that. That is why log levels are the traditional way to filter before a log is written.
I don’t get how timestamping / ordering is necessarily worse for logging. Maybe it’s just the framework that is used?
You can certainly have hierarchical information in logging frameworks.
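
To make the points about log levels and hierarchy concrete, here is a minimal sketch using Python’s standard logging module. The logger names ("ecu", "ecu.can") and the expensive_dump helper are hypothetical, made up only for illustration. On a real embedded target you would use whatever framework the platform provides, but the idea is the same: the level check happens before any message is formatted or written.

```python
import io
import logging

# Capture output in memory so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

# Hypothetical parent logger for the whole application.
parent = logging.getLogger("ecu")
parent.setLevel(logging.WARNING)  # filter before anything is written
parent.addHandler(handler)
parent.propagate = False          # keep this example's output isolated

# Child logger: it inherits the effective level and the handler from "ecu".
can = logging.getLogger("ecu.can")

dump_calls = []

def expensive_dump():
    # Hypothetical costly operation we do not want to run in production.
    dump_calls.append(1)
    return "full state dump"

can.debug("frame received")       # dropped: DEBUG is below WARNING
can.warning("bus-off detected")   # written through the parent's handler

# Guard costly work explicitly so it is skipped when the level is disabled.
if can.isEnabledFor(logging.DEBUG):
    can.debug("state: %s", expensive_dump())

output = stream.getvalue()
print(output, end="")
```

The debug calls cost almost nothing here because they are rejected by the level check, and the dotted logger name gives you a hierarchy you can configure per subsystem.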

In my opinion log levels do make sense, but how useful they are may vary wildly depending on what you’re doing. We run our software in different environments:

Development machines / VMs
Development boards
Production ECUs

And it’s run by different sets of people:

Devs
Integrators
Customers
…

Depending on the combination of where / who, you get different requirements.

I get that logging is hard: often you get messages with the wrong log level, or you’re missing a message at a crucial point, etc. But tracing is not better in every way - they should complement each other.

bahmanm · 1 year ago

Thanks for sharing your insights.

Thinking out loud here…

In my experience with traditional logging and distributed systems, timestamps and request IDs do store the information required to partially reconstruct a timeline:

In the case of a linear (single branch) timeline you can always “query” by a request ID and order by the timestamps and that’s pretty much what tracing will do too.
Things, however, get complicated when you have a timeline with multiple branches.
For example, consider the following relatively simple diagram.
Reconstructing the causality and join/fork relations between the execution nodes is almost impossible using traditional logs, whereas a tracing solution will turn this into a nice visual with all the spans and sub-spans.
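
As a rough sketch of the difference (not how any particular tracer is implemented), compare what you can rebuild from timestamps alone with what you can rebuild once every record also carries a parent ID. The Span type, the span names and the timestamps below are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root of the request
    name: str
    start: float              # hypothetical timestamp in seconds

# Hypothetical request that forks into two parallel branches.
spans = [
    Span("a", None, "handle_request", 0.0),
    Span("b", "a", "fetch_user", 0.1),
    Span("c", "a", "fetch_orders", 0.1),
    Span("d", "c", "query_db", 0.2),
    Span("e", "a", "render", 0.5),
]

def flat_timeline(spans):
    """Roughly what request ID + timestamp gives you: one ordered list."""
    return [s.name for s in sorted(spans, key=lambda s: s.start)]

def span_tree(spans):
    """Rebuild the branching structure from parent IDs."""
    children = defaultdict(list)
    for s in sorted(spans, key=lambda s: s.start):
        children[s.parent_id].append(s)
    lines = []

    def walk(parent_id, depth):
        for s in children[parent_id]:
            lines.append("  " * depth + s.name)
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return lines

print(flat_timeline(spans))
print("\n".join(span_tree(spans)))
```

In the flat list you cannot tell whether query_db belongs to fetch_user or fetch_orders, since both branches started at the same time. The parent IDs settle that, and that extra link is what a tracing UI draws as spans and sub-spans.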

That said, logs do shine when things go wrong, when you start your investigation by using a stack trace in the logs as a clue. A stack trace is something that I’m not sure a tracing solution will be able to provide.

“they should complement each other”

Yes! You nailed it 💯

Logs are indispensable for troubleshooting (and potentially nothing else), while tracers are great for, well, tracing the data/request through the system and analysing how it is mutated along the way.