Kang YiKai

Wanderlog

Aside from asking senior developers or reading source code, logs are the most important tool for understanding how a service behaves. In a large microservice architecture, however, logs are voluminous and scattered across many platforms. How can you use them effectively?

The Four Dimensions of Logging Platforms

When faced with complex platforms, you can categorize them into four types based on the core problems they solve:

| Dimension | Question | Solution |
| --- | --- | --- |
| CI/CD (Build and Deploy) | Focus on process: Did the code compile and deploy? Is the automation smooth? | Jenkins, Concourse |
| Quality and Security | Focus on results: Did the security scans and acceptance tests pass? | SonarQube, Gradle Report |
| Observability | Focus on health: Is the system “alive” and healthy right now based on visual charts? | Grafana |
| Tracing and Troubleshooting | Focus on records: What did the specific scene look like when the error occurred? | Kibana, Splunk, Dynatrace, CloudWatch |

Platform Characteristics

When you encounter an unfamiliar logging platform, do not try to understand the definition of every parameter right away. Your primary mission is to learn how to search and view results rather than trying to grasp the specific meaning of every single log entry.

For example, you should master KQL or filters in Kibana, SPL (Search Processing Language) queries in Splunk, log search in CloudWatch through the AWS Management Console, or the distributed tracing view in Dynatrace. The tool is simply a means to an end. Your real goal is to find the right information.

Meaningful Parameters

Logs record a mountain of information, but not all of it is valuable.

Can you tell the difference between json.trace_id, json.traceId, and json.trace.trace_id?

In reality, only the parameters you actually understand are meaningful to you.
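One way to cope with key variants like these is to try each known spelling in turn. The sketch below is only an illustration: it assumes the logs have already been parsed into nested dictionaries, and the helper names `get_path` and `find_trace_id` as well as the sample records are hypothetical.

```python
from typing import Any, Optional

# Hypothetical key paths taken from the three spellings above.
# The real list depends on how each of your services writes its logs.
TRACE_ID_PATHS = [
    ("json", "trace_id"),
    ("json", "traceId"),
    ("json", "trace", "trace_id"),
]


def get_path(record: dict, path: tuple) -> Optional[Any]:
    """Walk nested dicts along `path`; return None if any step is missing."""
    current: Any = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def find_trace_id(record: dict) -> Optional[str]:
    """Return the first trace ID found under any known key path."""
    for path in TRACE_ID_PATHS:
        value = get_path(record, path)
        if value is not None:
            return str(value)
    return None


# Made-up sample records, one per spelling, plus one without a trace ID.
records = [
    {"json": {"trace_id": "abc123"}},
    {"json": {"traceId": "abc123"}},
    {"json": {"trace": {"trace_id": "abc123"}}},
    {"json": {"message": "no trace here"}},
]
trace_ids = [find_trace_id(r) for r in records]
print(trace_ids)  # ['abc123', 'abc123', 'abc123', None]
```

Once the three spellings resolve to the same value, you can treat them as one parameter that you actually understand.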

Give priority to intuitive and readable values. Focus on timestamps, service names, stack traces, and request bodies that have clear business logic. If you encounter parameters you do not understand, try to learn them through comparison and correlation.

For instance, you might find a pod name in your logs that you saw in another tool. Even if the keys are different, you might find that an ID value from Platform A appears in Platform B. You may not know exactly what that ID does at first, but by filtering for it, you can see a series of related logs that reveal its purpose.
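If you can export a handful of entries from both platforms, this kind of correlation can be roughly automated. The following is a minimal sketch, assuming each entry is a flat dictionary. The function `value_index`, the minimum length of 8 characters, and all sample keys and values are made up for illustration.

```python
from collections import defaultdict


def value_index(records, min_length=8):
    """Map each string value to the set of keys it appears under.

    Short values such as "INFO" are skipped because they match too easily.
    """
    index = defaultdict(set)
    for record in records:
        for key, value in record.items():
            if isinstance(value, str) and len(value) >= min_length:
                index[value].add(key)
    return index


# Made-up exports from two platforms that use different key names.
platform_a = [
    {"kubernetes.pod_name": "checkout-7d9f8b6c5-x2k4q", "level": "ERROR"},
    {"request_id": "req-52f1c0aa", "level": "INFO"},
]
platform_b = [
    {"host": "checkout-7d9f8b6c5-x2k4q", "status": "unhealthy"},
    {"correlationId": "req-52f1c0aa", "status": "timeout"},
]

index_a = value_index(platform_a)
index_b = value_index(platform_b)

# Values that appear on both platforms, with the keys they live under.
shared = {
    value: (sorted(index_a[value]), sorted(index_b[value]))
    for value in index_a.keys() & index_b.keys()
}
for value, (keys_a, keys_b) in sorted(shared.items()):
    print(f"{value}: A={keys_a} B={keys_b}")
```

Each shared value is a candidate filter: paste it into the search box of either platform and read the related logs it pulls up.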

Meaningful Log Messages

The message part of a log is the most intuitive element because it often provides direct clues for troubleshooting.

Sometimes you might understand every word in a log message but still have no idea what the intent behind it is. In these cases, try to use unique text from the log to search the source code. See what conditions trigger that specific log to understand what is actually happening behind the scenes.
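Your IDE's global search or `grep` is usually enough for this. If you prefer a script, a rough equivalent could look like the sketch below. The function name `find_log_origin` and the list of file suffixes are assumptions. Because log messages often contain placeholders, search for the static part of the text rather than the full line.

```python
from pathlib import Path


def find_log_origin(root, needle, suffixes=(".py", ".java", ".kt", ".go", ".ts")):
    """Return (path, line_number, line) for every source line containing `needle`."""
    hits = []
    for path in Path(root).rglob("*"):
        # Only look at regular files with a source-code suffix.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file, skip it
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

For example, `find_log_origin(".", "retry limit reached")` lists every place in the current directory tree where that text is written, so you can read the surrounding condition that triggers it.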

The Ultimate Goal: Telling a Correct Story

Scattered logs must eventually be pieced together, either by human reasoning or by AI tools. We use these fragments of information to infer the state of the system with one goal in mind: to reconstruct the complete story of the system’s operation.
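A simple way to start a story is to put the fragments in order. The sketch below assumes the entries from different services have been exported into one list and share a `trace_id` field and an ISO 8601 `timestamp`. Those field names, the function `build_timeline`, and the sample entries are all hypothetical.

```python
from datetime import datetime


def build_timeline(entries, trace_id):
    """Return the entries that belong to one trace, ordered by timestamp."""
    selected = [e for e in entries if e.get("trace_id") == trace_id]
    return sorted(selected, key=lambda e: datetime.fromisoformat(e["timestamp"]))


# Made-up entries from three services, deliberately out of order.
entries = [
    {"timestamp": "2024-05-01T10:00:02+00:00", "service": "payment",
     "trace_id": "abc123", "message": "Card provider timed out"},
    {"timestamp": "2024-05-01T10:00:00+00:00", "service": "gateway",
     "trace_id": "abc123", "message": "POST /checkout received"},
    {"timestamp": "2024-05-01T10:00:01+00:00", "service": "gateway",
     "trace_id": "zzz999", "message": "GET /health"},
    {"timestamp": "2024-05-01T10:00:03+00:00", "service": "checkout",
     "trace_id": "abc123", "message": "Order marked as failed"},
]

timeline = build_timeline(entries, "abc123")
for entry in timeline:
    print(entry["timestamp"], entry["service"], entry["message"])
```

Read top to bottom, the output is a first draft of the story: the request arrives at the gateway, the payment call times out, and the order is marked as failed.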

The ability to tell a logical and consistent story is crucial. You can try reading “good stories” from successful case studies as a reference. By comparing them, you will have the chance to discover exactly where “bad stories” went wrong.

Boosting Efficiency

In my experience, being able to query and use logs flexibly can noticeably improve team collaboration.

When I was still unfamiliar with logging systems, I needed senior colleagues to guide me through each exploration for three to six hours at a time. Once I learned to explore independently, I could gather all relevant information, build hypotheses, and record business issues on my own. It is perfectly fine to construct a “wrong story” during this preparation stage.

When you are fully prepared, you can schedule a brief review with your colleagues. You walk them through your hypotheses and your story while they simply provide feedback and corrections. A task that once took half a day can now be resolved in one or two hours, which significantly boosts efficiency.
