Trace

Traces record the processing time and call sequence of a request at each service node it passes through. With traces, you can analyze the call relationships and performance metrics of every service node a request touches in a distributed system, from the initiating end to the responding end.
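To make the idea concrete, here is a minimal sketch of how a trace could be modeled as a set of spans and walked in call order. The `Span` fields, the `call_sequence` helper, and the sample data are all assumptions made for illustration; they are not the product's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical fields; a real tracing backend may name these differently.
    span_id: str
    parent_id: Optional[str]  # None marks the root span (the initiating end)
    service: str
    operation: str
    start_ms: int
    duration_ms: int


def call_sequence(spans):
    """Return (depth, span) pairs in call order, walking down from the root."""
    children = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s)

    ordered = []

    def walk(parent_id, depth):
        # Visit child spans in start-time order, then recurse into each one.
        for s in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            ordered.append((depth, s))
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return ordered


# Made-up example trace: gateway -> orders -> db.
trace = [
    Span("a", None, "gateway", "GET /orders", 0, 120),
    Span("b", "a", "orders", "list_orders", 10, 95),
    Span("c", "b", "db", "SELECT", 20, 70),
]

sequence = call_sequence(trace)
for depth, span in sequence:
    print("  " * depth + f"{span.service}: {span.operation} ({span.duration_ms} ms)")
```

Each span carries its own duration and a reference to its parent, which is enough to reconstruct both the call relationships and the per-node timing.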

Traces are valuable for, among other things:

- Fast problem localization: When a performance bottleneck or failure occurs, traces help pinpoint where the problem is, making fault diagnosis more efficient.
- Performance optimization: By analyzing the time-consuming nodes and bottlenecks in a trace, you can apply targeted optimizations that improve overall system performance.
- Service governance: Traces reveal the dependencies between services, which supports service governance and optimization and improves system maintainability and scalability.
- Monitoring and alerting: Trace-based monitoring gives a real-time view of how the system is running, detects anomalies promptly, and raises alerts to keep the system stable.

Trace List

The trace list page displays all collected trace data, helping you understand the call relationships between methods, the causes of errors, and performance bottlenecks.

The search box at the top lets you enter tags and tag values, such as service name, hostname, or Trace ID, to quickly filter for the data you need.

The quick filter box on the left lets you narrow down the trace data by combining multiple filter options. The default filter options on the trace page are duration, status, service name, operation, component, RPC method, HTTP method, HTTP status code, HTTP host, and HTTP path.
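As a rough illustration of how tag-based filtering works, the sketch below matches trace records against a set of tag/value pairs. The record fields and tag names (`service`, `http.method`, `http.status_code`) are assumptions made for the example, not the product's actual schema or query syntax.

```python
def matches(record, filters):
    """Return True when the record carries every requested tag value."""
    return all(record.get(tag) == value for tag, value in filters.items())


# Hypothetical trace records; the field names are illustrative only.
records = [
    {"trace_id": "t1", "service": "orders", "http.method": "GET", "http.status_code": 200},
    {"trace_id": "t2", "service": "orders", "http.method": "POST", "http.status_code": 500},
    {"trace_id": "t3", "service": "billing", "http.method": "GET", "http.status_code": 200},
]

# Combining two filter options keeps only records that satisfy both.
filters = {"service": "orders", "http.method": "GET"}
filtered = [r for r in records if matches(r, filters)]
print([r["trace_id"] for r in filtered])
```

In this sketch, each additional filter narrows the result further, which mirrors selecting several options in the quick filter box at once.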

The bar chart on the right shows how the number of traces is distributed over the selected time period.

By default, the data list on the right displays the occurrence time, service name, operation, duration, RPC method, and HTTP status code of each trace within the selected time period, sorted by time in descending order. Click the "Duration" header to sort by duration in ascending or descending order.
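The two sort orders described above can be sketched as follows. The row fields (`time`, `service`, `duration_ms`) and the sample values are made up for illustration and do not reflect real collected data.

```python
from datetime import datetime

# Hypothetical list rows; values are illustrative only.
rows = [
    {"time": datetime(2024, 1, 1, 10, 0, 5), "service": "orders", "duration_ms": 120},
    {"time": datetime(2024, 1, 1, 10, 0, 9), "service": "billing", "duration_ms": 45},
    {"time": datetime(2024, 1, 1, 10, 0, 1), "service": "gateway", "duration_ms": 300},
]

# Default view: newest trace first (time, descending).
by_time = sorted(rows, key=lambda r: r["time"], reverse=True)

# After clicking the "Duration" header: slowest first or fastest first.
slowest_first = sorted(rows, key=lambda r: r["duration_ms"], reverse=True)
fastest_first = sorted(rows, key=lambda r: r["duration_ms"])

print([r["service"] for r in by_time])
print([r["service"] for r in slowest_first])
```

Sorting by duration in descending order is typically the quickest way to surface the slowest requests in the selected time period.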

Trace Details

Click a trace in the trace list to open a drawer on the right that shows the trace details.

The top section shows basic information about the selected span, including the service it belongs to, operation, start time, duration, and Trace ID. Clicking "Related Logs" on the right opens the logs for this trace.

The bottom section shows the tags and event details of the selected span. Click a span in the flame graph or waterfall view to display that span's information here.

The middle section uses a flame graph, a waterfall view, and a service topology to show the call relationships and performance data between different methods:

Flame Graph:
- Hovering over a span displays its details, including service name, resource name, start time, and duration.
- Clicking a span updates the basic information at the top and the tags and event details at the bottom to show the selected span.
- The service filter box to the right of the flame graph lets you filter by service. Spans of unselected services are displayed in gray. Clicking "Reset to Default" in the top right corner of the service filter box shows span data for all services involved in the current trace again.
Waterfall View:
- Hovering over a span displays its details, including service name, resource name, start time, and duration.
- Clicking a span updates the basic information at the top and the tags and event details at the bottom to show the selected span.
Service Topology: Shows the service call relationships in the current trace.