Google Technical Report dapper-2010-1, April 2010 https://research.google/pubs/pub36356/
"We built Dapper to provide Google’s developers with more information about the behavior of complex distributed systems."
"two fundamental requirements for Dapper: ubiquitous deployment, and continuous monitoring."
- ubiquity (omnipresence)
"the usefulness of a tracing infrastructure can be severely impacted if even small parts of the system are not being monitored."
- continuity
"monitoring should always be turned on, because it is often the case that unusual or otherwise noteworthy system behavior is difficult or impossible to reproduce."
- Low overhead:
the tracing system should have negligible performance impact on running services.
- Application-level transparency:
programmers should not need to be aware of the tracing system.
- Scalability:
it needs to handle the size of Google’s services and clusters for at least the next few years.
- Enabling fast reaction:
tracing data should be available for analysis quickly after it is generated, ideally within a minute
Making the system scalable and reducing performance overhead was facilitated by the use of adaptive sampling
we have found that a sample of just one out of thousands of requests provides sufficient information for many common uses of the tracing data
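A minimal sketch of head-based sampling at such a rate (the rate, function name, and the trick of deriving the decision from the trace id are my own illustration, not Dapper's actual mechanism):

```python
import random

SAMPLE_RATE = 1.0 / 1024  # illustrative: one out of thousands of requests

def should_sample(trace_id: int, rate: float = SAMPLE_RATE) -> bool:
    """Sampling decision as a pure function of the trace id, so every
    process that sees the id makes the same decision."""
    # Scale the 64-bit trace id into [0, 1) and compare against the rate.
    return (trace_id % (1 << 64)) / float(1 << 64) < rate

# The root of a request draws a random 64-bit trace id once; the id (and
# with it the sampling decision) is then propagated to all downstream calls.
trace_id = random.getrandbits(64)
sampled = should_sample(trace_id)
```

Making the decision deterministic in the trace id means either every span of a trace is collected or none is, which is what keeps partial traces out of the store.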
application-level transparency was achieved by restricting Dapper’s core tracing instrumentation to a small corpus of ubiquitous threading, control flow, and RPC library code
We tend to think of a Dapper trace as a tree of nested RPCs. However, our core data model is not restricted to our particular RPC framework; we also trace activities such as SMTP sessions in Gmail, HTTP requests from the outside world, and outbound queries to SQL servers.
In a Dapper trace tree, the tree nodes are basic units of work which we refer to as spans. The edges indicate a causal relationship between a span and its parent span.
a span is also a simple log of timestamped records which encode the span’s start and end time, any RPC timing data, and zero or more application-specific annotations
Dapper records a human-readable span name for each span, as well as a span id and parent id in order to reconstruct the causal relationships between the individual spans in a single distributed trace. Spans created without a parent id are known as root spans. All spans associated with a specific trace also share a common trace id. All of these ids are probabilistically unique 64-bit integers.
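This span model can be sketched roughly as follows (field and class names are my own; Dapper's actual representation is not specified at this level):

```python
import random
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

def new_id() -> int:
    # Probabilistically unique 64-bit integer, as described in the paper.
    return random.getrandbits(64)

@dataclass
class Span:
    name: str                         # human-readable span name
    trace_id: int                     # shared by all spans of one trace
    span_id: int = field(default_factory=new_id)
    parent_id: Optional[int] = None   # None marks a root span
    start: float = field(default_factory=time.time)
    end: Optional[float] = None
    annotations: List[Tuple[float, str]] = field(default_factory=list)

    def annotate(self, message: str) -> None:
        # Zero or more application-specific, timestamped annotations.
        self.annotations.append((time.time(), message))

    def finish(self) -> None:
        self.end = time.time()

# A root span starts a new trace; children inherit its trace id and
# point back at it via parent_id, which is enough to rebuild the tree.
root = Span(name="frontend.Request", trace_id=new_id())
child = Span(name="backend.Query", trace_id=root.trace_id,
             parent_id=root.span_id)
```

Given a bag of such spans collected from many hosts, the trace tree is reconstructed purely from (trace id, span id, parent id), with no global coordination at trace time.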
Span start and end times as well as any RPC timing information are recorded by Dapper’s RPC library instrumentation.
every RPC span contains annotations from both the client and server processes, making two-host spans the most common ones.
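A toy illustration of why an RPC span spans two hosts: client and server processes each log events against the same span id, and the collector merges the records afterwards (the record shape and function names here are hypothetical):

```python
from collections import defaultdict

# Collector-side view: span_id -> records emitted by any host.
records = defaultdict(list)

def annotate(span_id: int, host: str, event: str) -> None:
    records[span_id].append((host, event))

SPAN_ID = 0x1234  # carried from client to server inside the RPC

# Client process:
annotate(SPAN_ID, "client", "client send")
annotate(SPAN_ID, "client", "client recv")
# Server process (a different host, same span id):
annotate(SPAN_ID, "server", "server recv")
annotate(SPAN_ID, "server", "server send")

hosts = {host for host, _ in records[SPAN_ID]}  # both sides of one span
```

The merged record also explains why clock skew between the two hosts matters when interpreting a span's timing data.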