Skip to content

Commit 1957ca8

Browse files
feat!: metrics refactor (#3262)
## Description Depends on n0-computer/net-tools#20 Until now, we were using a superglobal static `iroh_metrics::core::Core` struct to collect metrics into. This allowed us to use macros to track metrics from anywhere in the codebase. However, this also made it impossible to collect metrics *per endpoint*, which is what you want usually as soon as you have more than one endpoint in your app. This PR builds on n0-computer/iroh-metrics#15, n0-computer/iroh-metrics#22, and n0-computer/iroh-metrics#23. It removes the global metrics collection from all crates in the iroh repository. Instead, we now create and pass metrics collector structs to all places where we need to collect metrics. This PR disables the `static_core` feature from iroh-metrics, which means the macros for superglobal metrics collection are not available anymore. This is good, because otherwise we could easily miss metrics not tracked onto the proper metrics collector. This PR also updates iroh-dns-server and iroh-relay to use manual metrics collection. While this means that we have to pass our metrics structs to more places, it also makes metrics collection more visible, and we can now also split the metrics structs further easily if we want to separate concerns more. This PR should not change anything apart from metrics collection. Most places are straightforward conversions from the macros to methods on the metrics collectors. At a few places, logic was changed slightly to move metrics collection a layer up to save a few clones. ## Breaking Changes * All metrics structs (`iroh::metrics::{MagicsockMetrics, PortmapMetrics, NetReportMetrics}`) now implement `MetricsGroup` from the new version `0.34` of `iroh-metrics` and no longer implement traits from `[email protected]`. * Metrics are no longer registered onto the static superglobal `Core`. `iroh` does not use `static_core` feature of `iroh-metrics`. Metrics are now exposed from the subsystems that track them, see e.g. `Endpoint::metrics`. Several methods now take a `Metrics` argument. You can always pass `Default::default` if you don't want to unify metrics tracking with other sections. #### `iroh` * `iroh::metrics::{MagicsockMetrics, NetReportMetrics, PortmapMetrics}` all are now marked `non_exhaustive`, and implement `iroh_metrics::MetricsGroup` from `[email protected]` and no longer implement `iroh_metrics::Metric` from `[email protected]`. They also no longer implement `Clone` (put them into an `Arc` for cloning instead). * `iroh::net_report::Client::new` now takes `iroh::net_report::metrics::Metrics` as forth argument #### `iroh-dns-server` * `iroh_dns_server::server::Server::spawn` now takes `Metrics` as third argument * `iroh_dns_server::ZoneStore::persistent` now takes `Metrics` as third argument * `iroh_dns_server::ZoneStore::in_memory` now takes `Metrics` as third argument * `iroh_dns_server::ZoneStore::new` now takes `Metrics` as third argument * `iroh_dns_server::state::AppState` now has a public `metrics: Metrics` field * `iroh_dns_server::dns::DnsHandler::new` now takes `Metrics` as third argument * function `iroh_dns_server::metrics::init_metrics` is removed #### `iroh-relay` * `iroh_relay::metrics::{StunMetrics, Metrics}` all are now marked `non_exhaustive`, and implement `iroh_metrics::MetricsGroup` from `[email protected]` and no longer implement `iroh_metrics::Metric` from `[email protected]`. They also no longer implement `Clone` (put them into an `Arc` for cloning instead). ## Notes & open questions <!-- Any notes, remarks or open questions you have to make about the PR. --> ## Change checklist <!-- Remove any that are not relevant. --> - [x] Self-review. - [x] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant. - [x] Tests if relevant. - [x] All breaking changes documented. - [x] List all breaking changes in the above "Breaking Changes" section. - [x] Open an issue or PR on any number0 repos that are affected by this breaking change. Give guidance on how the updates should be handled or do the actual updates themselves. - [x] [`iroh-gossip`](https://github.com/n0-computer/iroh-gossip) - n0-computer/iroh-gossip#58 - [x] [`iroh-blobs`](https://github.com/n0-computer/iroh-blobs) - n0-computer/iroh-blobs#85 - [x] [`iroh-docs`](https://github.com/n0-computer/iroh-docs) - n0-computer/iroh-docs#41 --------- Co-authored-by: dignifiedquire <[email protected]>
1 parent 839bfaa commit 1957ca8

34 files changed

+847
-773
lines changed

Cargo.lock

+88-163
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

iroh-dns-server/Cargo.toml

+1-1
Original file line numberDiff line numberDiff line change
@@ -28,7 +28,7 @@ governor = "0.8"
2828
hickory-server = { version = "0.25.1", features = ["https-ring"] }
2929
http = "1.0.0"
3030
humantime-serde = "1.1.1"
31-
iroh-metrics = { version = "0.32.0", features = ["metrics", "service"] }
31+
iroh-metrics = { version = "0.34", features = ["service"] }
3232
lru = "0.12.3"
3333
n0-future = "0.1.2"
3434
pkarr = { version = "3.7", features = ["relays", "dht"], default-features = false }

iroh-dns-server/benches/write.rs

+10-3
Original file line numberDiff line numberDiff line change
@@ -1,15 +1,22 @@
1+
use std::sync::Arc;
2+
13
use anyhow::Result;
24
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
35
use iroh::{discovery::pkarr::PkarrRelayClient, node_info::NodeInfo, SecretKey};
4-
use iroh_dns_server::{config::Config, server::Server, ZoneStore};
6+
use iroh_dns_server::{config::Config, metrics::Metrics, server::Server, ZoneStore};
57
use rand_chacha::rand_core::SeedableRng;
68
use tokio::runtime::Runtime;
79

810
const LOCALHOST_PKARR: &str = "http://localhost:8080/pkarr";
911

1012
async fn start_dns_server(config: Config) -> Result<Server> {
11-
let store = ZoneStore::persistent(Config::signed_packet_store_path()?, Default::default())?;
12-
Server::spawn(config, store).await
13+
let metrics = Arc::new(Metrics::default());
14+
let store = ZoneStore::persistent(
15+
Config::signed_packet_store_path()?,
16+
Default::default(),
17+
metrics.clone(),
18+
)?;
19+
Server::spawn(config, store, metrics).await
1320
}
1421

1522
fn benchmark_dns_server(c: &mut Criterion) {

iroh-dns-server/src/dns.rs

+15-10
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,6 @@ use hickory_server::{
2626
server::{Request, RequestHandler, ResponseHandler, ResponseInfo},
2727
store::in_memory::InMemoryAuthority,
2828
};
29-
use iroh_metrics::inc;
3029
use serde::{Deserialize, Serialize};
3130
use tokio::{
3231
net::{TcpListener, UdpSocket},
@@ -122,12 +121,13 @@ impl DnsServer {
122121
pub struct DnsHandler {
123122
#[debug("Catalog")]
124123
catalog: Arc<Catalog>,
124+
metrics: Arc<Metrics>,
125125
}
126126

127127
impl DnsHandler {
128128
/// Create a DNS server given some settings, a connection to the DB for DID-by-username lookups
129129
/// and the server DID to serve under `_did.<origin>`.
130-
pub fn new(zone_store: ZoneStore, config: &DnsConfig) -> Result<Self> {
130+
pub fn new(zone_store: ZoneStore, config: &DnsConfig, metrics: Arc<Metrics>) -> Result<Self> {
131131
let origins = config
132132
.origins
133133
.iter()
@@ -149,6 +149,7 @@ impl DnsHandler {
149149

150150
Ok(Self {
151151
catalog: Arc::new(catalog),
152+
metrics,
152153
})
153154
}
154155

@@ -168,23 +169,27 @@ impl RequestHandler for DnsHandler {
168169
request: &Request,
169170
response_handle: R,
170171
) -> ResponseInfo {
171-
inc!(Metrics, dns_requests);
172+
self.metrics.dns_requests.inc();
172173
match request.protocol() {
173-
Protocol::Udp => inc!(Metrics, dns_requests_udp),
174-
Protocol::Https => inc!(Metrics, dns_requests_https),
174+
Protocol::Udp => {
175+
self.metrics.dns_requests_udp.inc();
176+
}
177+
Protocol::Https => {
178+
self.metrics.dns_requests_https.inc();
179+
}
175180
_ => {}
176181
}
177182
debug!(protocol=%request.protocol(), queries=?request.queries(), "incoming DNS request");
178183

179184
let res = self.catalog.handle_request(request, response_handle).await;
180185
match &res.response_code() {
181186
ResponseCode::NoError => match res.answer_count() {
182-
0 => inc!(Metrics, dns_lookup_notfound),
183-
_ => inc!(Metrics, dns_lookup_success),
187+
0 => self.metrics.dns_lookup_notfound.inc(),
188+
_ => self.metrics.dns_lookup_success.inc(),
184189
},
185-
ResponseCode::NXDomain => inc!(Metrics, dns_lookup_notfound),
186-
_ => inc!(Metrics, dns_lookup_error),
187-
}
190+
ResponseCode::NXDomain => self.metrics.dns_lookup_notfound.inc(),
191+
_ => self.metrics.dns_lookup_error.inc(),
192+
};
188193
res
189194
}
190195
}

iroh-dns-server/src/http.rs

+16-10
Original file line numberDiff line numberDiff line change
@@ -7,15 +7,14 @@ use std::{
77

88
use anyhow::{bail, Context, Result};
99
use axum::{
10-
extract::{ConnectInfo, Request},
10+
extract::{ConnectInfo, Request, State},
1111
handler::Handler,
1212
http::Method,
1313
middleware::{self, Next},
1414
response::IntoResponse,
1515
routing::get,
1616
Router,
1717
};
18-
use iroh_metrics::{inc, inc_by};
1918
use serde::{Deserialize, Serialize};
2019
use tokio::{net::TcpListener, task::JoinSet};
2120
use tower_http::{
@@ -31,7 +30,7 @@ mod rate_limiting;
3130
mod tls;
3231

3332
pub use self::{rate_limiting::RateLimitConfig, tls::CertMode};
34-
use crate::{config::Config, metrics::Metrics, state::AppState};
33+
use crate::{config::Config, state::AppState};
3534

3635
/// Config for the HTTP server
3736
#[derive(Debug, Serialize, Deserialize, Clone)]
@@ -227,13 +226,13 @@ pub(crate) fn create_app(state: AppState, rate_limit_config: &RateLimitConfig) -
227226
)
228227
.route("/healthcheck", get(|| async { "OK" }))
229228
.route("/", get(|| async { "Hi!" }))
230-
.with_state(state);
229+
.with_state(state.clone());
231230

232231
// configure app
233232
router
234233
.layer(cors)
235234
.layer(trace)
236-
.route_layer(middleware::from_fn(metrics_middleware))
235+
.route_layer(middleware::from_fn_with_state(state, metrics_middleware))
237236
}
238237

239238
/// Record request metrics.
@@ -244,17 +243,24 @@ pub(crate) fn create_app(state: AppState, rate_limit_config: &RateLimitConfig) -
244243
//
245244
// See also
246245
// https://github.com/tokio-rs/axum/blob/main/examples/prometheus-metrics/src/main.rs#L114
247-
async fn metrics_middleware(req: Request, next: Next) -> impl IntoResponse {
246+
async fn metrics_middleware(
247+
State(state): State<AppState>,
248+
req: Request,
249+
next: Next,
250+
) -> impl IntoResponse {
248251
let start = Instant::now();
249252
let response = next.run(req).await;
250253
let latency = start.elapsed().as_millis();
251254
let status = response.status();
252-
inc_by!(Metrics, http_requests_duration_ms, latency as u64);
253-
inc!(Metrics, http_requests);
255+
state
256+
.metrics
257+
.http_requests_duration_ms
258+
.inc_by(latency as u64);
259+
state.metrics.http_requests.inc();
254260
if status.is_success() {
255-
inc!(Metrics, http_requests_success);
261+
state.metrics.http_requests_success.inc();
256262
} else {
257-
inc!(Metrics, http_requests_error);
263+
state.metrics.http_requests_error.inc();
258264
}
259265
response
260266
}

iroh-dns-server/src/lib.rs

+1-1
Original file line numberDiff line numberDiff line change
@@ -194,7 +194,7 @@ mod tests {
194194
max_batch_time: Duration::from_millis(100),
195195
..Default::default()
196196
};
197-
let store = ZoneStore::in_memory(options)?;
197+
let store = ZoneStore::in_memory(options, Default::default())?;
198198

199199
// create a signed packet
200200
let signed_packet = random_signed_packet()?;

iroh-dns-server/src/main.rs

+1-4
Original file line numberDiff line numberDiff line change
@@ -2,9 +2,7 @@ use std::path::PathBuf;
22

33
use anyhow::Result;
44
use clap::Parser;
5-
use iroh_dns_server::{
6-
config::Config, metrics::init_metrics, server::run_with_config_until_ctrl_c,
7-
};
5+
use iroh_dns_server::{config::Config, server::run_with_config_until_ctrl_c};
86
use tracing::debug;
97

108
#[derive(Parser, Debug)]
@@ -27,6 +25,5 @@ async fn main() -> Result<()> {
2725
Config::default()
2826
};
2927

30-
init_metrics();
3128
run_with_config_until_ctrl_c(config).await
3229
}

iroh-dns-server/src/metrics.rs

+19-42
Original file line numberDiff line numberDiff line change
@@ -1,64 +1,41 @@
11
//! Metrics support for the server
22
3-
use iroh_metrics::core::{Core, Counter, Metric};
4-
use struct_iterable::Iterable;
3+
use iroh_metrics::{Counter, MetricsGroup};
54

65
/// Metrics for iroh-dns-server
7-
#[derive(Debug, Clone, Iterable)]
8-
#[allow(missing_docs)]
6+
#[derive(Debug, Default, MetricsGroup)]
7+
#[metrics(name = "dns_server")]
98
pub struct Metrics {
9+
/// Number of pkarr relay puts that updated the state
1010
pub pkarr_publish_update: Counter,
11+
/// Number of pkarr relay puts that did not update the state
1112
pub pkarr_publish_noop: Counter,
13+
/// DNS requests (total)
1214
pub dns_requests: Counter,
15+
/// DNS requests via UDP
1316
pub dns_requests_udp: Counter,
17+
/// DNS requests via HTTPS (DoH)
1418
pub dns_requests_https: Counter,
19+
/// DNS lookup responses with at least one answer
1520
pub dns_lookup_success: Counter,
21+
/// DNS lookup responses with no answers
1622
pub dns_lookup_notfound: Counter,
23+
/// DNS lookup responses which failed
1724
pub dns_lookup_error: Counter,
25+
/// Number of HTTP requests
1826
pub http_requests: Counter,
27+
/// Number of HTTP requests with a 2xx status code
1928
pub http_requests_success: Counter,
29+
/// Number of HTTP requests with a non-2xx status code
2030
pub http_requests_error: Counter,
31+
/// Total duration of all HTTP requests
2132
pub http_requests_duration_ms: Counter,
33+
/// Signed packets inserted into the store
2234
pub store_packets_inserted: Counter,
35+
/// Signed packets removed from the store
2336
pub store_packets_removed: Counter,
37+
/// Number of updates to existing packets
2438
pub store_packets_updated: Counter,
39+
/// Number of expired packets
2540
pub store_packets_expired: Counter,
2641
}
27-
28-
impl Default for Metrics {
29-
fn default() -> Self {
30-
Self {
31-
pkarr_publish_update: Counter::new("Number of pkarr relay puts that updated the state"),
32-
pkarr_publish_noop: Counter::new(
33-
"Number of pkarr relay puts that did not update the state",
34-
),
35-
dns_requests: Counter::new("DNS requests (total)"),
36-
dns_requests_udp: Counter::new("DNS requests via UDP"),
37-
dns_requests_https: Counter::new("DNS requests via HTTPS (DoH)"),
38-
dns_lookup_success: Counter::new("DNS lookup responses with at least one answer"),
39-
dns_lookup_notfound: Counter::new("DNS lookup responses with no answers"),
40-
dns_lookup_error: Counter::new("DNS lookup responses which failed"),
41-
http_requests: Counter::new("Number of HTTP requests"),
42-
http_requests_success: Counter::new("Number of HTTP requests with a 2xx status code"),
43-
http_requests_error: Counter::new("Number of HTTP requests with a non-2xx status code"),
44-
http_requests_duration_ms: Counter::new("Total duration of all HTTP requests"),
45-
store_packets_inserted: Counter::new("Signed packets inserted into the store"),
46-
store_packets_removed: Counter::new("Signed packets removed from the store"),
47-
store_packets_updated: Counter::new("Number of updates to existing packets"),
48-
store_packets_expired: Counter::new("Number of expired packets"),
49-
}
50-
}
51-
}
52-
53-
impl Metric for Metrics {
54-
fn name() -> &'static str {
55-
"dns_server"
56-
}
57-
}
58-
59-
/// Init the metrics collection core.
60-
pub fn init_metrics() {
61-
Core::init(|reg, metrics| {
62-
metrics.insert(Metrics::new(reg));
63-
});
64-
}

iroh-dns-server/src/server.rs

+19-8
Original file line numberDiff line numberDiff line change
@@ -1,29 +1,34 @@
11
//! The main server which combines the DNS and HTTP(S) servers.
22
3+
use std::sync::Arc;
4+
35
use anyhow::Result;
4-
use iroh_metrics::metrics::start_metrics_server;
6+
use iroh_metrics::service::start_metrics_server;
57
use tracing::info;
68

79
use crate::{
810
config::Config,
911
dns::{DnsHandler, DnsServer},
1012
http::HttpServer,
13+
metrics::Metrics,
1114
state::AppState,
1215
store::ZoneStore,
1316
};
1417

1518
/// Spawn the server and run until the `Ctrl-C` signal is received, then shutdown.
1619
pub async fn run_with_config_until_ctrl_c(config: Config) -> Result<()> {
20+
let metrics = Arc::new(Metrics::default());
1721
let zone_store_options = config.zone_store.clone().unwrap_or_default();
1822
let mut store = ZoneStore::persistent(
1923
Config::signed_packet_store_path()?,
2024
zone_store_options.into(),
25+
metrics.clone(),
2126
)?;
2227
if let Some(bootstrap) = config.mainline_enabled() {
2328
info!("mainline fallback enabled");
2429
store = store.with_mainline_fallback(bootstrap);
2530
};
26-
let server = Server::spawn(config, store).await?;
31+
let server = Server::spawn(config, store, metrics).await?;
2732
tokio::signal::ctrl_c().await?;
2833
info!("shutdown");
2934
server.shutdown().await?;
@@ -44,15 +49,21 @@ impl Server {
4449
/// * A DNS server task
4550
/// * A HTTP server task, if `config.http` is not empty
4651
/// * A HTTPS server task, if `config.https` is not empty
47-
pub async fn spawn(config: Config, store: ZoneStore) -> Result<Self> {
48-
let dns_handler = DnsHandler::new(store.clone(), &config.dns)?;
52+
pub async fn spawn(config: Config, store: ZoneStore, metrics: Arc<Metrics>) -> Result<Self> {
53+
let dns_handler = DnsHandler::new(store.clone(), &config.dns, metrics.clone())?;
4954

50-
let state = AppState { store, dns_handler };
55+
let state = AppState {
56+
store,
57+
dns_handler,
58+
metrics: metrics.clone(),
59+
};
5160

5261
let metrics_addr = config.metrics_addr();
5362
let metrics_task = tokio::task::spawn(async move {
5463
if let Some(addr) = metrics_addr {
55-
start_metrics_server(addr).await?;
64+
let mut registry = iroh_metrics::Registry::default();
65+
registry.register(metrics);
66+
start_metrics_server(addr, Arc::new(registry)).await?;
5667
}
5768
Ok(())
5869
});
@@ -122,12 +133,12 @@ impl Server {
122133
config.https = None;
123134
config.metrics = Some(MetricsConfig::disabled());
124135

125-
let mut store = ZoneStore::in_memory(options.unwrap_or_default())?;
136+
let mut store = ZoneStore::in_memory(options.unwrap_or_default(), Default::default())?;
126137
if let Some(bootstrap) = mainline {
127138
info!("mainline fallback enabled");
128139
store = store.with_mainline_fallback(bootstrap);
129140
}
130-
let server = Self::spawn(config, store).await?;
141+
let server = Self::spawn(config, store, Default::default()).await?;
131142
let dns_addr = server.dns_server.local_addr();
132143
let http_addr = server.http_server.http_addr().expect("http is set");
133144
let http_url = format!("http://{http_addr}").parse()?;

iroh-dns-server/src/state.rs

+5-1
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,8 @@
11
//! Shared state and store for the iroh-dns-server
22
3-
use crate::{dns::DnsHandler, store::ZoneStore};
3+
use std::sync::Arc;
4+
5+
use crate::{dns::DnsHandler, metrics::Metrics, store::ZoneStore};
46

57
/// The shared app state.
68
#[derive(Clone)]
@@ -9,4 +11,6 @@ pub struct AppState {
911
pub store: ZoneStore,
1012
/// Handler for DNS requests
1113
pub dns_handler: DnsHandler,
14+
/// Metrics collector.
15+
pub metrics: Arc<Metrics>,
1216
}

0 commit comments

Comments
 (0)