Mixnode stress testing (#6575)
* Squashing the mix stress testing branch (#6575) reduced chain watcher per block log severity update network monitors contract semver to 1.0.0 fix build issues fix mixnet client dropping initial packet on egress reconnection adjusted logs for network monitor agent changed default testing interval to 2h refresh NM contract information explicit return type for batch submission for mixnet listener task to get scheduled before beginning connectivity test make sure to always use canonical ip for network monitor noise keys feat: NMv3: make agents decide egress port (#6746) add config v12->v13 config migration for nym nodes fix formatting in wallet types simplified client config creation remove other swagger redirect removed swagger redirect on /swagger/ route log version info on startup add workflows, contract address, and dockerfile bugfix: use correct endpoints when setting up orchestrator (#6733) clippy adjust DEFAULT_MIN_STRESS_TESTED_NODES ratio expose route with new performance metrics fixes and additional docs use stress testing scores stub for usage of stress testing scores stub traits added new fields to nym-api config controlling usage of stress test data guard against duplicate packets prevent usage of chain_authorisation_check_max_attempts with value of 0 make sure duplicate results cant be inserted into the db submit test results from orchestrator on an interval docs and fixes nym-api side of handling result submission stubs for submitting results NM orchestrator verifying nym-api result submission permissions NM orchestrator to update announced key on startup allow NM orchestrator to announce its identity key to the contract stubs within nym-api for accepting NMv3 results added additional metrics docs bugfixes + making sure to only assign mixnode testruns fixed node refresher to only retrieve mixnodes and add additional metrics topology metrics defined basic prometheus metrics authorised endpoint for returning prometheus data create initial stub for 
prometheus metrics post rebasing fixes adjusted routes missing implementation for storage getters a lot of new stubs and db accessors stubs for results endpoints update utoipa tags for agent rountes shared auth between metrics and results moved stale results eviction into the interval.tick branch refactor and comments create background process to evict stale data include sphinx packet delay as part of the stats fix mock construction add median to the calculated latency distribution remove unused imports cleanup performing testrun and submitting the results assigning testruns to requesting agents basic stub for http server for the NMv3 orchestrator chore: rename existing 'NetworkMonitorAgent' to 'NodeStressTester' make sure to use canonical ips within the noise config fixed contract tests cargo fmt additional comments and unit tests contract and nym-node support of NM agents being run on the same host basic unit tests refactoring make agents retrieve mix port assignment from the orchestrator provide sensible defaults to CLI arguments stub the initial structure for the agent chore: remove redundant import missed tick behaviour removed redundant mutex removed redundant try_get_client reuse existing constant for default nymnode port add node refresher for periodic scraping of bonded nym-node details - NodeRefresher periodically queries the mixnet contract for all bonded nodes and probes each node's HTTP API for host information, sphinx keys, noise keys, and key rotation IDs - Extract NymNodeApiClientRetriever into nym-node-requests with port probing, identity verification, and host information signature checking - Add clone_query_client on NyxdClient so the refresher can hold its own query client without locking the signing client - Batch upsert for nym_node rows (single transaction instead of per-row) - Reuse the new helpers in nym-api's node_describe_cache ensure assignment of testrun begins an IMMEDIATE tx construction of the orchestrator struct initial set of cli 
args make sure to not assign testable nodes too often very initial database structure and cli fixed construction of RoutableNetworkMonitors remove redundant constructor for NoiseNode forbid 0-nonsense config values add type safety for test route construction moved lioness and arrayref to workspace deps fixed dockerfile build always use canonical addresses in RoutableNetworkMonitors fixed old contract formatting issues removed redundant into() call network monitor agent fixes additional logs config unit tests more docs standalone stress testing invocation further refactoring and changes refactor testing loop and return valid test result upon completion initial sending/receiving test loop generating reusable sphinx headers additional structure for receiving ingress packets initial scaffolding for NMv3 agent added validation of x25519 noise key removed unstable call to 'is_multiple_of' remove calls to from_octets as they're unavailable in pre 1.91 additional docs/comments propagating noise information about NM for mixnet routing pass full socket address of the agent into the contract storage feat: store noise keys alongside ip addresses within the contract removed redundant comment ensure NM packets can only go to NM PR review comments added additional docs allow NM to replay packets + fix replay prometheus metrics propagate information about nm agent to connection handler updated nym-node config migration feat: introduced nym-node websocket subscription for keeping updated list of NM agents allow admin to also revoke monitor agents remove agents upon orchestrator removal fixed schema generation and regenerated the contract schema removed rustc restriction on contracts-common added client methods for interacting with the contract added unit tests for contract methods implemented logic of the network monitors contract create initial structure for network monitors contract start mix stress testing topic branch * make nym-node default to the new blockstream rpc/ws node 
cluster * reduced mixnet-client log severity * set network monitors contract address for mainnet
This commit is contained in:
committed by
GitHub
parent
e5cd9fd69e
commit
46c67440bb
@@ -0,0 +1,56 @@
[package]
name = "nym-network-monitor-agent"
description = "Agent used for stress testing Nym mixnodes"
version = "1.0.2"
authors.workspace = true
edition.workspace = true
license.workspace = true
repository.workspace = true
homepage.workspace = true
documentation.workspace = true
rust-version.workspace = true
readme.workspace = true
publish = false

[dependencies]
anyhow = { workspace = true }
clap = { workspace = true, features = ["cargo", "env"] }
futures = { workspace = true }
humantime = { workspace = true }
rand = { workspace = true }
nym-sphinx-types = { workspace = true }
nym-sphinx-params = { workspace = true }
nym-sphinx-framing = { workspace = true }
nym-sphinx-addressing = { workspace = true }
nym-noise = { workspace = true }
time = { workspace = true }
tokio = { workspace = true, features = ["macros", "sync", "rt-multi-thread"] }
tokio-util = { workspace = true }
tracing = { workspace = true }
url = { workspace = true }
zeroize = { workspace = true }

# methods to recreate lioness
# we don't care about particular versions - just pull whatever is used by sphinx
lioness = { workspace = true }
arrayref = { workspace = true }
sha2 = { workspace = true }
hkdf = { workspace = true }
x25519-dalek = { workspace = true }


nym-bin-common = { workspace = true, features = [
    "basic_tracing",
    "output_format",
] }
nym-crypto = { workspace = true, features = ["asymmetric", "rand", "hashing"] }
nym-pemstore = { workspace = true }
nym-task = { workspace = true }

nym-network-monitor-orchestrator-requests = { path = "../nym-network-monitor-orchestrator-requests", features = ["client"] }

[dev-dependencies]
nym-test-utils = { workspace = true }

[lints]
workspace = true
@@ -0,0 +1,22 @@
# this will only work with VPN, otherwise remove the harbor part
FROM harbor.nymte.ch/dockerhub/rust:latest AS builder

RUN apt update && apt install -yy libdbus-1-dev pkg-config libclang-dev

COPY ./ /usr/src/nym
WORKDIR /usr/src/nym

RUN cargo build --bin nym-network-monitor-agent --release

FROM harbor.nymte.ch/dockerhub/ubuntu:24.04

RUN apt-get update && apt-get install -y ca-certificates

WORKDIR /nym

COPY --from=builder /usr/src/nym/target/release/nym-network-monitor-agent ./
COPY --from=builder /usr/src/nym/nym-network-monitor-v3/nym-network-monitor-agent/entrypoint.sh ./
RUN chmod +x /nym/entrypoint.sh

ENV SLEEP_TIME=5
ENTRYPOINT [ "/nym/entrypoint.sh" ]
@@ -0,0 +1,3 @@
# Network Monitor Agent

An agent to run nym node stress tests and report results back to the Network Monitor orchestrator.
@@ -0,0 +1,49 @@
#!/bin/bash

# Build and push Network Monitor Agent container to harbor.nymte.ch

set -e

# Configuration
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
WORKING_DIRECTORY="${SCRIPT_DIR}"
CONTAINER_NAME="network-monitor-agent"
REGISTRY="harbor.nymte.ch"
NAMESPACE="nym"

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

# Get version from Cargo.toml
VERSION=$(grep "^version = " "${WORKING_DIRECTORY}/Cargo.toml" | sed -E 's/version = "(.*)"/\1/')
if [ -z "$VERSION" ]; then
    echo -e "${RED}Error: Could not extract version from Cargo.toml${NC}"
    exit 1
fi

echo -e "${YELLOW}Building Network Monitor Agent${NC}"
echo -e "${YELLOW}Version: ${VERSION}${NC}"

# Login to Harbor
echo -e "${GREEN}Logging into Harbor...${NC}"
docker login "${REGISTRY}"

# Build the container
echo -e "${GREEN}Building the container...${NC}"
# Build from repository root (two levels up from script location)
docker build \
    --build-arg GIT_REF="${GATEWAY_PROBE_GIT_REF}" \
    -f "${WORKING_DIRECTORY}/Dockerfile" \
    "${SCRIPT_DIR}/../.." \
    -t "${REGISTRY}/${NAMESPACE}/${CONTAINER_NAME}:${VERSION}" \
    -t "${REGISTRY}/${NAMESPACE}/${CONTAINER_NAME}:latest"

# Push to Harbor
echo -e "${GREEN}Pushing container to Harbor...${NC}"
docker push "${REGISTRY}/${NAMESPACE}/${CONTAINER_NAME}:${VERSION}"
docker push "${REGISTRY}/${NAMESPACE}/${CONTAINER_NAME}:latest"

echo -e "${GREEN}Successfully built and pushed ${CONTAINER_NAME}:${VERSION}${NC}"
@@ -0,0 +1,20 @@
#!/bin/sh

echo "Starting agent loop with sleep interval: ${SLEEP_TIME}s"

# Trap SIGTERM to allow graceful shutdown
# (POSIX sh only guarantees signal names without the SIG prefix; dash rejects "SIGTERM")
trap "echo 'Stopping...'; exit 0" TERM

DEFAULT_ARGS="run-agent --orchestrator_address \"${NETWORK_MONITOR_AGENT_SERVER_ADDRESS}:${NETWORK_MONITOR_AGENT_SERVER_PORT}\" "
ARGS=${NETWORK_MONITOR_AGENT_ARGS:-${DEFAULT_ARGS}}
COMMAND="/nym/nym-network-monitor-agent ${ARGS}"

echo "default_args = '${DEFAULT_ARGS}'"
echo "args = '${ARGS}'"
echo "command = '${COMMAND}'"

# Run agent in an infinite loop
while true; do
    eval "$COMMAND"
    sleep "$SLEEP_TIME"
done
@@ -0,0 +1,110 @@
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
// SPDX-License-Identifier: GPL-3.0-only

use std::net::SocketAddr;
use std::time::Duration;

/// Configuration for the [`NodeStressTester`], controlling packet sending behaviour during a test run.
#[derive(Debug, Clone, Copy)]
pub(crate) struct NodeTesterConfig {
    /// How long the agent should be sending test packets at the specified rate.
    pub(crate) sending_duration: Duration,

    /// How long the agent will wait to receive any leftover packets after finishing sending.
    pub(crate) waiting_duration: Duration,

    /// How long the target node should delay the packet (i.e. the sphinx delay)
    pub(crate) packet_delay: Duration,

    /// Timeout for establishing the egress connection to the node under test.
    pub(crate) egress_connection_timeout: Duration,

    /// Timeout for completing the noise handshake.
    pub(crate) noise_handshake_timeout: Duration,

    /// Number of packets dispatched in a single batch. Together with `target_rate` this
    /// determines the inter-batch interval: `sending_batch_size / target_rate` seconds.
    pub(crate) sending_batch_size: usize,

    /// Target rate of packets (per second) to be sent.
    pub(crate) target_rate: usize,

    /// Whether the agent should reuse the same header for all packets, and consequently replay them.
    pub(crate) reuse_header: bool,

    /// Local socket address the agent binds its mixnet listener on to receive returning packets.
    pub(crate) mixnet_bind_address: SocketAddr,

    /// The mixnet address announced in the contract, where the tested nodes will send their packets to.
    pub(crate) external_mixnet_address: SocketAddr,
}

impl NodeTesterConfig {
    /// Total number of packets the agent intends to send: `floor(target_rate * sending_duration)`.
    pub(crate) fn expected_packets(&self) -> usize {
        (self.target_rate as f32 * self.sending_duration.as_secs_f32()).floor() as usize
    }

    /// Time between consecutive batch dispatches needed to sustain `target_rate`:
    /// `sending_batch_size / target_rate` seconds.
    pub(crate) fn batch_interval(&self) -> Duration {
        Duration::from_secs_f64(self.sending_batch_size as f64 / self.target_rate as f64)
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::time::Duration;

    fn config(
        target_rate: usize,
        sending_duration: Duration,
        batch_size: usize,
    ) -> NodeTesterConfig {
        NodeTesterConfig {
            sending_duration,
            waiting_duration: Duration::from_secs(5),
            packet_delay: Duration::from_millis(50),
            egress_connection_timeout: Duration::from_secs(5),
            noise_handshake_timeout: Duration::from_secs(3),
            sending_batch_size: batch_size,
            target_rate,
            reuse_header: true,
            mixnet_bind_address: "127.0.0.1:1789".parse().unwrap(),
            external_mixnet_address: "127.0.0.1:1789".parse().unwrap(),
        }
    }

    #[test]
    fn expected_packets_floors_fractional_result() {
        // 1000 * 0.5s = 500.0: exact, no rounding needed
        assert_eq!(
            config(1000, Duration::from_millis(500), 50).expected_packets(),
            500
        );
    }

    #[test]
    fn expected_packets_floors_not_rounds() {
        // 1000 * 1.9s = 1900.0 exactly
        assert_eq!(
            config(1000, Duration::from_millis(1900), 50).expected_packets(),
            1900
        );
    }

    #[test]
    fn batch_interval_is_batch_size_over_rate() {
        // 100 packets / 1000 pps = 100ms
        let interval = config(1000, Duration::from_secs(30), 100).batch_interval();
        assert_eq!(interval, Duration::from_millis(100));
    }

    #[test]
    fn batch_interval_smaller_than_one_ms() {
        // 1 packet / 1000 pps = 1ms
        let interval = config(1000, Duration::from_secs(30), 1).batch_interval();
        assert_eq!(interval, Duration::from_millis(1));
    }
}
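The two helpers on `NodeTesterConfig` reduce to simple arithmetic, which is easier to see with a fractional product than with the exact values used in the tests above. The following is a minimal standalone sketch, not the crate's code: the free functions `expected_packets` and `batch_interval` are hypothetical stand-ins for the methods, and the sketch uses `f64` throughout where the original uses `f32` for the packet count.

```rust
use std::time::Duration;

/// Hypothetical free-function counterpart of `NodeTesterConfig::expected_packets`:
/// floor(target_rate * sending_duration in seconds).
fn expected_packets(target_rate: usize, sending_duration: Duration) -> usize {
    (target_rate as f64 * sending_duration.as_secs_f64()).floor() as usize
}

/// Hypothetical free-function counterpart of `NodeTesterConfig::batch_interval`:
/// sending_batch_size / target_rate seconds.
/// Assumes `target_rate > 0`; a zero rate yields a non-finite value, on which
/// `Duration::from_secs_f64` panics.
fn batch_interval(sending_batch_size: usize, target_rate: usize) -> Duration {
    Duration::from_secs_f64(sending_batch_size as f64 / target_rate as f64)
}

fn main() {
    // 333 pps for 1.5s gives 499.5, which is floored to 499 rather than rounded to 500.
    let packets = expected_packets(333, Duration::from_millis(1500));
    assert_eq!(packets, 499);

    // 50 packets per batch at 1000 pps means one batch every 50ms.
    let interval = batch_interval(50, 1000);
    assert_eq!(interval, Duration::from_millis(50));

    println!("expected packets: {packets}, batch interval: {interval:?}");
}
```

A case like 333 pps over 1.5s would actually exercise the flooring behaviour that the test names `expected_packets_floors_fractional_result` and `expected_packets_floors_not_rounds` describe.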
@@ -0,0 +1,19 @@
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
// SPDX-License-Identifier: GPL-3.0-only

use anyhow::{Context, bail};
use nym_crypto::asymmetric::x25519;
use nym_pemstore::load_key;
use std::path::Path;
use std::sync::Arc;

/// Loads an x25519 Noise private key from a PEM file and returns the full key pair
/// wrapped in an [`Arc`] for shared ownership.
pub(crate) fn load_noise_key<P: AsRef<Path>>(path: P) -> anyhow::Result<Arc<x25519::KeyPair>> {
    let path = path.as_ref();
    if !path.exists() {
        bail!("noise key file does not exist at: {}", path.display());
    }
    let noise_key: x25519::PrivateKey = load_key(path).context("failed to load noise key")?;
    Ok(Arc::new(noise_key.into()))
}
@@ -0,0 +1,110 @@
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
// SPDX-License-Identifier: GPL-3.0-only

use crate::agent::config::NodeTesterConfig;
use crate::agent::tested_node::TestedNodeDetails;
use crate::agent::tester::NodeStressTester;
use anyhow::Context;
use nym_crypto::asymmetric::x25519;
use nym_network_monitor_orchestrator_requests::client::OrchestratorClient;
use nym_network_monitor_orchestrator_requests::models::{
    AgentAnnounceRequest, TestRunAssignmentRequest, TestRunResultSubmissionRequest,
};
use nym_noise::LATEST_NOISE_VERSION;
use std::sync::Arc;
use tracing::info;

pub(crate) mod config;
pub(crate) mod helpers;
pub(crate) mod result;
pub(crate) mod tested_node;
pub(crate) mod tester;

/// A network monitor agent that receives test assignments from the orchestrator,
/// stress-tests individual nym-nodes, and reports results back.
pub(crate) struct NetworkMonitorAgent {
    /// Tester configuration controlling rates, timeouts, and addressing.
    tester_config: NodeTesterConfig,

    /// Client used to communicate with the orchestrator API (port requests, announcements,
    /// work assignments, result submissions).
    orchestrator_client: OrchestratorClient,

    /// The tester's own Noise key pair, used to authenticate the egress connection.
    noise_key: Arc<x25519::KeyPair>,
}

impl NetworkMonitorAgent {
    /// Creates a new agent with the given tester configuration, pre-loaded noise key,
    /// and orchestrator client.
    pub(crate) fn new(
        tester_config: NodeTesterConfig,
        noise_key: Arc<x25519::KeyPair>,
        orchestrator_client: OrchestratorClient,
    ) -> Self {
        NetworkMonitorAgent {
            tester_config,
            orchestrator_client,
            noise_key,
        }
    }

    /// Announces this agent's details (mixnet address, noise key, protocol version)
    /// to the orchestrator so they can be registered in the smart contract.
    pub(crate) async fn announce_agent(&self) -> anyhow::Result<()> {
        self.orchestrator_client
            .announce_agent(&AgentAnnounceRequest {
                agent_mix_socket_address: self.tester_config.external_mixnet_address,
                x25519_noise_key: *self.noise_key.public_key(),
                // we're always using the latest noise version available
                noise_version: LATEST_NOISE_VERSION.into(),
            })
            .await?;
        Ok(())
    }

    /// Requests a work assignment from the orchestrator and, if one is available,
    /// performs a stress test against the assigned node and submits the results.
    pub(crate) async fn run_stress_test(&self) -> anyhow::Result<()> {
        let request = TestRunAssignmentRequest {
            agent_mix_socket_address: self.tester_config.external_mixnet_address,
            x25519_noise_key: *self.noise_key.public_key(),
        };

        // 1. query the orchestrator for a work assignment
        let Some(work_assignment) = self
            .orchestrator_client
            .request_work_assignment(&request)
            .await?
            .assignment
        else {
            // 2. if no work is available - exit immediately
            info!("no work available, exiting...");
            return Ok(());
        };

        info!("retrieved the following work assignment: {work_assignment:?}");
        let node_id = work_assignment.node_id;

        // 3. otherwise construct the tester and attempt to perform the measurements
        let tested_node = TestedNodeDetails::from_testrun_assignment(work_assignment);
        let mut stress_tester =
            NodeStressTester::new(self.tester_config, self.noise_key.clone(), tested_node)?;

        // attempt to perform the measurements within the configured timeouts
        // note: the only errors we're possibly exiting on are critical failures like
        // theoretically impossible sphinx packet creations or failing to join on tasks.
        // any sending/receiving errors are included as part of an `Ok(result)` response.
        let result = stress_tester.run_stress_test().await?;

        // 4. after that has concluded - submit the results back to the orchestrator
        self.orchestrator_client
            .submit_test_run_result(&TestRunResultSubmissionRequest {
                node_id,
                result: result.into(),
            })
            .await
            .context("failed to submit test run result")?;
        Ok(())
    }
}
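The four numbered steps in `run_stress_test` (request an assignment, exit early when there is none, run the test, submit the result) can be sketched without any networking. Everything below is an illustrative assumption: the `Orchestrator` trait, `MockOrchestrator`, `Assignment`, `RunResult` and `run_once` are hypothetical names invented for this sketch and are not part of `nym-network-monitor-orchestrator-requests`. The sketch is also synchronous, whereas the real method is `async`.

```rust
/// Hypothetical stand-in for a work assignment handed out by the orchestrator.
#[derive(Debug, Clone, PartialEq)]
struct Assignment {
    node_id: u32,
}

/// Hypothetical, heavily trimmed stand-in for a test run result.
#[derive(Debug, Clone, PartialEq)]
struct RunResult {
    packets_sent: usize,
    packets_received: usize,
    error: Option<String>,
}

/// Hypothetical abstraction over the two orchestrator calls used by the agent.
trait Orchestrator {
    fn request_work_assignment(&mut self) -> Option<Assignment>;
    fn submit_test_run_result(&mut self, node_id: u32, result: RunResult);
}

/// In-memory mock: hands out queued assignments and records submissions.
struct MockOrchestrator {
    pending: Vec<Assignment>,
    submitted: Vec<(u32, RunResult)>,
}

impl Orchestrator for MockOrchestrator {
    fn request_work_assignment(&mut self) -> Option<Assignment> {
        self.pending.pop()
    }

    fn submit_test_run_result(&mut self, node_id: u32, result: RunResult) {
        self.submitted.push((node_id, result));
    }
}

/// Returns `true` if a test was run and its result submitted,
/// `false` if the orchestrator had no work available.
fn run_once(
    orchestrator: &mut impl Orchestrator,
    test: impl Fn(&Assignment) -> RunResult,
) -> bool {
    // steps 1 and 2: ask for work and bail out early if there is none
    let Some(assignment) = orchestrator.request_work_assignment() else {
        return false;
    };
    let node_id = assignment.node_id;

    // step 3: run the measurement; failures are carried inside the result
    let result = test(&assignment);

    // step 4: submit the result, including any recorded error
    orchestrator.submit_test_run_result(node_id, result);
    true
}

fn main() {
    let mut orchestrator = MockOrchestrator {
        pending: vec![Assignment { node_id: 7 }],
        submitted: Vec::new(),
    };
    // a test run that failed to receive anything still produces a result
    let failing_test = |_: &Assignment| RunResult {
        packets_sent: 100,
        packets_received: 0,
        error: Some("egress connection timed out".to_string()),
    };

    assert!(run_once(&mut orchestrator, failing_test));
    assert!(!run_once(&mut orchestrator, failing_test)); // queue is now empty

    let (node_id, result) = &orchestrator.submitted[0];
    println!("node {node_id}: {result:?}");
}
```

The point mirrored from the comments in the hunk above is that a failed measurement is still data: it is reported inside the result and submitted, and only critical failures would abort the run before submission.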
@@ -0,0 +1,381 @@
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
// SPDX-License-Identifier: GPL-3.0-only

use crate::egress_connection::EgressConnectionStatistics;
use std::time::Duration;
use time::OffsetDateTime;

// TODO: once created, move this struct to a shared models library
/// Captures the outcome of a single [`run_stress_test`](super::NodeStressTester::run_stress_test) run.
///
/// Fields are populated incrementally as the test progresses; absent values (`None`) indicate
/// that the corresponding step was not reached or did not produce a result.
#[derive(Debug, Clone)]
pub(crate) struct TestRunResult {
    /// The timestamp when the test run was initiated.
    pub(crate) start_time: OffsetDateTime,

    /// Duration of the Noise handshake on the ingress (responder) side, if completed.
    pub(crate) ingress_noise_handshake: Option<Duration>,

    /// Duration of the Noise handshake on the egress (initiator) side, if completed.
    pub(crate) egress_noise_handshake: Option<Duration>,

    /// The (constant) delay of the sphinx packet set during the test run.
    pub sphinx_packet_delay: Duration,

    /// Number of sphinx packets successfully sent to the node under test.
    pub(crate) packets_sent: usize,

    /// Number of sphinx packets returned by the node and successfully received.
    pub(crate) packets_received: usize,

    /// Round-trip time of the very first probe packet, sent in isolation before any load is applied.
    /// Because the node is idle at this point, this value approximates the baseline network latency
    /// to the node without any queuing or processing overhead from the stress test itself.
    /// `None` if the initial probe did not complete successfully.
    pub(crate) approximate_latency: Option<Duration>,

    /// RTT statistics computed over all received packets, or `None` if no packets were received.
    pub(crate) packets_statistics: Option<LatencyDistribution>,

    /// Latency distribution of individual batch send operations recorded during the load test.
    /// Reflects how long each batch took to flush to the OS socket, giving a rough measure of
    /// egress throughput. `None` if no batches were sent.
    pub(crate) sending_statistics: Option<LatencyDistribution>,

    /// Whether any packet was received with an ID that had already been seen in this test run.
    /// Duplicates should never occur under normal operation; their presence may indicate a
    /// misbehaving or malicious node replaying packets.
    pub(crate) received_duplicates: bool,

    /// Human-readable description of the first error that caused the test to abort, if any.
    pub(crate) error: Option<String>,
}

impl TestRunResult {
    pub(crate) fn new(sphinx_packet_delay: Duration) -> Self {
        TestRunResult {
            start_time: OffsetDateTime::now_utc(),
            ingress_noise_handshake: None,
            egress_noise_handshake: None,
            sphinx_packet_delay,
            packets_sent: 0,
            packets_received: 0,
            approximate_latency: None,
            packets_statistics: None,
            sending_statistics: None,
            received_duplicates: false,
            error: None,
        }
    }

    /// Calculates the percentage of packets received out of the total sent.
    /// Returns `0.0` if no packets were sent.
    pub(crate) fn received_percentage(&self) -> f64 {
        if self.packets_sent > 0 {
            (self.packets_received as f64 / self.packets_sent as f64) * 100.0
        } else {
            0.0
        }
    }

    /// Records the duration of the ingress Noise handshake.
    pub(crate) fn set_ingress_noise_handshake(&mut self, duration: Duration) {
        self.ingress_noise_handshake = Some(duration);
    }

    /// Records the duration of the egress Noise handshake.
    pub(crate) fn set_egress_noise_handshake(&mut self, duration: Duration) {
        self.egress_noise_handshake = Some(duration);
    }

    /// Records the RTT of the initial probe packet as the baseline latency estimate.
    pub(crate) fn set_approximate_latency(&mut self, rtt: Duration) {
        self.approximate_latency = Some(rtt);
    }

    /// Sets the number of packets that were sent during the stress test.
    pub(crate) fn set_packets_sent(&mut self, count: usize) {
        self.packets_sent = count;
    }

    /// Sets the number of packets that were received back from the node under test.
    pub(crate) fn set_packets_received(&mut self, count: usize) {
        self.packets_received = count;
    }

    /// Attaches pre-computed RTT statistics for the received packets.
    pub(crate) fn set_packets_statistics(&mut self, stats: LatencyDistribution) {
        self.packets_statistics = Some(stats);
    }

    /// Marks that at least one duplicate packet ID was observed during the test run.
    pub(crate) fn set_received_duplicates(&mut self) {
        self.received_duplicates = true;
    }

    /// Records an error message that caused the test run to abort.
    pub(crate) fn set_error(&mut self, error: impl Into<String>) {
        self.error = Some(error.into());
    }

    /// Populates egress-side statistics from the finished [`EgressConnection`](crate::egress_connection::EgressConnection).
    /// Sets the egress Noise handshake duration and, if any batches were sent, the batch send
    /// latency distribution.
    pub(crate) fn set_egress_connection_statistics(&mut self, stats: EgressConnectionStatistics) {
        self.set_egress_noise_handshake(stats.noise_handshake_duration);

        if !stats.packet_batches_sending_duration.is_empty() {
            self.sending_statistics = Some(LatencyDistribution::compute(
                &stats.packet_batches_sending_duration,
            ))
        }
    }
}

/// Latency statistics computed over the set of test packets received or sent during a stress test.
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub struct LatencyDistribution {
    /// Minimum latency duration it took to send or receive a test packet.
    pub minimum: Duration,

    /// Average latency duration it took to send or receive a test packet.
    pub mean: Duration,

    /// Median latency duration it took to send or receive a test packet.
    /// For an even number of samples, this is the arithmetic mean of the two middle values.
    pub median: Duration,

    /// Maximum latency duration it took to send or receive a test packet.
    pub maximum: Duration,

    /// The standard deviation of the latency duration it took to send or receive the test packets.
    pub standard_deviation: Duration,
}

impl LatencyDistribution {
    /// Computes statistics from a slice of duration samples
    /// (per-packet RTTs or per-batch sending durations).
    /// Returns zeroed statistics if `raw_results` is empty.
    pub fn compute(raw_results: &[Duration]) -> Self {
        if raw_results.is_empty() {
            return LatencyDistribution {
                minimum: Duration::ZERO,
                mean: Duration::ZERO,
                median: Duration::ZERO,
                maximum: Duration::ZERO,
                standard_deviation: Duration::ZERO,
            };
        }

        let mut sorted = raw_results.to_vec();
        sorted.sort();

        let minimum = sorted[0];

        // SAFETY: we have ensured our list is not empty
        #[allow(clippy::unwrap_used)]
        let maximum = *sorted.last().unwrap();
        let median = Self::duration_median(&sorted);
        let mean = Self::duration_mean(&sorted);
        let standard_deviation = Self::duration_standard_deviation(&sorted, mean);

        LatencyDistribution {
            minimum,
            mean,
            median,
            maximum,
            standard_deviation,
        }
    }

    /// Computes the median of an already-sorted slice of durations.
    /// For an even count, returns the arithmetic mean of the two middle elements.
    /// Caller must ensure `sorted` is non-empty and ordered ascending.
    fn duration_median(sorted: &[Duration]) -> Duration {
        let len = sorted.len();
        let mid = len / 2;
        if len % 2 == 1 {
            sorted[mid]
        } else {
            (sorted[mid - 1] + sorted[mid]) / 2
        }
    }

    /// Computes the arithmetic mean of a slice of durations.
    /// Returns [`Duration::ZERO`] for an empty slice.
    fn duration_mean(data: &[Duration]) -> Duration {
        if data.is_empty() {
            return Default::default();
        }

        let sum = data.iter().sum::<Duration>();
        // packet counts realistically fit in a u32; a test sending 4 billion packets would
        // have other problems first
        let count = data.len() as u32;

        sum / count
    }

    /// Computes the population standard deviation (divides by N, not N-1) of the given durations.
    /// Precision is truncated to microseconds, which is sufficient for network latency.
    fn duration_standard_deviation(data: &[Duration], mean: Duration) -> Duration {
        if data.is_empty() {
            return Default::default();
        }

        let variance_micros = data
            .iter()
            .map(|&value| {
                let diff = mean.abs_diff(value);
                // truncate to microseconds; nanosecond precision is noise for network RTTs
                let diff_micros = diff.as_micros();
                diff_micros * diff_micros
            })
            .sum::<u128>()
            / data.len() as u128;

        // u128 easily holds squared microsecond values for any realistic RTT (< thousands of seconds)
        let std_deviation_micros = (variance_micros as f64).sqrt() as u64;
        Duration::from_micros(std_deviation_micros)
    }
}
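Review note: a worked example may help when checking the statistics in `LatencyDistribution` by hand. The sketch below re-implements the median, mean, and population standard deviation as hypothetical free functions (`median`, `mean`, `population_std_dev` are names made up for this note) and runs them on four samples. It assumes non-empty, already-sorted input, and computes the difference from whole-microsecond values via `u128::abs_diff` rather than `Duration::abs_diff`, which can differ from the hunk above by sub-microsecond truncation.

```rust
use std::time::Duration;

/// Median of an ascending, non-empty slice; for an even count this is
/// the mean of the two middle elements.
fn median(sorted: &[Duration]) -> Duration {
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2
    }
}

/// Arithmetic mean of a non-empty slice (an empty slice would divide by zero).
fn mean(data: &[Duration]) -> Duration {
    data.iter().sum::<Duration>() / data.len() as u32
}

/// Population standard deviation (divides by N), at microsecond precision.
fn population_std_dev(data: &[Duration], mean: Duration) -> Duration {
    let variance_micros = data
        .iter()
        .map(|value| {
            let diff = mean.as_micros().abs_diff(value.as_micros());
            diff * diff
        })
        .sum::<u128>()
        / data.len() as u128;
    Duration::from_micros((variance_micros as f64).sqrt() as u64)
}

fn main() {
    // four RTT samples: 10ms, 20ms, 30ms, 40ms (already sorted)
    let samples = [10, 20, 30, 40].map(Duration::from_millis);

    let avg = mean(&samples);
    let med = median(&samples);
    let dev = population_std_dev(&samples, avg);

    // mean = 100ms / 4 = 25ms; median = (20ms + 30ms) / 2 = 25ms
    assert_eq!(avg, Duration::from_millis(25));
    assert_eq!(med, Duration::from_millis(25));

    // squared deviations in microseconds: 15000^2 + 5000^2 + 5000^2 + 15000^2 = 500_000_000
    // variance = 125_000_000; sqrt = 11180.33..., truncated to 11180 microseconds
    assert_eq!(dev, Duration::from_micros(11_180));

    println!("mean={avg:?} median={med:?} std_dev={dev:?}");
}
```

Because the square root is cast to `u64`, the standard deviation is truncated toward zero rather than rounded, so results that land between two whole microseconds report the lower one.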

impl From<LatencyDistribution>
    for nym_network_monitor_orchestrator_requests::models::LatencyDistribution
{
    fn from(value: LatencyDistribution) -> Self {
        Self {
            minimum: value.minimum,
            mean: value.mean,
            median: value.median,
            maximum: value.maximum,
            standard_deviation: value.standard_deviation,
        }
    }
}

impl From<TestRunResult> for nym_network_monitor_orchestrator_requests::models::TestRunResult {
    fn from(value: TestRunResult) -> Self {
        Self {
            time_taken: (OffsetDateTime::now_utc() - value.start_time).unsigned_abs(),
            ingress_noise_handshake: value.ingress_noise_handshake,
            egress_noise_handshake: value.egress_noise_handshake,
            sphinx_packet_delay: value.sphinx_packet_delay,
            packets_sent: value.packets_sent,
            packets_received: value.packets_received,
            approximate_latency: value.approximate_latency,
            packets_statistics: value.packets_statistics.map(Into::into),
            sending_statistics: value.sending_statistics.map(Into::into),
            received_duplicates: value.received_duplicates,
            error: value.error,
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    fn ms(n: u64) -> Duration {
        Duration::from_millis(n)
    }

    #[test]
    fn empty_slice_gives_zero_stats() {
        let stats = LatencyDistribution::compute(&[]);
        assert_eq!(stats.minimum, Duration::ZERO);
        assert_eq!(stats.maximum, Duration::ZERO);
        assert_eq!(stats.mean, Duration::ZERO);
        assert_eq!(stats.median, Duration::ZERO);
        assert_eq!(stats.standard_deviation, Duration::ZERO);
    }

    #[test]
    fn single_value_has_zero_deviation() {
        let stats = LatencyDistribution::compute(&[ms(42)]);
        assert_eq!(stats.minimum, ms(42));
        assert_eq!(stats.maximum, ms(42));
        assert_eq!(stats.mean, ms(42));
        assert_eq!(stats.median, ms(42));
        assert_eq!(stats.standard_deviation, Duration::ZERO);
    }

    #[test]
    fn two_equal_values_have_zero_deviation() {
        let stats = LatencyDistribution::compute(&[ms(10), ms(10)]);
        assert_eq!(stats.mean, ms(10));
        assert_eq!(stats.median, ms(10));
        assert_eq!(stats.standard_deviation, Duration::ZERO);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn median_odd_count_picks_middle() {
|
||||
// sorted: 10, 20, 30, 40, 50 -> median = 30
|
||||
let data = [ms(40), ms(10), ms(50), ms(20), ms(30)];
|
||||
let stats = LatencyDistribution::compute(&data);
|
||||
assert_eq!(stats.median, ms(30));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn median_even_count_averages_two_middle() {
|
||||
// sorted: 10, 20, 30, 40 -> median = (20 + 30) / 2 = 25
|
||||
let data = [ms(30), ms(10), ms(40), ms(20)];
|
||||
let stats = LatencyDistribution::compute(&data);
|
||||
assert_eq!(stats.median, ms(25));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn min_max_are_correct() {
|
||||
let data = [ms(30), ms(10), ms(50), ms(20)];
|
||||
let stats = LatencyDistribution::compute(&data);
|
||||
assert_eq!(stats.minimum, ms(10));
|
||||
assert_eq!(stats.maximum, ms(50));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn mean_is_correct() {
|
||||
// mean of 10, 20, 30, 40 = 25 ms
|
||||
let data = [ms(10), ms(20), ms(30), ms(40)];
|
||||
let stats = LatencyDistribution::compute(&data);
|
||||
assert_eq!(stats.mean, ms(25));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn standard_deviation_known_values() {
|
||||
// population std-dev of {10, 20, 30, 40} ms:
|
||||
// mean = 25, deviations = {-15, -5, 5, 15}
|
||||
// variance = (225 + 25 + 25 + 225) / 4 = 125
|
||||
// std-dev = sqrt(125) ≈ 11.180 ms → truncated to microseconds = 11180 µs
|
||||
let data = [ms(10), ms(20), ms(30), ms(40)];
|
||||
let stats = LatencyDistribution::compute(&data);
|
||||
let expected = Duration::from_micros(11180);
|
||||
// allow ±1 µs for floating-point rounding
|
||||
let diff = stats.standard_deviation.abs_diff(expected);
|
||||
assert!(
|
||||
diff <= Duration::from_micros(1),
|
||||
"std-dev {:.3?} not within 1µs of expected {:.3?}",
|
||||
stats.standard_deviation,
|
||||
expected
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn result_setters_populate_fields() {
|
||||
let mut result = TestRunResult::new(ms(2));
|
||||
result.set_ingress_noise_handshake(ms(5));
|
||||
result.set_egress_noise_handshake(ms(7));
|
||||
result.set_packets_sent(100);
|
||||
result.set_packets_received(95);
|
||||
result.set_error("timeout");
|
||||
|
||||
let stats = LatencyDistribution::compute(&[ms(10), ms(20)]);
|
||||
result.set_packets_statistics(stats);
|
||||
|
||||
assert_eq!(result.ingress_noise_handshake, Some(ms(5)));
|
||||
assert_eq!(result.egress_noise_handshake, Some(ms(7)));
|
||||
assert_eq!(result.packets_sent, 100);
|
||||
assert_eq!(result.packets_received, 95);
|
||||
assert_eq!(result.packets_statistics, Some(stats));
|
||||
assert_eq!(result.error.as_deref(), Some("timeout"));
|
||||
}
|
||||
}
|
||||
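The statistics helpers in this file can be exercised in isolation. The following is a minimal, standalone sketch (not the crate's own code) of the population standard deviation over `Duration` values, truncated to whole microseconds as described above. The names `mean_of` and `population_std_dev` are hypothetical and chosen only for illustration.

```rust
use std::time::Duration;

/// Arithmetic mean; returns `Duration::ZERO` for an empty slice.
/// Hypothetical helper; assumes the slice length fits in a `u32`.
fn mean_of(data: &[Duration]) -> Duration {
    if data.is_empty() {
        return Duration::ZERO;
    }
    data.iter().sum::<Duration>() / data.len() as u32
}

/// Population standard deviation (divides by N, not N-1),
/// truncated to whole microseconds.
fn population_std_dev(data: &[Duration]) -> Duration {
    if data.is_empty() {
        return Duration::ZERO;
    }
    let mean = mean_of(data);
    // sum of squared deviations, in microseconds squared
    let sum_sq: u128 = data
        .iter()
        .map(|&value| {
            let diff = mean.abs_diff(value).as_micros();
            diff * diff
        })
        .sum();
    let variance = sum_sq / data.len() as u128;
    Duration::from_micros((variance as f64).sqrt() as u64)
}
```

For inputs of 10, 20, 30 and 40 ms the variance is 125 ms², so the truncated result is 11180 µs, in line with the `standard_deviation_known_values` test.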
@@ -0,0 +1,53 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use crate::sphinx_helpers::as_sphinx_node;
|
||||
use nym_crypto::asymmetric::x25519;
|
||||
use nym_network_monitor_orchestrator_requests::models::TestRunAssignment;
|
||||
use nym_noise::config::{NoiseNode, NoiseVersion, VersionedNoiseKeyV1};
|
||||
use nym_sphinx_params::SphinxKeyRotation;
|
||||
use std::net::SocketAddr;
|
||||
|
||||
/// Identity and addressing information for the node being tested in a stress-test run.
|
||||
#[derive(Debug)]
|
||||
pub(crate) struct TestedNodeDetails {
|
||||
pub(crate) node_id: Option<u32>,
|
||||
|
||||
/// TCP socket address of the node's mixnet listener, used for the egress connection.
|
||||
pub(crate) address: SocketAddr,
|
||||
|
||||
/// Node's static Noise public key, used to authenticate and encrypt the egress connection.
|
||||
pub(crate) noise_key: x25519::PublicKey,
|
||||
|
||||
/// Key rotation associated with the current sphinx key of the node.
|
||||
pub(crate) key_rotation: SphinxKeyRotation,
|
||||
|
||||
/// Node's current sphinx public key, used to build the sphinx packet header.
|
||||
pub(crate) sphinx_key: x25519::PublicKey,
|
||||
}
|
||||
|
||||
impl TestedNodeDetails {
|
||||
pub(crate) fn from_testrun_assignment(assignment: TestRunAssignment) -> Self {
|
||||
TestedNodeDetails {
|
||||
node_id: Some(assignment.node_id),
|
||||
address: assignment.node_address,
|
||||
noise_key: assignment.noise_key,
|
||||
key_rotation: SphinxKeyRotation::from_key_rotation_id(assignment.key_rotation_id),
|
||||
sphinx_key: assignment.sphinx_key,
|
||||
}
|
||||
}
|
||||
|
||||
/// Returns a sphinx [`Node`](nym_sphinx_types::Node) representation of this node,
|
||||
/// suitable for use as a hop in a sphinx route.
|
||||
pub(crate) fn as_sphinx_node(&self) -> nym_sphinx_types::Node {
|
||||
as_sphinx_node(self.address, self.sphinx_key)
|
||||
}
|
||||
|
||||
/// Returns a [`NoiseNode`] representation of this node for use in the Noise network view.
|
||||
pub(crate) fn as_noise_node(&self) -> NoiseNode {
|
||||
NoiseNode::new_nym_node(VersionedNoiseKeyV1 {
|
||||
supported_version: NoiseVersion::V1,
|
||||
x25519_pubkey: self.noise_key,
|
||||
})
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,521 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use crate::agent::config::NodeTesterConfig;
|
||||
use crate::agent::result::{LatencyDistribution, TestRunResult};
|
||||
use crate::agent::tested_node::TestedNodeDetails;
|
||||
use crate::egress_connection::EgressConnection;
|
||||
use crate::listener::MixnetListener;
|
||||
use crate::listener::received::MixnetPacketsSender;
|
||||
use crate::processor::{MixnetPacketProcessor, ProcessedPacket};
|
||||
use crate::sphinx_helpers::{
|
||||
as_sphinx_node, build_test_sphinx_packet, create_test_sphinx_packet_header,
|
||||
};
|
||||
use crate::test_packet::{TestPacketContent, TestPacketHeader};
|
||||
use anyhow::Context;
|
||||
use humantime::format_duration;
|
||||
use nym_crypto::asymmetric::x25519;
|
||||
use nym_noise::config::{NoiseConfig, NoiseNetworkView};
|
||||
use nym_sphinx_types::SphinxPacket;
|
||||
use nym_task::ShutdownToken;
|
||||
use rand::rngs::OsRng;
|
||||
use std::collections::HashMap;
|
||||
use std::sync::Arc;
|
||||
use std::time::Duration;
|
||||
use tokio::pin;
|
||||
use tokio::sync::Notify;
|
||||
use tokio::time::{Instant, sleep};
|
||||
use tracing::{debug, error, info, warn};
|
||||
|
||||
/// The core component responsible for executing a stress-test run against a single node.
|
||||
///
|
||||
/// A test run proceeds in five ordered steps (see [`run_stress_test`](Self::run_stress_test)):
|
||||
///
|
||||
/// 1. Establish an outbound (egress) Noise-encrypted TCP connection to the node.
|
||||
/// 2. Bind a local TCP listener (ingress) that receives sphinx packets the node sends back.
|
||||
/// 3. Send a single probe packet to verify basic connectivity and record baseline latency.
|
||||
/// 4. Replay the same packet (when `reuse_header` is enabled) to confirm the node's
|
||||
/// bloomfilter bypass is correctly configured.
|
||||
/// 5. Send packets at the configured rate for the configured duration, then collect and
|
||||
/// summarise the results.
|
||||
///
|
||||
/// Only critical failures (e.g. failing to bind a port) are returned as
|
||||
/// `Err`; node-level failures (e.g. the node not responding) are captured inside the
|
||||
/// returned [`TestRunResult`] so the caller can still inspect partial data.
|
||||
pub(crate) struct NodeStressTester {
|
||||
/// Tester configuration controlling rates, timeouts, and addressing.
|
||||
config: NodeTesterConfig,
|
||||
|
||||
/// Monotonically increasing counter embedded in each outgoing packet as its ID.
|
||||
packet_counter: u64,
|
||||
|
||||
/// Pre-built sphinx packet header reused across all packets when `config.reuse_header`
|
||||
/// is set. Allows the node's bloomfilter bypass to be exercised. `None` means a fresh
|
||||
/// header is built for every packet.
|
||||
reusable_test_header: Option<TestPacketHeader>,
|
||||
|
||||
/// The tester's own Noise key pair, used to authenticate the egress connection.
|
||||
noise_key: Arc<x25519::KeyPair>,
|
||||
|
||||
/// An ephemeral sphinx key pair generated at construction time. Used both to build the
|
||||
/// return-route sphinx header (so packets come back to this tester) and to decrypt
|
||||
/// returning packets when `reuse_header` is disabled.
|
||||
sphinx_key: Arc<x25519::KeyPair>,
|
||||
|
||||
/// Identity and addressing information for the node being tested.
|
||||
tested_node: TestedNodeDetails,
|
||||
}
|
||||
|
||||
impl NodeStressTester {
|
||||
/// Creates a new tester, taking the already-loaded Noise key pair as `noise_key` and
|
||||
/// generating a fresh ephemeral sphinx key. If `config.reuse_header` is set, the
|
||||
/// sphinx packet header is pre-built here so it can be reused across all test packets.
|
||||
pub(crate) fn new(
|
||||
config: NodeTesterConfig,
|
||||
noise_key: Arc<x25519::KeyPair>,
|
||||
tested_node: TestedNodeDetails,
|
||||
) -> anyhow::Result<Self> {
|
||||
debug!("using the following tester config");
|
||||
debug!("{config:#?}");
|
||||
|
||||
debug!("testing the following node");
|
||||
debug!("{tested_node:#?}");
|
||||
|
||||
let sphinx_key = x25519::PrivateKey::new(&mut OsRng);
|
||||
|
||||
let reusable_test_header = if config.reuse_header {
|
||||
debug!("reusing sphinx header for tests");
|
||||
// Route: tested node → this agent (so packets come back to us).
|
||||
let route = [
|
||||
tested_node.as_sphinx_node(),
|
||||
as_sphinx_node(config.external_mixnet_address, sphinx_key.public_key()),
|
||||
];
|
||||
let delay = config.packet_delay;
|
||||
Some(create_test_sphinx_packet_header(route, delay)?)
|
||||
} else {
|
||||
debug!("new sphinx header will be generated for each new test packet");
|
||||
None
|
||||
};
|
||||
|
||||
Ok(Self {
|
||||
config,
|
||||
packet_counter: 0,
|
||||
reusable_test_header,
|
||||
noise_key,
|
||||
sphinx_key: Arc::new(sphinx_key.into()),
|
||||
tested_node,
|
||||
})
|
||||
}
|
||||
|
||||
/// Opens the outbound Noise-encrypted TCP connection to the node under test.
|
||||
async fn establish_egress_connection(&self) -> anyhow::Result<EgressConnection> {
|
||||
EgressConnection::establish(
|
||||
self.tested_node.address,
|
||||
self.config.egress_connection_timeout,
|
||||
self.tested_node.key_rotation,
|
||||
&self.noise_config(),
|
||||
)
|
||||
.await
|
||||
}
|
||||
|
||||
/// Constructs the [`MixnetPacketProcessor`] used to decode and time-stamp returning packets.
|
||||
/// When a reusable header is available it is used for decryption; otherwise the tester's
|
||||
/// sphinx private key is used directly.
|
||||
fn build_packet_processor(&self) -> MixnetPacketProcessor {
|
||||
let packet_recovery = match &self.reusable_test_header {
|
||||
Some(header) => header.clone().into(),
|
||||
None => self.sphinx_key.clone().into(),
|
||||
};
|
||||
MixnetPacketProcessor::new(packet_recovery, self.config.waiting_duration)
|
||||
}
|
||||
|
||||
/// Binds the local TCP listener and wraps it in a [`MixnetListener`] that will forward
|
||||
/// decoded packets to `received_sender`.
|
||||
async fn build_mixnet_listener(
|
||||
&self,
|
||||
received_sender: MixnetPacketsSender,
|
||||
shutdown_token: ShutdownToken,
|
||||
) -> anyhow::Result<MixnetListener> {
|
||||
MixnetListener::new(
|
||||
self.config.mixnet_bind_address,
|
||||
self.tested_node.address,
|
||||
self.noise_config(),
|
||||
received_sender,
|
||||
shutdown_token.clone(),
|
||||
)
|
||||
.await
|
||||
}
|
||||
|
||||
/// Builds a [`NoiseConfig`] that contains the default configuration for the protocol
|
||||
/// and the key associated with the tested node to accept its connection.
|
||||
fn noise_config(&self) -> NoiseConfig {
|
||||
let mut nodes = HashMap::new();
|
||||
nodes.insert(
|
||||
self.tested_node.address.ip(),
|
||||
self.tested_node.as_noise_node(),
|
||||
);
|
||||
let network = NoiseNetworkView::new(nodes);
|
||||
|
||||
NoiseConfig::new(
|
||||
self.noise_key.clone(),
|
||||
network,
|
||||
self.config.noise_handshake_timeout,
|
||||
)
|
||||
}
|
||||
|
||||
/// Returns a sphinx node representation of this tester's own mixnet listener address,
|
||||
/// used as the final hop in the packet route so packets are delivered back here.
|
||||
fn as_sphinx_node(&self) -> nym_sphinx_types::Node {
|
||||
as_sphinx_node(
|
||||
self.config.external_mixnet_address,
|
||||
*self.sphinx_key.public_key(),
|
||||
)
|
||||
}
|
||||
|
||||
/// Builds the next test sphinx packet, incrementing the internal packet counter.
|
||||
/// Reuses the pre-built header when available; otherwise builds a fresh header and
|
||||
/// encrypts it with a new sphinx key each time.
|
||||
fn create_test_sphinx_packet(&mut self) -> anyhow::Result<SphinxPacket> {
|
||||
let content = TestPacketContent::new(self.packet_counter);
|
||||
self.packet_counter += 1;
|
||||
|
||||
match &self.reusable_test_header {
|
||||
Some(header) => header.create_test_packet(content),
|
||||
None => {
|
||||
let route = [self.tested_node.as_sphinx_node(), self.as_sphinx_node()];
|
||||
build_test_sphinx_packet(
|
||||
&route,
|
||||
self.config.packet_delay,
|
||||
None,
|
||||
&content.to_bytes(),
|
||||
)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Builds a batch of `batch_size` test sphinx packets with consecutive IDs.
|
||||
fn create_packet_batch(&mut self, batch_size: usize) -> anyhow::Result<Vec<SphinxPacket>> {
|
||||
let mut packets = Vec::with_capacity(batch_size);
|
||||
for _ in 0..batch_size {
|
||||
let packet = self.create_test_sphinx_packet()?;
|
||||
packets.push(packet);
|
||||
}
|
||||
Ok(packets)
|
||||
}
|
||||
|
||||
/// Computes the network latency for a received packet by subtracting the configured
|
||||
/// sphinx delay from its measured round-trip time.
|
||||
fn packet_latency(&self, received: ProcessedPacket) -> Duration {
|
||||
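// saturate at zero: a node that ignores the delay could return an RTT shorter than it
received.rtt.saturating_sub(self.config.packet_delay)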
|
||||
}
|
||||
|
||||
/// Creates and sends a single test sphinx packet over `egress`.
|
||||
/// On send failure, records an error on `result` and returns `false`.
|
||||
async fn send_test_packet(
|
||||
&mut self,
|
||||
egress: &mut EgressConnection,
|
||||
result: &mut TestRunResult,
|
||||
) -> anyhow::Result<bool> {
|
||||
let packet = self
|
||||
.create_test_sphinx_packet()
|
||||
.context("sphinx packet creation failure!")?;
|
||||
if let Err(err) = egress.send_packet(packet).await {
|
||||
result.set_error(format!("{:#}", err.context("failed to send test packet")));
|
||||
return Ok(false);
|
||||
};
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
/// Creates and sends a batch of `batch_size` test packets over `egress`.
|
||||
/// On send failure, records an error on `result` and returns `false`.
|
||||
async fn send_test_packet_batch(
|
||||
&mut self,
|
||||
batch_size: usize,
|
||||
egress: &mut EgressConnection,
|
||||
result: &mut TestRunResult,
|
||||
) -> anyhow::Result<bool> {
|
||||
let batch = self
|
||||
.create_packet_batch(batch_size)
|
||||
.context("sphinx packet batch creation failure!")?;
|
||||
|
||||
if let Err(err) = egress.send_packet_batch(batch).await {
|
||||
result.set_error(format!("{:#}", err.context("failed to send test packet batch")));
|
||||
return Ok(false);
|
||||
};
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
/// Sends a single packet and waits for it to come back.
|
||||
/// On success, sets `approximate_latency` on the result and returns `true`.
|
||||
/// On failure, sets an error on the result and returns `false` (caller should abort).
|
||||
async fn send_connectivity_probe(
|
||||
&mut self,
|
||||
egress: &mut EgressConnection,
|
||||
processor: &mut MixnetPacketProcessor,
|
||||
result: &mut TestRunResult,
|
||||
) -> anyhow::Result<bool> {
|
||||
if !self.send_test_packet(egress, result).await? {
|
||||
return Ok(false);
|
||||
}
|
||||
|
||||
match processor.next_packet().await {
|
||||
Ok(res) => {
|
||||
result.set_approximate_latency(self.packet_latency(res));
|
||||
Ok(true)
|
||||
}
|
||||
Err(err) => {
|
||||
result.set_error(format!(
|
||||
"{:#}",
|
||||
err.context("failed to receive a valid initial packet back")
|
||||
));
|
||||
Ok(false)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Replays a packet to verify that the node's bloomfilter bypass is correctly configured.
|
||||
/// Returns `true` if the packet was returned, `false` if the node failed the check (caller should abort).
|
||||
/// Should only be called when `config.reuse_header` is set.
|
||||
async fn send_bloomfilter_probe(
|
||||
&mut self,
|
||||
egress: &mut EgressConnection,
|
||||
processor: &mut MixnetPacketProcessor,
|
||||
result: &mut TestRunResult,
|
||||
) -> anyhow::Result<bool> {
|
||||
info!("repeating the packet to check bloomfilter bypass configuration");
|
||||
if !self.send_test_packet(egress, result).await? {
|
||||
return Ok(false);
|
||||
}
|
||||
|
||||
match processor.next_packet().await {
|
||||
Ok(res) => {
|
||||
info!("received {res}");
|
||||
Ok(true)
|
||||
}
|
||||
Err(err) => {
|
||||
result.set_error(format!(
|
||||
"{:#}",
|
||||
err.context("failed to receive a valid secondary packet back - the node might not have a working chain subscriber (or the agent might be misconfigured)"))
|
||||
);
|
||||
Ok(false)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Sends packets at the configured rate for the configured duration.
|
||||
/// Dispatches one batch every `batch_interval` seconds; if the egress falls behind,
|
||||
/// ticks are delayed rather than bunched up to avoid unintended bursts.
|
||||
/// Updates `result.packets_sent` after every batch and returns `false` on send failure.
|
||||
async fn send_load_test(
|
||||
&mut self,
|
||||
egress: &mut EgressConnection,
|
||||
result: &mut TestRunResult,
|
||||
) -> anyhow::Result<bool> {
|
||||
// one batch every (sending_batch_size / target_rate) seconds keeps us at the target rate
|
||||
let batch_interval = self.config.batch_interval();
|
||||
let mut interval = tokio::time::interval(batch_interval);
|
||||
// if we fall behind, don't try to catch up with burst sends
|
||||
interval.set_missed_tick_behavior(tokio::time::MissedTickBehavior::Delay);
|
||||
|
||||
let start = Instant::now();
|
||||
let mut sent = 0;
|
||||
let total_packets = self.config.expected_packets();
|
||||
|
||||
loop {
|
||||
if start.elapsed() >= self.config.sending_duration {
|
||||
break;
|
||||
}
|
||||
if sent >= total_packets {
|
||||
break;
|
||||
}
|
||||
interval.tick().await;
|
||||
|
||||
// the last batch may be smaller than other batches
|
||||
let remaining = total_packets - sent;
|
||||
let batch_size = self.config.sending_batch_size.min(remaining);
|
||||
if !self
|
||||
.send_test_packet_batch(batch_size, egress, result)
|
||||
.await?
|
||||
{
|
||||
return Ok(false);
|
||||
}
|
||||
|
||||
sent += batch_size;
|
||||
// update send count after each batch so partial results are visible on early exit
|
||||
result.set_packets_sent(sent);
|
||||
}
|
||||
|
||||
if sent < total_packets {
|
||||
warn!(
|
||||
"did not manage to send all required packets within the sending window. sent {sent}/{total_packets}"
|
||||
);
|
||||
}
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
/// Drains all received packets from `processor` (waiting up to `waiting_duration` for
|
||||
/// stragglers), deduplicates by ID, computes latency statistics, and populates `result`.
|
||||
async fn collect_test_results(
|
||||
&self,
|
||||
processor: &mut MixnetPacketProcessor,
|
||||
result: &mut TestRunResult,
|
||||
) {
|
||||
// drain whatever arrived immediately, then wait for stragglers
|
||||
let mut received = processor.all_available();
|
||||
if received.len() < result.packets_sent {
|
||||
let deadline = sleep(self.config.waiting_duration);
|
||||
pin!(deadline);
|
||||
loop {
|
||||
tokio::select! {
|
||||
_ = &mut deadline => break,
|
||||
next = processor.next_packet() => {
|
||||
received.push(next);
|
||||
if received.len() >= result.packets_sent {
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// deduplicate by packet ID; duplicates indicate possible node misbehaviour
|
||||
let mut valid_received = HashMap::new();
|
||||
for packet in received {
|
||||
let Ok(packet) = packet else {
|
||||
debug!("received packet was malformed");
|
||||
continue;
|
||||
};
|
||||
if valid_received.insert(packet.id, packet).is_some() {
|
||||
error!(
|
||||
"‼️ received duplicate packet for id {} - something nasty is going on!",
|
||||
packet.id
|
||||
);
|
||||
result.set_received_duplicates();
|
||||
}
|
||||
}
|
||||
|
||||
let latencies = valid_received
|
||||
.values()
|
||||
.map(|p| self.packet_latency(*p))
|
||||
.collect::<Vec<_>>();
|
||||
|
||||
let received_count = valid_received.len();
|
||||
result.set_packets_received(received_count);
|
||||
result.set_packets_statistics(LatencyDistribution::compute(&latencies));
|
||||
|
||||
debug!(
|
||||
sent = result.packets_sent,
|
||||
received = received_count,
|
||||
recv_pct = format!("{:.1}%", result.received_percentage()),
|
||||
"load test complete"
|
||||
);
|
||||
}
|
||||
|
||||
/// Runs a full stress-test against the configured node and returns the collected results.
|
||||
///
|
||||
/// Only returns `Err` for critical failures (e.g. unable to bind the listener
|
||||
/// port). Node-level failures (no response, bloomfilter misconfiguration, etc.) are
|
||||
/// recorded inside the returned [`TestRunResult`] so the caller always gets partial data.
|
||||
pub(crate) async fn run_stress_test(&mut self) -> anyhow::Result<TestRunResult> {
|
||||
let node_address = self.tested_node.address;
|
||||
if let Some(node_id) = self.tested_node.node_id {
|
||||
info!("beginning stress test of node {node_id} ({node_address})",);
|
||||
} else {
|
||||
info!("beginning stress test of node {node_address}",);
|
||||
}
|
||||
|
||||
let mut result = TestRunResult::new(self.config.packet_delay);
|
||||
|
||||
// 1. establish the egress connection — abort immediately if it fails
|
||||
debug!("attempting to establish egress connection to the tested node");
|
||||
let mut egress = match self.establish_egress_connection().await {
|
||||
Ok(conn) => conn,
|
||||
Err(err) => {
|
||||
result.set_error(format!(
|
||||
"{:#}",
|
||||
err.context("failed to establish egress node connection")
|
||||
));
|
||||
return Ok(result);
|
||||
}
|
||||
};
|
||||
|
||||
// 2. spawn the mixnet packet listener that forwards received packets to the processor
|
||||
debug!(
|
||||
"creating mixnet listener on {}",
|
||||
self.config.mixnet_bind_address
|
||||
);
|
||||
let mut processor = self.build_packet_processor();
|
||||
let shutdown_token = ShutdownToken::new();
|
||||
let listener = self
|
||||
.build_mixnet_listener(processor.sender(), shutdown_token.clone())
|
||||
.await?;
|
||||
let listener_on_start = Arc::new(Notify::new());
|
||||
let listener_on_start_clone = listener_on_start.clone();
|
||||
|
||||
let listener_join =
|
||||
tokio::spawn(async move { listener.run(listener_on_start_clone).await });
|
||||
|
||||
// wait for the listener task to properly begin
|
||||
listener_on_start.notified().await;
|
||||
|
||||
// 3. probe: send a single packet to confirm the node responds
|
||||
debug!("sending initial node connectivity probe");
|
||||
if !self
|
||||
.send_connectivity_probe(&mut egress, &mut processor, &mut result)
|
||||
.await?
|
||||
{
|
||||
shutdown_token.cancel();
|
||||
let _ = listener_join.await?;
|
||||
return Ok(result);
|
||||
}
|
||||
|
||||
// 4. probe: replay the packet to verify bloomfilter bypass is configured
|
||||
debug!("sending bloomfilter probe");
|
||||
if self.config.reuse_header
|
||||
&& !self
|
||||
.send_bloomfilter_probe(&mut egress, &mut processor, &mut result)
|
||||
.await?
|
||||
{
|
||||
shutdown_token.cancel();
|
||||
let mixnet_listener = listener_join.await?;
|
||||
let ingress_noise = mixnet_listener
|
||||
.last_noise_handshake_duration
|
||||
.context("missing ingress noise duration after completing entire test run!")?;
|
||||
|
||||
result.set_ingress_noise_handshake(ingress_noise);
|
||||
result.set_egress_connection_statistics(egress.connection_statistics);
|
||||
return Ok(result);
|
||||
}
|
||||
|
||||
// 5. stress test: send packets at the target rate for the configured duration
|
||||
debug!(
|
||||
"beginning the proper load testing. going to send at rate {}/s for {}",
|
||||
self.config.target_rate,
|
||||
format_duration(self.config.sending_duration)
|
||||
);
|
||||
self.send_load_test(&mut egress, &mut result).await?;
|
||||
|
||||
// 6. collect and summarise results
|
||||
debug!("waiting for final packets to arrive");
|
||||
self.collect_test_results(&mut processor, &mut result).await;
|
||||
|
||||
// 7. shut down the listener and harvest its stats
|
||||
debug!("shutting down the mixnet listener and finishing the test");
|
||||
shutdown_token.cancel();
|
||||
let mixnet_listener = listener_join.await?;
|
||||
let ingress_noise = mixnet_listener
|
||||
.last_noise_handshake_duration
|
||||
.context("missing ingress noise duration after completing entire test run!")?;
|
||||
|
||||
result.set_ingress_noise_handshake(ingress_noise);
|
||||
result.set_egress_connection_statistics(egress.connection_statistics);
|
||||
|
||||
if let Some(node_id) = self.tested_node.node_id {
|
||||
info!("finished stress test of node {node_id} ({node_address})",);
|
||||
} else {
|
||||
info!("finished stress test of node {node_address}",);
|
||||
}
|
||||
|
||||
Ok(result)
|
||||
}
|
||||
}
|
||||
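The collection step above deduplicates returned packets by ID before computing latencies. A minimal standalone sketch of that idea follows, using only the standard library. `ReceivedPacket` and `dedup_latencies` are hypothetical stand-ins rather than the crate's `ProcessedPacket` API, and saturating the subtraction at zero is a choice made by this sketch.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical stand-in for a processed packet: an ID plus its measured round-trip time.
#[derive(Clone, Copy, Debug)]
struct ReceivedPacket {
    id: u64,
    rtt: Duration,
}

/// Deduplicates packets by ID and returns the sorted per-packet latencies
/// (RTT minus the configured delay) plus a flag saying whether any duplicate was seen.
fn dedup_latencies(received: &[ReceivedPacket], delay: Duration) -> (Vec<Duration>, bool) {
    let mut unique = HashMap::new();
    let mut saw_duplicate = false;
    for packet in received {
        // `insert` returns the previous value when the ID was already present;
        // the later packet overwrites the earlier one
        if unique.insert(packet.id, *packet).is_some() {
            saw_duplicate = true;
        }
    }
    let mut latencies: Vec<Duration> = unique
        .values()
        // saturate at zero so an RTT shorter than the delay cannot panic
        .map(|p| p.rtt.saturating_sub(delay))
        .collect();
    // HashMap iteration order is unspecified, so sort for a stable result
    latencies.sort();
    (latencies, saw_duplicate)
}
```

Keying the map on the packet ID means each ID contributes one latency sample, so a replayed or duplicated packet cannot skew the statistics.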
@@ -0,0 +1,15 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use nym_bin_common::bin_info_owned;
|
||||
use nym_bin_common::output_format::OutputFormat;
|
||||
|
||||
#[derive(clap::Args, Debug)]
|
||||
pub(crate) struct Args {
|
||||
#[clap(short, long, default_value_t = OutputFormat::default())]
|
||||
output: OutputFormat,
|
||||
}
|
||||
|
||||
pub(crate) fn execute(args: Args) {
|
||||
println!("{}", args.output.format(&bin_info_owned!()))
|
||||
}
|
||||
@@ -0,0 +1,88 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use super::env::vars::*;
|
||||
use crate::agent::config::NodeTesterConfig;
|
||||
use anyhow::bail;
|
||||
use std::net::SocketAddr;
|
||||
use std::num::NonZeroUsize;
|
||||
use std::time::Duration;
|
||||
|
||||
#[derive(clap::Args, Debug)]
|
||||
pub(crate) struct CommonArgs {
|
||||
/// Specifies for how long the agent should be sending test packets with the specified rate.
|
||||
#[arg(long, value_parser = humantime::parse_duration, default_value = "30s", env = NYM_NETWORK_MONITOR_AGENT_SENDING_DURATION_ARG)]
|
||||
sending_duration: Duration,
|
||||
|
||||
/// Specifies how long the agent will wait to receive any leftover packets after finishing sending.
|
||||
#[arg(long, value_parser = humantime::parse_duration, default_value = "5s", env = NYM_NETWORK_MONITOR_AGENT_WAITING_DURATION_ARG)]
|
||||
waiting_duration: Duration,
|
||||
|
||||
/// How long the node itself should delay the packet.
|
||||
/// It shouldn't be set to zero as otherwise the node will not put the packet through
|
||||
/// its delay queue and we would not test the entire pipeline
|
||||
#[arg(long, value_parser = humantime::parse_duration, default_value = "50ms", env = NYM_NETWORK_MONITOR_AGENT_PACKET_DELAY_ARG)]
|
||||
packet_delay: Duration,
|
||||
|
||||
/// Specifies the target rate of packets (per second) to be sent.
|
||||
#[arg(long, default_value = "1000", env = NYM_NETWORK_MONITOR_AGENT_TARGET_RATE_ARG)]
|
||||
target_rate: NonZeroUsize,
|
||||
|
||||
/// Specifies whether the agent should reuse the same header for all packets.
|
||||
/// Reusing the header means the packets are effectively replayed.
|
||||
#[arg(long, short, default_value = "true", env = NYM_NETWORK_MONITOR_AGENT_REUSE_HEADER_ARG)]
|
||||
reuse_header: bool,
|
||||
|
||||
/// Timeout for establishing the TCP connection to the node under test.
|
||||
#[arg(long, value_parser = humantime::parse_duration, default_value = "5s", env = NYM_NETWORK_MONITOR_AGENT_EGRESS_CONNECTION_TIMEOUT_ARG)]
|
||||
egress_connection_timeout: Duration,
|
||||
|
||||
/// Timeout for completing the Noise handshake with the node under test.
|
||||
#[arg(long, value_parser = humantime::parse_duration, default_value = "3s", env = NYM_NETWORK_MONITOR_AGENT_NOISE_HANDSHAKE_TIMEOUT_ARG)]
|
||||
noise_handshake_timeout: Duration,
|
||||
|
||||
/// Number of packets sent in a single batch. Together with `target_rate` this controls
|
||||
/// how frequently batches are dispatched: one batch every `sending_batch_size / target_rate` seconds.
|
||||
#[arg(long, default_value = "50", env = NYM_NETWORK_MONITOR_AGENT_SENDING_BATCH_SIZE_ARG)]
|
||||
sending_batch_size: NonZeroUsize,
|
||||
|
||||
/// Specifies the path to the noise key file used for establishing the tunnel with the node being tested.
|
||||
#[arg(long, env = NYM_NETWORK_MONITOR_AGENT_NOISE_KEY_PATH_ARG)]
|
||||
pub(crate) noise_key_path: String,
|
||||
|
||||
/// Specifies the socket address the agent will bind to for receiving mixnet traffic.
|
||||
#[arg(long, env = NYM_NETWORK_MONITOR_AGENT_BIND_ADDRESS_ARG, default_value = "[::]:9000")]
|
||||
bind_address: SocketAddr,
|
||||
}
|
||||
|
||||
impl CommonArgs {
|
||||
/// Constructs a [`NodeTesterConfig`] from the common CLI arguments.
|
||||
/// `external_address` is provided separately as it is command-specific.
|
||||
pub(crate) fn build_config(
|
||||
&self,
|
||||
external_address: SocketAddr,
|
||||
) -> anyhow::Result<NodeTesterConfig> {
|
||||
if self.sending_duration.is_zero() {
|
||||
bail!("attempted to set sending duration to 0s")
|
||||
}
|
||||
if self.egress_connection_timeout.is_zero() {
|
||||
bail!("attempted to set egress connection timeout to 0s")
|
||||
}
|
||||
if self.noise_handshake_timeout.is_zero() {
|
||||
bail!("attempted to set noise handshake timeout to 0s")
|
||||
}
|
||||
|
||||
Ok(NodeTesterConfig {
|
||||
sending_duration: self.sending_duration,
|
||||
waiting_duration: self.waiting_duration,
|
||||
packet_delay: self.packet_delay,
|
||||
egress_connection_timeout: self.egress_connection_timeout,
|
||||
noise_handshake_timeout: self.noise_handshake_timeout,
|
||||
sending_batch_size: self.sending_batch_size.get(),
|
||||
target_rate: self.target_rate.get(),
|
||||
reuse_header: self.reuse_header,
|
||||
mixnet_bind_address: self.bind_address,
|
||||
external_mixnet_address: external_address,
|
||||
})
|
||||
}
|
||||
}
|
||||
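The pacing rule in the `sending_batch_size` doc comment (one batch every `sending_batch_size / target_rate` seconds) can be sketched with the standard library alone. The helper name `batch_interval` and the example rate of 200 packets per second are assumptions made for illustration; they are not taken from the agent's code.

```rust
use std::num::NonZeroUsize;
use std::time::Duration;

/// Hypothetical helper: time between two consecutive batches, computed as
/// `sending_batch_size / target_rate` seconds.
/// `NonZeroUsize` rules out a division by zero at the type level.
fn batch_interval(sending_batch_size: NonZeroUsize, target_rate: NonZeroUsize) -> Duration {
    let seconds = sending_batch_size.get() as f64 / target_rate.get() as f64;
    Duration::from_secs_f64(seconds)
}
```

With the default batch size of 50 and an assumed rate of 200 packets per second, a batch would be dispatched every 250 ms.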
@@ -0,0 +1,48 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
/// Environment variable names used as fallbacks for CLI arguments.
|
||||
/// Each constant matches the `env = ...` attribute on the corresponding clap field.
|
||||
pub mod vars {
|
||||
// common args
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_SENDING_DURATION_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_AGENT_SENDING_DURATION";
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_WAITING_DURATION_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_AGENT_WAITING_DURATION";
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_TARGET_RATE_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_AGENT_TARGET_RATE";
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_REUSE_HEADER_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_AGENT_REUSE_HEADER";
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_NOISE_KEY_PATH_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_AGENT_NOISE_KEY_PATH";
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_PACKET_DELAY_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_AGENT_PACKET_DELAY";
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_EGRESS_CONNECTION_TIMEOUT_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_AGENT_EGRESS_CONNECTION_TIMEOUT";
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_NOISE_HANDSHAKE_TIMEOUT_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_AGENT_NOISE_HANDSHAKE_TIMEOUT";
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_SENDING_BATCH_SIZE_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_AGENT_SENDING_BATCH_SIZE";
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_BIND_ADDRESS_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_AGENT_BIND_ADDRESS";
|
||||
|
||||
// run agent args
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_ORCHESTRATOR_ADDRESS_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_AGENT_ORCHESTRATOR_ADDRESS";
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_ORCHESTRATOR_TOKEN_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_AGENT_ORCHESTRATOR_TOKEN";
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_HOST_IP_ARG: &str = "NYM_NETWORK_MONITOR_AGENT_HOST_IP";
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_HOST_PORT_ARG: &str = "NYM_NETWORK_MONITOR_AGENT_HOST_PORT";
|
||||
|
||||
// test node args
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_MIXNET_ADDRESS_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_AGENT_MIXNET_ADDRESS";
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_NODE_ADDRESS_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_AGENT_NODE_ADDRESS";
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_NODE_NOISE_KEY_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_AGENT_NODE_NOISE_KEY";
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_NODE_SPHINX_KEY_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_AGENT_NODE_SPHINX_KEY";
|
||||
pub const NYM_NETWORK_MONITOR_AGENT_NODE_SPHINX_KEY_ROTATION_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_AGENT_NODE_SPHINX_KEY_ROTATION";
|
||||
}
|
||||
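Each constant above is wired into a clap field through `env = ...`, so a value can come from the command line, from the environment, or from a `default_value`. The sketch below mirrors that precedence (explicit flag first, then environment variable, then default) with plain functions. `resolve_arg` and `resolve_from_env` are hypothetical helpers written for illustration; they are not part of clap or of the agent.

```rust
/// Hypothetical helper mirroring the lookup order of a clap field declared with
/// both `env = ...` and `default_value = ...`: flag, then environment, then default.
fn resolve_arg(cli_value: Option<&str>, env_value: Option<&str>, default: &str) -> String {
    cli_value.or(env_value).unwrap_or(default).to_string()
}

/// Same lookup, but reads the fallback value from the real process environment.
fn resolve_from_env(cli_value: Option<&str>, env_name: &str, default: &str) -> String {
    let env_value = std::env::var(env_name).ok();
    resolve_arg(cli_value, env_value.as_deref(), default)
}
```

For example, `resolve_from_env(None, "NYM_NETWORK_MONITOR_AGENT_BIND_ADDRESS", "[::]:9000")` yields the environment value when the variable is set and the default otherwise.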
@@ -0,0 +1,24 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use super::env::vars::*;
|
||||
use nym_crypto::asymmetric::x25519;
|
||||
use tracing::info;
|
||||
|
||||
/// Arguments for the `keygen` subcommand.
|
||||
#[derive(clap::Args, Debug)]
|
||||
pub(crate) struct Args {
|
||||
/// Specifies the path where the generated noise key file will be written; the key is used for establishing a tunnel with the node being tested
|
||||
#[arg(long, env = NYM_NETWORK_MONITOR_AGENT_NOISE_KEY_PATH_ARG)]
|
||||
noise_key_path: String,
|
||||
}
|
||||
|
||||
/// Generates a fresh x25519 Noise private key and writes it to the path specified in `args`.
|
||||
pub(crate) fn execute(args: Args) -> anyhow::Result<()> {
|
||||
let mut rng = rand::thread_rng();
|
||||
let noise_key = x25519::PrivateKey::new(&mut rng);
|
||||
|
||||
nym_pemstore::store_key(&noise_key, &args.noise_key_path)?;
|
||||
info!("noise key written to '{}'", args.noise_key_path);
|
||||
Ok(())
|
||||
}
|
||||
@@ -0,0 +1,56 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use clap::{Parser, Subcommand};
|
||||
use nym_bin_common::bin_info;
|
||||
use std::sync::OnceLock;
|
||||
|
||||
mod build_info;
|
||||
mod common;
|
||||
mod env;
|
||||
mod keygen;
|
||||
mod run_agent;
|
||||
mod test_node;
|
||||
|
||||
// Helper for passing LONG_VERSION to clap
|
||||
fn pretty_build_info_static() -> &'static str {
|
||||
static PRETTY_BUILD_INFORMATION: OnceLock<String> = OnceLock::new();
|
||||
PRETTY_BUILD_INFORMATION.get_or_init(|| bin_info!().pretty_print())
|
||||
}
|
||||
|
||||
/// Top-level CLI entry point for the network monitor agent.
|
||||
#[derive(Parser, Debug)]
|
||||
#[clap(author = "Nymtech", version, long_version = pretty_build_info_static(), about)]
|
||||
pub(crate) struct Cli {
|
||||
#[command(subcommand)]
|
||||
pub(crate) command: Command,
|
||||
}
|
||||
|
||||
impl Cli {
|
||||
/// Dispatches execution to the subcommand selected by the user.
|
||||
pub(crate) async fn execute(self) -> anyhow::Result<()> {
|
||||
match self.command {
|
||||
Command::BuildInfo(args) => build_info::execute(args),
|
||||
Command::TestNode(args) => test_node::execute(args).await?,
|
||||
Command::RunAgent(args) => run_agent::execute(args).await?,
|
||||
Command::Keygen(args) => keygen::execute(args)?,
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Subcommand, Debug)]
|
||||
pub(crate) enum Command {
|
||||
/// Show build information of this binary
|
||||
BuildInfo(build_info::Args),
|
||||
|
||||
/// One-shot manual testing of a specified node
|
||||
/// without interacting with the orchestrator.
|
||||
TestNode(test_node::Args),
|
||||
|
||||
/// Test a node by contacting the orchestrator for the work assignment
|
||||
RunAgent(run_agent::Args),
|
||||
|
||||
/// Generate the x25519 noise key required for the agent to work
|
||||
Keygen(keygen::Args),
|
||||
}
|
||||
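The `pretty_build_info_static` helper relies on a common pattern: build a `String` lazily, store it in a function-local `OnceLock`, and hand out a `&'static str`. A minimal standalone sketch of that pattern follows. `build_description`, `description_static` and the `INIT_CALLS` counter are assumptions introduced for illustration, and the string content is a stand-in rather than the real `bin_info!()` output.

```rust
use std::sync::OnceLock;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many times the initialiser ran (for demonstration only).
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for `bin_info!().pretty_print()`; the real output format is not reproduced here.
fn build_description() -> String {
    INIT_CALLS.fetch_add(1, Ordering::SeqCst);
    format!("example-agent {}", "0.0.0")
}

/// Builds the string on first use and returns the same `&'static str` afterwards.
fn description_static() -> &'static str {
    static DESCRIPTION: OnceLock<String> = OnceLock::new();
    DESCRIPTION.get_or_init(build_description).as_str()
}
```

Repeated calls return a reference to the same allocation, and the initialiser runs at most once even when several threads race on the first call.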
@@ -0,0 +1,63 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use super::env::vars::*;
|
||||
use crate::agent::NetworkMonitorAgent;
|
||||
use crate::agent::helpers::load_noise_key;
|
||||
use crate::cli::common::CommonArgs;
|
||||
use nym_network_monitor_orchestrator_requests::client::OrchestratorClient;
|
||||
use std::net::{IpAddr, SocketAddr};
|
||||
use tracing::info;
|
||||
use url::Url;
|
||||
|
||||
#[derive(clap::Args, Debug)]
|
||||
pub(crate) struct Args {
|
||||
#[clap(flatten)]
|
||||
common_args: CommonArgs,
|
||||
|
||||
/// Address of the orchestrator for requesting work assignments
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_AGENT_ORCHESTRATOR_ADDRESS_ARG)]
|
||||
orchestrator_address: Url,
|
||||
|
||||
/// Bearer token required for requesting work assignments
|
||||
/// and submitting the results
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_AGENT_ORCHESTRATOR_TOKEN_ARG)]
|
||||
orchestrator_token: String,
|
||||
|
||||
/// Egress IP address of this agent, retrieved from status.hostIP via the Downward API
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_AGENT_HOST_IP_ARG)]
|
||||
host_ip: IpAddr,
|
||||
|
||||
/// Announced port of this agent, used alongside host_ip by nodes sending packets back to the agent
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_AGENT_HOST_PORT_ARG)]
|
||||
host_port: u16,
|
||||
}
|
||||
|
||||
pub(crate) async fn execute(args: Args) -> anyhow::Result<()> {
|
||||
let orchestrator_client =
|
||||
OrchestratorClient::new(args.orchestrator_address.into(), args.orchestrator_token)?;
|
||||
|
||||
let noise_key = load_noise_key(&args.common_args.noise_key_path)?;
|
||||
|
||||
let external_address = SocketAddr::new(args.host_ip, args.host_port);
|
||||
|
||||
// 1. build an instance of the agent from the config and the noise key loaded above
|
||||
let agent = NetworkMonitorAgent::new(
|
||||
args.common_args.build_config(external_address)?,
|
||||
noise_key,
|
||||
orchestrator_client,
|
||||
);
|
||||
|
||||
// 2. announce the agent to the orchestrator
|
||||
// so that it would be registered in the smart contract
|
||||
// (if it hasn't been announced before)
|
||||
info!("announcing agent information to the orchestrator");
|
||||
agent.announce_agent().await?;
|
||||
|
||||
// 3. query the orchestrator for work assignment and attempt to perform the stress test
|
||||
// of the target node
|
||||
info!("attempting to request test run assignment");
|
||||
agent.run_stress_test().await?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
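In `execute`, the `host_ip` and `host_port` arguments are combined with `SocketAddr::new` into the external address that nodes use to send packets back to the agent. The sketch below shows the same combination starting from textual input. The helper name `external_address` and the sample addresses are assumptions for illustration only.

```rust
use std::net::{AddrParseError, IpAddr, SocketAddr};

/// Hypothetical helper: parses a host IP given as text and pairs it with a port.
/// Works for both IPv4 and IPv6 input.
fn external_address(host_ip: &str, host_port: u16) -> Result<SocketAddr, AddrParseError> {
    let ip: IpAddr = host_ip.parse()?;
    Ok(SocketAddr::new(ip, host_port))
}
```

Note that IPv6 addresses are rendered in brackets once a port is attached, so `2001:db8::1` with port 9000 is displayed as `[2001:db8::1]:9000`.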
@@ -0,0 +1,77 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use super::env::vars::*;
|
||||
use crate::agent::config::NodeTesterConfig;
|
||||
use crate::agent::helpers::load_noise_key;
|
||||
use crate::agent::tested_node::TestedNodeDetails;
|
||||
use crate::agent::tester::NodeStressTester;
|
||||
use crate::cli::common::CommonArgs;
|
||||
use nym_crypto::asymmetric::x25519;
|
||||
use nym_sphinx_params::SphinxKeyRotation;
|
||||
use std::net::SocketAddr;
|
||||
use tracing::info;
|
||||
|
||||
/// Arguments for the `test-node` subcommand.
|
||||
#[derive(clap::Args, Debug)]
|
||||
pub(crate) struct Args {
|
||||
#[clap(flatten)]
|
||||
common_args: CommonArgs,
|
||||
|
||||
/// The socket address of the agent to use for receiving test packets back
|
||||
#[arg(long, env = NYM_NETWORK_MONITOR_AGENT_MIXNET_ADDRESS_ARG)]
|
||||
agent_mixnet_listener: SocketAddr,
|
||||
|
||||
/// The socket address of the node to test
|
||||
#[arg(long, env = NYM_NETWORK_MONITOR_AGENT_NODE_ADDRESS_ARG)]
|
||||
tested_node_address: SocketAddr,
|
||||
|
||||
/// Noise key of the node to test
|
||||
#[arg(long, env = NYM_NETWORK_MONITOR_AGENT_NODE_NOISE_KEY_ARG)]
|
||||
tested_node_noise_key: x25519::PublicKey,
|
||||
|
||||
/// Sphinx key of the node to test
|
||||
#[arg(long, env = NYM_NETWORK_MONITOR_AGENT_NODE_SPHINX_KEY_ARG)]
|
||||
tested_node_sphinx_key: x25519::PublicKey,
|
||||
|
||||
/// Current sphinx key rotation of the node to test
|
||||
#[arg(long, env = NYM_NETWORK_MONITOR_AGENT_NODE_SPHINX_KEY_ROTATION_ARG)]
|
||||
tested_node_sphinx_key_rotation: u32,
|
||||
}
|
||||
|
||||
impl Args {
|
||||
/// Builds the agent [`NodeTesterConfig`] from the flattened common args and the local mixnet listener address.
|
||||
pub(crate) fn build_tester_config(&self) -> anyhow::Result<NodeTesterConfig> {
|
||||
self.common_args.build_config(self.agent_mixnet_listener)
|
||||
}
|
||||
|
||||
/// Builds the [`TestedNodeDetails`] from the node address and key arguments.
|
||||
pub(crate) fn build_tested_node_details(&self) -> TestedNodeDetails {
|
||||
TestedNodeDetails {
|
||||
node_id: None,
|
||||
address: self.tested_node_address,
|
||||
noise_key: self.tested_node_noise_key,
|
||||
sphinx_key: self.tested_node_sphinx_key,
|
||||
key_rotation: SphinxKeyRotation::from_key_rotation_id(
|
||||
self.tested_node_sphinx_key_rotation,
|
||||
),
|
||||
}
|
||||
}
|
||||
|
||||
/// Constructs a fully initialised [`NodeStressTester`] from the parsed arguments.
|
||||
pub(crate) fn build_stress_tester(&self) -> anyhow::Result<NodeStressTester> {
|
||||
NodeStressTester::new(
|
||||
self.build_tester_config()?,
|
||||
load_noise_key(&self.common_args.noise_key_path)?,
|
||||
self.build_tested_node_details(),
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
/// Runs a one-shot stress test against the specified node and logs the result.
|
||||
pub(crate) async fn execute(args: Args) -> anyhow::Result<()> {
|
||||
let result = args.build_stress_tester()?.run_stress_test().await?;
|
||||
|
||||
info!("{result:#?}");
|
||||
Ok(())
|
||||
}
|
||||
@@ -0,0 +1,123 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use anyhow::bail;
|
||||
use futures::{SinkExt, stream};
|
||||
use humantime::format_duration;
|
||||
use nym_noise::config::NoiseConfig;
|
||||
use nym_noise::connection::Connection;
|
||||
use nym_noise::upgrade_noise_initiator;
|
||||
use nym_sphinx_framing::codec::NymCodec;
|
||||
use nym_sphinx_framing::packet::FramedNymPacket;
|
||||
use nym_sphinx_params::{PacketType, SphinxKeyRotation};
|
||||
use nym_sphinx_types::{NymPacket, SphinxPacket};
|
||||
use std::net::SocketAddr;
|
||||
use tokio::net::TcpStream;
|
||||
use tokio::time::{Instant, timeout};
|
||||
use tokio_util::codec::Framed;
|
||||
use tracing::{error, info, trace};
|
||||
|
||||
/// Timing statistics collected over the lifetime of an [`EgressConnection`].
|
||||
pub(crate) struct EgressConnectionStatistics {
|
||||
/// Duration of the Noise handshake performed when the connection was established.
|
||||
pub(crate) noise_handshake_duration: std::time::Duration,
|
||||
|
||||
/// Per-batch send durations, one entry for each call to [`send_packet_batch`](EgressConnection::send_packet_batch).
|
||||
pub(crate) packet_batches_sending_duration: Vec<std::time::Duration>,
|
||||
}
|
||||
|
||||
/// An outbound, noise-encrypted TCP connection to the node under test used for sending sphinx packets.
|
||||
pub(crate) struct EgressConnection {
|
||||
/// Timing statistics accumulated while the connection is active.
|
||||
pub(crate) connection_statistics: EgressConnectionStatistics,
|
||||
|
||||
/// The key rotation at the time of starting the agent.
|
||||
key_rotation: SphinxKeyRotation,
|
||||
|
||||
/// The noise-encrypted, framed TCP stream used to send sphinx packets.
|
||||
mixnet_connection: Framed<Connection<TcpStream>, NymCodec>,
|
||||
}
|
||||
|
||||
impl EgressConnection {
|
||||
/// Opens a TCP connection to `address`, performs the Noise handshake as the initiator,
|
||||
/// and returns a ready-to-use [`EgressConnection`].
|
||||
/// Fails if the TCP connect exceeds `timeout_duration` or if the Noise upgrade returns an error.
|
||||
pub(crate) async fn establish(
|
||||
address: SocketAddr,
|
||||
timeout_duration: std::time::Duration,
|
||||
key_rotation: SphinxKeyRotation,
|
||||
noise_config: &NoiseConfig,
|
||||
) -> anyhow::Result<Self> {
|
||||
info!("attempting to establish connection to {address}");
|
||||
let stream = timeout(timeout_duration, TcpStream::connect(address)).await??;
|
||||
|
||||
info!("beginning the noise handshake (initiator)");
|
||||
|
||||
let noise_handshake_start = Instant::now();
|
||||
let noise_stream = upgrade_noise_initiator(stream, noise_config).await?;
|
||||
|
||||
if !noise_stream.is_noise() {
|
||||
error!(
|
||||
"failed to upgrade the connection to noise with {address}. does the node support the protocol?"
|
||||
);
|
||||
bail!("egress connection failure");
|
||||
}
|
||||
|
||||
let noise_handshake_duration = noise_handshake_start.elapsed();
|
||||
info!(
|
||||
"noise handshake with {address} completed in {}",
|
||||
format_duration(noise_handshake_duration)
|
||||
);
|
||||
|
||||
Ok(Self {
|
||||
connection_statistics: EgressConnectionStatistics {
|
||||
noise_handshake_duration,
|
||||
packet_batches_sending_duration: vec![],
|
||||
},
|
||||
key_rotation,
|
||||
mixnet_connection: Framed::new(noise_stream, NymCodec),
|
||||
})
|
||||
}
|
||||
|
||||
/// Sends a single sphinx packet. Unlike [`send_packet_batch`](Self::send_packet_batch), the send duration is not recorded in [`EgressConnectionStatistics`].
|
||||
pub(crate) async fn send_packet(&mut self, packet: SphinxPacket) -> anyhow::Result<()> {
|
||||
self.mixnet_connection
|
||||
.send(FramedNymPacket::new(
|
||||
NymPacket::Sphinx(packet),
|
||||
PacketType::Mix,
|
||||
self.key_rotation,
|
||||
false,
|
||||
))
|
||||
.await?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Sends a batch of sphinx packets in one flushed write and records the total batch send duration.
|
||||
pub(crate) async fn send_packet_batch(
|
||||
&mut self,
|
||||
packets: Vec<SphinxPacket>,
|
||||
) -> anyhow::Result<()> {
|
||||
let count = packets.len();
|
||||
let send_start = Instant::now();
|
||||
self.mixnet_connection
|
||||
.send_all(&mut stream::iter(packets.into_iter().map(|p| {
|
||||
Ok(FramedNymPacket::new(
|
||||
NymPacket::Sphinx(p),
|
||||
PacketType::Mix,
|
||||
self.key_rotation,
|
||||
false,
|
||||
))
|
||||
})))
|
||||
.await?;
|
||||
let elapsed = send_start.elapsed();
|
||||
self.connection_statistics
|
||||
.packet_batches_sending_duration
|
||||
.push(elapsed);
|
||||
trace!(
|
||||
"sent batch of {count} packets in {}",
|
||||
format_duration(elapsed)
|
||||
);
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
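`EgressConnectionStatistics` keeps one duration per call to `send_packet_batch`. One way such a vector could be condensed into a few headline numbers is sketched below. `BatchSendSummary` and `summarise` are hypothetical names, and nothing here is taken from how the agent actually reports its results.

```rust
use std::time::Duration;

/// Hypothetical summary of per-batch send durations.
#[derive(Debug, PartialEq)]
struct BatchSendSummary {
    batches: usize,
    total: Duration,
    mean: Duration,
    max: Duration,
}

/// Condenses the recorded durations; returns `None` when no batch has been sent yet.
fn summarise(durations: &[Duration]) -> Option<BatchSendSummary> {
    // `max()` is `None` for an empty slice, which also guards the division below
    let max = durations.iter().copied().max()?;
    let total: Duration = durations.iter().sum();
    let batches = durations.len();
    Some(BatchSendSummary {
        batches,
        total,
        // assumes the number of batches fits in a `u32`
        mean: total / batches as u32,
        max,
    })
}
```

A large gap between `mean` and `max` would hint that individual writes stalled, for instance because the node under test applied backpressure.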
@@ -0,0 +1,167 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use crate::listener::received::{MixnetPacketsSender, ReceivedPacket};
|
||||
use futures::StreamExt;
|
||||
use nym_noise::config::NoiseConfig;
|
||||
use nym_noise::connection::Connection;
|
||||
use nym_noise::upgrade_noise_responder;
|
||||
use nym_sphinx_framing::codec::NymCodec;
|
||||
use nym_task::ShutdownToken;
|
||||
use std::net::SocketAddr;
|
||||
use std::sync::Arc;
|
||||
use tokio::net::TcpStream;
|
||||
use tokio::sync::Notify;
|
||||
use tokio::time::Instant;
|
||||
use tokio_util::codec::Framed;
|
||||
use tracing::{error, info, warn};
|
||||
|
||||
pub(crate) mod received;
|
||||
|
||||
/// Listens for inbound sphinx packets returned by the node under test.
|
||||
///
|
||||
/// Binds a TCP listener on `bind_address`, accepts a single connection at a time,
|
||||
/// performs a Noise handshake as the responder, then forwards every decoded
|
||||
/// [`NymPacket`] to the [`receiver`](received) via `received_packets_sender`.
|
||||
/// Connections from any IP address other than that of `tested_node_address` are rejected (the source port is not checked).
|
||||
pub(crate) struct MixnetListener {
|
||||
/// Local TCP listener.
|
||||
tcp_listener: tokio::net::TcpListener,
|
||||
|
||||
/// Address of the node being tested; connections from any other source IP are rejected.
|
||||
tested_node_address: SocketAddr,
|
||||
|
||||
/// Noise protocol configuration used when upgrading incoming TCP connections.
|
||||
noise_config: NoiseConfig,
|
||||
|
||||
/// Channel used to forward received packets to the [`PacketReceiver`](received).
|
||||
received_packets_sender: MixnetPacketsSender,
|
||||
|
||||
/// Duration it took to complete the last Noise handshake as the responder.
|
||||
pub(crate) last_noise_handshake_duration: Option<std::time::Duration>,
|
||||
|
||||
/// Global shutdown token
|
||||
shutdown: ShutdownToken,
|
||||
}
|
||||
|
||||
impl MixnetListener {
|
||||
/// Creates a new [`MixnetListener`] ready to be started with [`run`](Self::run).
|
||||
pub(crate) async fn new(
|
||||
bind_address: SocketAddr,
|
||||
tested_node_address: SocketAddr,
|
||||
noise_config: NoiseConfig,
|
||||
received_packets_sender: MixnetPacketsSender,
|
||||
shutdown: ShutdownToken,
|
||||
) -> anyhow::Result<Self> {
|
||||
info!("attempting to run mixnet listener on {bind_address}");
|
||||
|
||||
let tcp_listener = tokio::net::TcpListener::bind(bind_address)
|
||||
.await
|
||||
.inspect_err(|err| {
|
||||
error!("failed to bind the mixnet listener to {bind_address}: {err}")
|
||||
})?;
|
||||
|
||||
Ok(Self {
|
||||
tcp_listener,
|
||||
tested_node_address,
|
||||
noise_config,
|
||||
received_packets_sender,
|
||||
last_noise_handshake_duration: None,
|
||||
shutdown,
|
||||
})
|
||||
}
|
||||
|
||||
/// Reads sphinx packets from an established, noise-encrypted stream and forwards
|
||||
/// each one to the receiver until the connection is closed or an error occurs.
|
||||
async fn handle_stream(&self, mut mixnet_connection: Framed<Connection<TcpStream>, NymCodec>) {
|
||||
loop {
|
||||
tokio::select! {
|
||||
biased;
|
||||
_ = self.shutdown.cancelled() => {
|
||||
tracing::debug!("mixnet listener: received shutdown");
|
||||
return
|
||||
}
|
||||
next_packet = mixnet_connection.next() => {
|
||||
let next_packet = match next_packet {
|
||||
None => {
|
||||
info!("mixnet connection closed");
|
||||
return;
|
||||
}
|
||||
Some(Ok(packet)) => packet,
|
||||
Some(Err(err)) => {
|
||||
error!("failed to read a packet from the mixnet connection: {err}");
|
||||
return;
|
||||
}
|
||||
};
|
||||
if self
|
||||
.received_packets_sender
|
||||
.unbounded_send(ReceivedPacket::new(next_packet))
|
||||
.is_err()
|
||||
{
|
||||
warn!("mixnet packet receiver has shut down - is the agent still running?");
|
||||
return;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Validates the source address, performs the Noise handshake, then delegates to
|
||||
/// [`handle_stream`](Self::handle_stream) for the lifetime of the connection.
|
||||
async fn handle_connection(&mut self, (socket, source): (TcpStream, SocketAddr)) {
|
||||
if source.ip() != self.tested_node_address.ip() {
|
||||
warn!(
|
||||
"received a connection from a source that's not the node being tested. Ignoring it. Source: {source}, tested node: {}",
|
||||
self.tested_node_address
|
||||
);
|
||||
return;
|
||||
}
|
||||
info!("accepted connection from {source}. beginning the noise handshake (responder)");
|
||||
|
||||
let noise_handshake_start = Instant::now();
|
||||
let noise_stream = match upgrade_noise_responder(socket, &self.noise_config).await {
|
||||
Ok(noise_stream) => noise_stream,
|
||||
Err(err) => {
|
||||
error!("failed to upgrade the connection to noise with {source}: {err}");
|
||||
return;
|
||||
}
|
||||
};
|
||||
let noise_handshake_duration = noise_handshake_start.elapsed();
|
||||
|
||||
if !noise_stream.is_noise() {
|
||||
error!(
|
||||
"failed to upgrade the connection to noise with {source}. does the node support the protocol?"
|
||||
);
|
||||
return;
|
||||
}
|
||||
self.last_noise_handshake_duration = Some(noise_handshake_duration);
|
||||
|
||||
self.handle_stream(Framed::new(noise_stream, NymCodec))
|
||||
.await
|
||||
}
|
||||
|
||||
/// Processes one connection at a time until the shutdown token is cancelled.
|
||||
/// Returns `self` so that the caller can inspect fields such as
|
||||
/// [`last_noise_handshake_duration`](Self::last_noise_handshake_duration) after the run.
|
||||
pub(crate) async fn run(mut self, on_start: Arc<Notify>) -> Self {
|
||||
on_start.notify_waiters();
|
||||
// only handle a single connection at once
|
||||
// (we don't need more than that)
|
||||
loop {
|
||||
tokio::select! {
|
||||
biased;
|
||||
_ = self.shutdown.cancelled() => {
|
||||
tracing::debug!("mixnet listener: received shutdown");
|
||||
return self
|
||||
}
|
||||
connection = self.tcp_listener.accept() => {
|
||||
if let Ok(connection) = connection {
|
||||
self.handle_connection(connection).await;
|
||||
} else {
|
||||
error!("failed to accept a TCP connection from the mixnet listener");
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
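`handle_connection` filters peers by comparing `source.ip()` with the IP of `tested_node_address`, ignoring ports. When a listener is bound to an IPv6 wildcard such as `[::]:9000`, some operating system configurations report IPv4 peers as IPv4-mapped IPv6 addresses, in which case a direct comparison with a plain IPv4 address would not match. The sketch below shows a comparison on canonical addresses. The function `is_from_tested_node` is hypothetical, and whether the agent needs this normalisation at this point is an assumption.

```rust
use std::net::SocketAddr;

/// Hypothetical variant of the listener's source check: compares IP addresses only
/// (ports are ignored) after converting IPv4-mapped IPv6 addresses to plain IPv4.
fn is_from_tested_node(source: SocketAddr, tested_node: SocketAddr) -> bool {
    source.ip().to_canonical() == tested_node.ip().to_canonical()
}
```

`IpAddr::to_canonical` leaves ordinary IPv4 and IPv6 addresses unchanged, so the check behaves like the direct comparison in every other case.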
@@ -0,0 +1,32 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use futures::channel::mpsc::{UnboundedReceiver, UnboundedSender};
|
||||
use nym_sphinx_framing::packet::FramedNymPacket;
|
||||
use time::OffsetDateTime;
|
||||
|
||||
/// A sphinx packet received by the [`MixnetListener`](super::MixnetListener), bundled with its
|
||||
/// wall-clock arrival time.
|
||||
pub(crate) struct ReceivedPacket {
|
||||
/// UTC timestamp at which the packet was pulled off the stream.
|
||||
pub(crate) received_at: OffsetDateTime,
|
||||
|
||||
/// The decoded sphinx packet as delivered by the framed codec.
|
||||
pub(crate) received: FramedNymPacket,
|
||||
}
|
||||
|
||||
impl ReceivedPacket {
|
||||
/// Wraps `received` and stamps it with the current UTC time.
|
||||
pub(crate) fn new(received: FramedNymPacket) -> Self {
|
||||
Self {
|
||||
received_at: OffsetDateTime::now_utc(),
|
||||
received,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Sender half of the channel used to forward [`ReceivedPacket`]s from the listener to the processor.
|
||||
pub(crate) type MixnetPacketsSender = UnboundedSender<ReceivedPacket>;
|
||||
|
||||
/// Receiver half of the channel used to forward [`ReceivedPacket`]s from the listener to the processor.
|
||||
pub(crate) type MixnetPacketsReceiver = UnboundedReceiver<ReceivedPacket>;
|
||||
@@ -0,0 +1,47 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use crate::cli::Cli;
|
||||
use clap::Parser;
|
||||
use nym_bin_common::logging::tracing_subscriber::layer::SubscriberExt;
|
||||
use nym_bin_common::logging::tracing_subscriber::util::SubscriberInitExt;
|
||||
use nym_bin_common::logging::{
|
||||
default_tracing_env_filter, default_tracing_fmt_layer, tracing_subscriber,
|
||||
};
|
||||
use tracing::info;
|
||||
|
||||
mod agent;
|
||||
pub(crate) mod cli;
|
||||
mod egress_connection;
|
||||
pub(crate) mod listener;
|
||||
mod processor;
|
||||
pub(crate) mod sphinx_helpers;
|
||||
pub(crate) mod test_packet;
|
||||
|
||||
fn setup_logger() -> anyhow::Result<()> {
|
||||
// crates that are more granularly filtered, regardless of default `RUST_LOG` value
|
||||
let filter_crates = ["reqwest", "hyper"];
|
||||
|
||||
let mut env_filter = default_tracing_env_filter();
|
||||
for crate_name in filter_crates {
|
||||
env_filter = env_filter.add_directive(format!("{crate_name}=warn").parse()?);
|
||||
}
|
||||
|
||||
tracing_subscriber::registry()
|
||||
.with(default_tracing_fmt_layer(std::io::stderr))
|
||||
.with(env_filter)
|
||||
.init();
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[tokio::main]
|
||||
async fn main() -> anyhow::Result<()> {
|
||||
setup_logger()?;
|
||||
let cli = Cli::parse();
|
||||
|
||||
cli.execute().await?;
|
||||
|
||||
info!("network monitor agent is done - quitting");
|
||||
Ok(())
|
||||
}
|
||||
@@ -0,0 +1,165 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use crate::listener::received::{MixnetPacketsReceiver, MixnetPacketsSender, ReceivedPacket};
|
||||
use crate::test_packet::{TestPacketContent, TestPacketHeader};
|
||||
use anyhow::{Context, bail};
|
||||
use futures::StreamExt;
|
||||
use futures::channel::mpsc::unbounded;
|
||||
use nym_crypto::asymmetric::x25519;
|
||||
use nym_sphinx_types::{ProcessedPacketData, SphinxPacket};
|
||||
use std::fmt::Display;
|
||||
use std::sync::Arc;
|
||||
use std::time::Duration;
|
||||
use tokio::time::timeout;
|
||||
use tracing::{debug, warn};
|
||||
|
||||
/// A decoded test packet together with its measured round-trip time.
|
||||
#[derive(Debug, Clone, Copy)]
|
||||
pub(crate) struct ProcessedPacket {
|
||||
/// The packet ID copied from the embedded [`TestPacketContent`].
|
||||
pub(crate) id: u64,
|
||||
|
||||
/// Round-trip time measured from when the packet was created to when it was received.
|
||||
/// This includes both the sphinx delay and the network transit time; callers should
|
||||
/// subtract `config.packet_delay` to obtain the network-only latency.
|
||||
pub(crate) rtt: Duration,
|
||||
}
|
||||
|
||||
impl Display for ProcessedPacket {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
write!(f, "{}: {}", self.id, humantime::format_duration(self.rtt))
|
||||
}
|
||||
}
|
||||
|
||||
/// Strategy used to decrypt a returning sphinx packet and extract its [`TestPacketContent`].
|
||||
///
|
||||
/// When the agent operates with a reusable header it already holds the payload key, so
|
||||
/// only the payload needs unwrapping. When it builds a fresh header per-packet the full
|
||||
/// sphinx processing path (DH + decryption) must be performed using the agent's private key.
|
||||
pub(crate) enum PayloadRecovery {
|
||||
/// The agent holds a pre-built [`TestPacketHeader`] whose payload key can be used to
|
||||
/// unwrap the payload directly, skipping the full sphinx processing step.
|
||||
ReusableHeader(TestPacketHeader),
|
||||
|
||||
/// The agent must perform full sphinx processing using its private key to decrypt
|
||||
/// the payload, as no pre-built header is available.
|
||||
FullProcessing(Arc<x25519::KeyPair>),
|
||||
}
|
||||
|
||||
impl From<TestPacketHeader> for PayloadRecovery {
|
||||
fn from(header: TestPacketHeader) -> Self {
|
||||
PayloadRecovery::ReusableHeader(header)
|
||||
}
|
||||
}
|
||||
|
||||
impl From<Arc<x25519::KeyPair>> for PayloadRecovery {
|
||||
fn from(private_key: Arc<x25519::KeyPair>) -> Self {
|
||||
PayloadRecovery::FullProcessing(private_key)
|
||||
}
|
||||
}
|
||||
|
||||
impl PayloadRecovery {
|
||||
/// Decrypts `received` and deserialises its payload into a [`TestPacketContent`].
|
||||
/// Returns an error if decryption fails or the packet is not addressed to the final hop.
|
||||
pub(crate) fn recover_test_payload(
|
||||
&self,
|
||||
received: SphinxPacket,
|
||||
) -> anyhow::Result<TestPacketContent> {
|
||||
match self {
|
||||
PayloadRecovery::ReusableHeader(header) => header.recover_payload(received.payload),
|
||||
PayloadRecovery::FullProcessing(private_key) => {
|
||||
let ProcessedPacketData::FinalHop { payload, .. } =
|
||||
received.process(private_key.private_key().inner())?.data
|
||||
else {
|
||||
bail!("received non final hop data")
|
||||
};
|
||||
TestPacketContent::from_bytes(&payload.recover_plaintext()?)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Receives raw sphinx packets forwarded by the [`MixnetListener`](crate::listener::MixnetListener),
|
||||
/// decrypts them, and exposes them as [`ProcessedPacket`]s with RTT measurements.
|
||||
///
|
||||
/// The processor owns one half of an unbounded channel; the sender half is cloned and handed
|
||||
/// to the listener via [`sender`](Self::sender). Packets can be consumed one at a time with
|
||||
/// [`next_packet`](Self::next_packet) or drained in bulk with [`all_available`](Self::all_available).
|
||||
pub(crate) struct MixnetPacketProcessor {
|
||||
    /// Decryption strategy: either reuse a pre-built header or perform full sphinx processing.
    payload_recovery: PayloadRecovery,

    /// How long [`next_packet`](Self::next_packet) will wait before returning a timeout error.
    receive_timeout: Duration,

    /// Sender half kept alive so the channel stays open as long as the processor exists.
    sender: MixnetPacketsSender,

    /// Receiver half polled by [`next_packet`](Self::next_packet) and [`all_available`](Self::all_available).
    receiver: MixnetPacketsReceiver,
}

impl MixnetPacketProcessor {
    /// Creates a new processor along with an internal channel for receiving packets.
    pub(crate) fn new(payload_recovery: PayloadRecovery, receive_timeout: Duration) -> Self {
        let (sender, receiver) = unbounded();

        Self {
            payload_recovery,
            receive_timeout,
            sender,
            receiver,
        }
    }

    /// Returns a clone of the sender half so the listener can forward packets to this processor.
    pub(crate) fn sender(&self) -> MixnetPacketsSender {
        self.sender.clone()
    }

    /// Decrypts a [`ReceivedPacket`] and computes its RTT from the embedded send timestamp.
    fn process_received(&self, packet: ReceivedPacket) -> anyhow::Result<ProcessedPacket> {
        let sphinx_packet = packet
            .received
            .into_inner()
            .to_sphinx_packet()
            .context("the received packet was not a sphinx packet!")?;
        let received_content = self.payload_recovery.recover_test_payload(sphinx_packet)?;
        let latency = packet.received_at - received_content.sending_timestamp;

        Ok(ProcessedPacket {
            id: received_content.id,
            rtt: latency.unsigned_abs(),
        })
    }

    /// Drains all packets currently available in the channel without blocking.
    /// Returns a vec of results; decryption failures are included as `Err` entries rather
    /// than causing the entire drain to abort.
    pub(crate) fn all_available(&mut self) -> Vec<anyhow::Result<ProcessedPacket>> {
        let mut packets = Vec::new();
        while let Ok(Some(pending)) = self.receiver.try_next() {
            packets.push(self.process_received(pending));
        }

        debug!("drained {} immediately available packets", packets.len());
        packets
    }

    /// Waits for the next packet, up to `receive_timeout`.
    /// Returns `Err` on timeout, channel exhaustion, or decryption failure.
    pub(crate) async fn next_packet(&mut self) -> anyhow::Result<ProcessedPacket> {
        let packet = timeout(self.receive_timeout, self.receiver.next())
            .await
            .inspect_err(|_| {
                warn!(
                    "timed out waiting for next packet after {}",
                    humantime::format_duration(self.receive_timeout)
                )
            })?
            .context("stream has been exhausted")?;

        self.process_received(packet)
    }
}
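The non-blocking drain performed by `all_available` can be illustrated with the standard library alone. This is only a sketch under stated assumptions: it swaps the futures channel used above for `std::sync::mpsc`, and `parse_packet` is a hypothetical stand-in for `process_received` rather than anything from this PR.

```rust
use std::sync::mpsc::{channel, Receiver};

/// Hypothetical stand-in for `process_received`: fails for non-numeric input.
fn parse_packet(raw: &str) -> Result<u64, String> {
    raw.parse::<u64>()
        .map_err(|err| format!("failed to process {raw:?}: {err}"))
}

/// Drains everything currently queued without blocking.
/// Failures are kept as `Err` entries instead of aborting the drain.
fn drain_available(receiver: &Receiver<String>) -> Vec<Result<u64, String>> {
    let mut results = Vec::new();
    // `try_recv` returns `Err` as soon as the queue is empty or disconnected,
    // which ends the loop without waiting for more data.
    while let Ok(raw) = receiver.try_recv() {
        results.push(parse_packet(&raw));
    }
    results
}
```

Keeping each failure in the returned vector lets the caller count malformed packets alongside successful ones, which appears to be the intent of returning `Vec<anyhow::Result<ProcessedPacket>>` above.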
@@ -0,0 +1,264 @@
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
// SPDX-License-Identifier: GPL-3.0-only

use crate::test_packet::TestPacketHeader;
use arrayref::array_ref;
use hkdf::Hkdf;
use nym_crypto::aes::cipher::crypto_common::rand_core::OsRng;
use nym_crypto::asymmetric::x25519;
use nym_sphinx_addressing::nodes::NymNodeRoutingAddress;
use nym_sphinx_params::PacketSize;
use nym_sphinx_types::constants::{
    BLINDING_FACTOR_SIZE, EXPANDED_SHARED_SECRET_HKDF_INFO, EXPANDED_SHARED_SECRET_HKDF_SALT,
    EXPANDED_SHARED_SECRET_LENGTH, INTEGRITY_MAC_KEY_SIZE, PAYLOAD_KEY_SEED_SIZE,
};
use nym_sphinx_types::crypto::STREAM_CIPHER_KEY_SIZE;
use nym_sphinx_types::{
    DESTINATION_ADDRESS_LENGTH, Delay, Destination, DestinationAddressBytes, IDENTIFIER_LENGTH,
    Node, PAYLOAD_KEY_SIZE, PayloadKey, SphinxPacket, SphinxPacketBuilder, derive_payload_key,
};
use sha2::Sha256;
use std::net::SocketAddr;
use std::time::Duration;
use x25519_dalek::{PublicKey, StaticSecret};

/// Newtype wrapper around the HKDF-expanded shared secret used in the sphinx protocol
/// since the actual type within the sphinx library does not expose the required methods.
pub(crate) struct ExpandedSharedSecretWrapper(pub(crate) [u8; EXPANDED_SHARED_SECRET_LENGTH]);

impl ExpandedSharedSecretWrapper {
    /// Returns the blinding factor as an x25519 [`StaticSecret`], used to derive the
    /// shared secret for the next hop when manually reconstructing payload keys.
    pub(crate) fn blinding_factor(&self) -> StaticSecret {
        StaticSecret::from(*self.blinding_factor_bytes())
    }

    /// Returns the raw blinding factor bytes.
    pub(crate) fn blinding_factor_bytes(&self) -> &[u8; BLINDING_FACTOR_SIZE] {
        array_ref!(
            &self.0,
            STREAM_CIPHER_KEY_SIZE + INTEGRITY_MAC_KEY_SIZE + PAYLOAD_KEY_SIZE,
            BLINDING_FACTOR_SIZE
        )
    }

    /// Returns the payload key seed, used as input to [`derive_payload_key`].
    pub(crate) fn payload_key_seed(&self) -> &[u8; PAYLOAD_KEY_SEED_SIZE] {
        array_ref!(
            &self.0,
            STREAM_CIPHER_KEY_SIZE + INTEGRITY_MAC_KEY_SIZE,
            PAYLOAD_KEY_SEED_SIZE
        )
    }

    /// Derives the [`PayloadKey`] for this hop from the payload key seed.
    pub(crate) fn derive_payload_key(&self) -> PayloadKey {
        derive_payload_key(self.payload_key_seed())
    }
}

/// Re-derives the expanded shared secret from a raw 32-byte DH shared secret using HKDF-SHA256
/// with the sphinx protocol's standard salt and info strings.
///
/// This mirrors the derivation performed inside the sphinx library, which is not publicly
/// exposed, hence the need to replicate it here when reconstructing payload keys for a
/// reusable header.
pub(crate) fn rederive_expanded_shared_secret(
    shared_secret: &[u8; 32],
) -> ExpandedSharedSecretWrapper {
    let hkdf = Hkdf::<Sha256>::new(Some(EXPANDED_SHARED_SECRET_HKDF_SALT), shared_secret);

    let mut output = [0u8; EXPANDED_SHARED_SECRET_LENGTH];
    // SAFETY: HKDF-SHA256 can output at most 255 * 32 bytes; the requested okm length
    // is within that limit, so `expand` cannot fail here.
    #[allow(clippy::unwrap_used)]
    hkdf.expand(EXPANDED_SHARED_SECRET_HKDF_INFO, &mut output)
        .unwrap();

    ExpandedSharedSecretWrapper(output)
}

/// Returns an all-zeroes [`Destination`] used as a placeholder for the final delivery address.
/// The sphinx protocol requires a destination, but for the agent's loopback packets the
/// address is irrelevant because the final hop (the agent itself) is already in the route.
fn dummy_destination() -> Destination {
    Destination::new(
        DestinationAddressBytes::from_bytes([0u8; DESTINATION_ADDRESS_LENGTH]),
        [0u8; IDENTIFIER_LENGTH],
    )
}

/// Builds a single test sphinx packet along `route` with the given per-hop `delay`.
///
/// The packet uses [`PacketSize::AckPacket`] to keep its size as small as possible. If `initial_secret`
/// is provided it is used as the sender's ephemeral key, allowing the resulting header to
/// be deterministically reproduced (needed for `create_test_sphinx_packet_header`).
pub(crate) fn build_test_sphinx_packet(
    route: &[Node; 2],
    delay: Duration,
    initial_secret: Option<&StaticSecret>,
    message: &[u8],
) -> anyhow::Result<SphinxPacket> {
    let delays = [
        Delay::new_from_nanos(delay.as_nanos() as u64),
        Delay::new_from_nanos(delay.as_nanos() as u64),
    ];
    let destination = dummy_destination();
    let payload = PacketSize::AckPacket.payload_size();

    let packet = match initial_secret {
        None => SphinxPacketBuilder::new()
            .with_payload_size(payload)
            .build_packet(message, route, &destination, &delays),
        Some(initial_secret) => SphinxPacketBuilder::new()
            .with_payload_size(payload)
            .with_initial_secret(initial_secret)
            .build_packet(message, route, &destination, &delays),
    }?;

    Ok(packet)
}

/// Builds a [`TestPacketHeader`] that can be reused to send many packets with different
/// payloads but the same routing header.
///
/// Internally this builds one full sphinx packet to capture the header, then manually
/// re-derives the per-hop payload keys by replaying the DH key-agreement steps along the
/// route. This is necessary because the sphinx library does not expose the payload keys
/// after packet construction.
///
/// The derived `payload_key` vec has one entry per hop; the last entry (index 1) is the
/// key held by this agent as the final recipient and is used by [`TestPacketHeader::recover_payload`].
pub(crate) fn create_test_sphinx_packet_header(
    route: [Node; 2],
    delay: Duration,
) -> anyhow::Result<TestPacketHeader> {
    let initial_secret = StaticSecret::random_from_rng(OsRng);

    // Build a throwaway packet solely to capture the reusable header.
    let packet = build_test_sphinx_packet(&route, delay, Some(&initial_secret), b"dummy-message")?;

    let header = packet.header;

    // Manually reconstruct the payload keys for each hop.
    let mut expanded_shared_secrets = Vec::new();
    let mut blinding_factors = Vec::new();

    // The sphinx library keeps these private, so we replicate the derivation:
    // for each hop, apply the initial secret and all previous blinding factors to the
    // node's public key via DH, then expand the result with HKDF to obtain the payload key.
    for node in &route {
        let mut acc = node.pub_key;

        for blinding_factor in std::iter::once(&initial_secret).chain(&blinding_factors) {
            let shared_secret = blinding_factor.diffie_hellman(&acc);
            acc = PublicKey::from(shared_secret.to_bytes());
        }

        let expanded_shared_secret = rederive_expanded_shared_secret(acc.as_bytes());
        blinding_factors.push(expanded_shared_secret.blinding_factor());
        expanded_shared_secrets.push(expanded_shared_secret);
    }

    let payload_keys = expanded_shared_secrets
        .iter()
        .map(|s| s.derive_payload_key())
        .collect::<Vec<_>>();
    assert_eq!(payload_keys.len(), 2);

    Ok(TestPacketHeader {
        header,
        payload_key: payload_keys,
    })
}

/// Constructs a sphinx [`Node`] from a socket address and public key.
/// Panics if the address cannot be converted to a routing address, which should never happen
/// for a valid `SocketAddr`.
pub(crate) fn as_sphinx_node(address: SocketAddr, pub_key: x25519::PublicKey) -> Node {
    // SAFETY: we know that the address is valid, so we can safely unwrap it
    #[allow(clippy::unwrap_used)]
    Node::new(
        NymNodeRoutingAddress::from(address).try_into().unwrap(),
        pub_key.into(),
    )
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::test_packet::TestPacketContent;
    use nym_crypto::asymmetric::x25519;
    use nym_sphinx_addressing::nodes::NymNodeRoutingAddress;
    use nym_sphinx_types::ProcessedPacketData;
    use nym_test_utils::helpers::deterministic_rng;
    use std::net::SocketAddr;

    #[test]
    fn creating_test_sphinx_packets() {
        let mut rng = deterministic_rng();
        let remote_node_key = x25519::KeyPair::new(&mut rng);
        let agent_key = x25519::KeyPair::new(&mut rng);
        let node_addr: SocketAddr = "1.2.3.4:5677".parse().unwrap();
        let agent_addr: SocketAddr = "2.2.3.4:5678".parse().unwrap();

        let remote_node = Node::new(
            NymNodeRoutingAddress::from(node_addr).try_into().unwrap(),
            (*remote_node_key.public_key()).into(),
        );
        let agent_node = Node::new(
            NymNodeRoutingAddress::from(agent_addr).try_into().unwrap(),
            (*agent_key.public_key()).into(),
        );

        let delay = Duration::from_millis(1);

        let test_header =
            create_test_sphinx_packet_header([remote_node, agent_node], delay).unwrap();

        let payload1 = TestPacketContent::new(123);
        let payload2 = TestPacketContent::new(456);

        let packet1 = test_header.create_test_packet(payload1).unwrap();
        let packet2 = test_header.create_test_packet(payload2).unwrap();

        // simulate packet being received by remote node
        let res1 = packet1
            .process(remote_node_key.private_key().inner())
            .unwrap();
        let ProcessedPacketData::ForwardHop {
            next_hop_packet: res1_packet,
            next_hop_address,
            ..
        } = res1.data
        else {
            panic!("bad data")
        };
        assert_eq!(
            next_hop_address,
            NymNodeRoutingAddress::from(agent_addr).try_into().unwrap()
        );

        let res2 = packet2
            .process(remote_node_key.private_key().inner())
            .unwrap();
        let ProcessedPacketData::ForwardHop {
            next_hop_packet: res2_packet,
            next_hop_address,
            ..
        } = res2.data
        else {
            panic!("bad data")
        };
        assert_eq!(
            next_hop_address,
            NymNodeRoutingAddress::from(agent_addr).try_into().unwrap()
        );

        // now getting back to us (no need for full unwrapping as we already have the payload key)
        let received1 = test_header.recover_payload(res1_packet.payload).unwrap();
        assert_eq!(received1, payload1);

        let received2 = test_header.recover_payload(res2_packet.payload).unwrap();
        assert_eq!(received2, payload2);
    }
}
@@ -0,0 +1,204 @@
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
// SPDX-License-Identifier: GPL-3.0-only

use anyhow::{Context, bail};
use nym_sphinx_params::PacketSize;
use nym_sphinx_types::{Payload, PayloadKey, SphinxHeader, SphinxPacket};
use time::OffsetDateTime;

/// A pre-built sphinx packet header that can be reused across multiple test packets.
///
/// When `config.reuse_header` is enabled the agent constructs one header for the entire
/// test run and stamps a fresh [`TestPacketContent`] (new ID + timestamp) into each
/// packet's payload. This lets the agent avoid performing expensive packet derivation
/// for each sent payload.
pub(crate) struct TestPacketHeader {
    /// The immutable sphinx routing header shared across all replayed packets.
    pub(crate) header: SphinxHeader,

    /// Per-hop payload keys derived when the header was built, in route order.
    /// The last entry is the key used by [`recover_payload`](Self::recover_payload).
    pub(crate) payload_key: Vec<PayloadKey>,
}

impl Clone for TestPacketHeader {
    fn clone(&self) -> Self {
        TestPacketHeader {
            header: SphinxHeader {
                shared_secret: self.header.shared_secret,
                routing_info: self.header.routing_info.clone(),
            },
            payload_key: self.payload_key.clone(),
        }
    }
}

impl TestPacketHeader {
    /// Encapsulates `content` into a new [`SphinxPacket`] by reusing the pre-built header.
    pub(crate) fn create_test_packet(
        &self,
        content: TestPacketContent,
    ) -> anyhow::Result<SphinxPacket> {
        let payload = Payload::encapsulate_message(
            &content.to_bytes(),
            &self.payload_key,
            PacketSize::AckPacket.payload_size(),
        )?;
        Ok(SphinxPacket {
            header: SphinxHeader {
                shared_secret: self.header.shared_secret,
                routing_info: self.header.routing_info.clone(),
            },
            payload,
        })
    }

    /// Decrypts a received payload using the last payload key (the one belonging to this
    /// agent as the final hop) and deserialises it into a [`TestPacketContent`].
    pub(crate) fn recover_payload(&self, received: Payload) -> anyhow::Result<TestPacketContent> {
        let key = self
            .payload_key
            .last()
            .context("no payload keys generated")?;

        let payload = received.unwrap(key)?.recover_plaintext()?;
        TestPacketContent::from_bytes(&payload)
    }
}

/// The payload embedded in every test sphinx packet.
///
/// Serialises to exactly 16 bytes: 8 bytes for `id` (big-endian `u64`) followed by
/// 8 bytes for `sending_timestamp` (big-endian Unix timestamp in nanoseconds as `i64`).
/// Nanosecond precision is preserved for dates up to year 2262 (i64 max ≈ 9.2*10^18 ns).
/// The timestamp is used to compute the packet's round-trip time on receipt.
#[derive(Copy, Clone, PartialEq, Debug)]
pub(crate) struct TestPacketContent {
    /// Monotonically increasing ID assigned by the agent; used to detect duplicates and
    /// correlate sent packets with received ones.
    pub(crate) id: u64,

    /// UTC wall-clock time at which the packet was created, used to compute RTT.
    pub(crate) sending_timestamp: OffsetDateTime,
}

impl TestPacketContent {
    /// Creates a new content value with the given `id` and the current UTC time.
    pub(crate) fn new(id: u64) -> Self {
        Self {
            id,
            sending_timestamp: OffsetDateTime::now_utc(),
        }
    }

    /// Serialises the content to 16 bytes: `id` as big-endian u64, then
    /// `sending_timestamp` as a big-endian i64 Unix timestamp in nanoseconds.
    pub(crate) fn to_bytes(self) -> Vec<u8> {
        let mut bytes = Vec::with_capacity(16);
        bytes.extend_from_slice(&self.id.to_be_bytes());
        // unix_timestamp_nanos() returns i128, but the value fits in i64 for dates up to year 2262.
        #[allow(clippy::cast_possible_truncation)]
        bytes.extend_from_slice(
            &(self.sending_timestamp.unix_timestamp_nanos() as i64).to_be_bytes(),
        );
        bytes
    }

    /// Deserialises content from a 16-byte slice produced by [`to_bytes`](Self::to_bytes).
    /// Returns an error if the slice is not exactly 16 bytes or the timestamp is out of range.
    pub(crate) fn from_bytes(bytes: &[u8]) -> anyhow::Result<Self> {
        if bytes.len() != 16 {
            bail!("malformed test packet received")
        }

        let id = u64::from_be_bytes(bytes[0..8].try_into()?);
        let nanos = i64::from_be_bytes(bytes[8..16].try_into()?);
        Ok(Self {
            id,
            sending_timestamp: OffsetDateTime::from_unix_timestamp_nanos(nanos as i128)?,
        })
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use time::macros::datetime;

    fn content_with_timestamp(id: u64, ts: OffsetDateTime) -> TestPacketContent {
        TestPacketContent {
            id,
            sending_timestamp: ts,
        }
    }

    #[test]
    fn serialised_length_is_always_16_bytes() {
        let content = TestPacketContent::new(0);
        assert_eq!(content.to_bytes().len(), 16);
    }

    #[test]
    fn roundtrip_preserves_all_fields() {
        // Use a fixed timestamp so the test is deterministic.
        let original = content_with_timestamp(42, datetime!(2025-06-01 12:00:00 UTC));
        let bytes = original.to_bytes();
        let recovered = TestPacketContent::from_bytes(&bytes).unwrap();
        assert_eq!(original, recovered);
    }

    #[test]
    fn roundtrip_preserves_nanosecond_precision() {
        // Construct a timestamp with a sub-second component to verify nanos are not truncated.
        let ts = datetime!(2025-06-01 12:00:00.123456789 UTC);
        let original = content_with_timestamp(1, ts);
        let recovered = TestPacketContent::from_bytes(&original.to_bytes()).unwrap();
        assert_eq!(
            original.sending_timestamp.unix_timestamp_nanos(),
            recovered.sending_timestamp.unix_timestamp_nanos()
        );
    }

    #[test]
    fn id_zero_and_max_roundtrip() {
        for id in [0u64, u64::MAX] {
            let original = content_with_timestamp(id, datetime!(2025-01-01 00:00:00 UTC));
            let recovered = TestPacketContent::from_bytes(&original.to_bytes()).unwrap();
            assert_eq!(recovered.id, id);
        }
    }

    #[test]
    fn id_is_encoded_in_first_8_bytes_big_endian() {
        let content = content_with_timestamp(1, datetime!(2025-01-01 00:00:00 UTC));
        let bytes = content.to_bytes();
        let id_bytes: [u8; 8] = bytes[0..8].try_into().unwrap();
        assert_eq!(u64::from_be_bytes(id_bytes), 1u64);
    }

    #[test]
    fn timestamp_is_encoded_in_last_8_bytes_big_endian() {
        let ts = datetime!(2025-01-01 00:00:00 UTC);
        let content = content_with_timestamp(0, ts);
        let bytes = content.to_bytes();
        let ts_bytes: [u8; 8] = bytes[8..16].try_into().unwrap();
        assert_eq!(
            i64::from_be_bytes(ts_bytes),
            ts.unix_timestamp_nanos() as i64
        );
    }

    #[test]
    fn from_bytes_rejects_too_short() {
        assert!(TestPacketContent::from_bytes(&[0u8; 15]).is_err());
    }

    #[test]
    fn from_bytes_rejects_too_long() {
        assert!(TestPacketContent::from_bytes(&[0u8; 17]).is_err());
    }

    #[test]
    fn from_bytes_rejects_empty() {
        assert!(TestPacketContent::from_bytes(&[]).is_err());
    }
}
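The 16-byte wire layout and the RTT calculation it enables can be tried out without any of the sphinx or `time` dependencies. The following is a minimal sketch, assuming the timestamp is already available as an `i64` count of Unix nanoseconds; `encode`, `decode` and `rtt` are hypothetical names used for illustration only and are not part of this PR.

```rust
use std::convert::TryInto;
use std::time::Duration;

/// Encodes `id` and a Unix timestamp in nanoseconds as 16 big-endian bytes.
fn encode(id: u64, sent_unix_nanos: i64) -> [u8; 16] {
    let mut bytes = [0u8; 16];
    bytes[..8].copy_from_slice(&id.to_be_bytes());
    bytes[8..].copy_from_slice(&sent_unix_nanos.to_be_bytes());
    bytes
}

/// Decodes the 16-byte layout; any other length is rejected.
fn decode(bytes: &[u8]) -> Option<(u64, i64)> {
    if bytes.len() != 16 {
        return None;
    }
    let id = u64::from_be_bytes(bytes[..8].try_into().ok()?);
    let nanos = i64::from_be_bytes(bytes[8..].try_into().ok()?);
    Some((id, nanos))
}

/// Absolute difference between receive and send times, so the result is
/// non-negative even if the receive timestamp is earlier than the send one.
fn rtt(sent_unix_nanos: i64, received_unix_nanos: i64) -> Duration {
    Duration::from_nanos(received_unix_nanos.abs_diff(sent_unix_nanos))
}
```

Taking the absolute difference plays the same role as the `unsigned_abs()` call in `process_received`: a wall-clock adjustment between send and receive cannot produce a negative duration.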
@@ -0,0 +1,31 @@
[package]
name = "nym-network-monitor-orchestrator-requests"
authors.workspace = true
repository.workspace = true
homepage.workspace = true
documentation.workspace = true
edition.workspace = true
license.workspace = true
rust-version.workspace = true
readme.workspace = true
version.workspace = true
publish = false

[dependencies]
anyhow = { workspace = true }
humantime-serde = { workspace = true }
serde = { workspace = true, features = ["derive"] }
time = { workspace = true, features = ["serde-well-known"] }
tracing = { workspace = true }
utoipa = { workspace = true, optional = true }
zeroize = { workspace = true, optional = true }

nym-crypto = { workspace = true, features = ["asymmetric", "serde"] }
nym-http-api-client = { workspace = true, optional = true }

[features]
client = ["nym-http-api-client", "zeroize"]
openapi = ["utoipa"]

[lints]
workspace = true
@@ -0,0 +1,2 @@
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
// SPDX-License-Identifier: GPL-3.0-only
@@ -0,0 +1,81 @@
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
// SPDX-License-Identifier: GPL-3.0-only

use crate::models::{
    AgentAnnounceRequest, AgentAnnounceResponse, TestRunAssignmentRequest,
    TestRunAssignmentResponse, TestRunResultSubmissionRequest, TestRunSubmissionResponse,
};
use crate::routes::v1::agent::{
    announce_absolute, request_testrun_absolute, submit_testrun_absolute,
};
pub use nym_http_api_client::Client;
use nym_http_api_client::{ApiClient, HttpClientError, NO_PARAMS, Url, parse_response};
use serde::{Deserialize, Serialize};
use std::sync::Arc;
use zeroize::Zeroizing;

/// HTTP client for communicating with the network monitor orchestrator API.
/// All requests are authenticated with a bearer token.
pub struct OrchestratorClient {
    inner: Client,
    bearer_token: Arc<Zeroizing<String>>,
}

impl OrchestratorClient {
    /// Creates a new client targeting `base_url`, storing the bearer token in a
    /// zeroizing container.
    pub fn new(base_url: Url, bearer_token: String) -> Result<Self, HttpClientError> {
        Ok(OrchestratorClient {
            inner: Client::builder(base_url)?
                .no_hickory_dns()
                .with_user_agent(format!(
                    "nym-network-monitor-orchestrator-requests/{}",
                    env!("CARGO_PKG_VERSION")
                ))
                .build()?,
            bearer_token: Arc::new(Zeroizing::new(bearer_token)),
        })
    }

    /// Sends an authenticated POST request with a JSON body and deserialises the response.
    async fn post_with_auth<B, T>(&self, path: &str, json_body: &B) -> Result<T, HttpClientError>
    where
        B: Serialize + ?Sized + Sync,
        for<'a> T: Deserialize<'a>,
    {
        let res = self
            .inner
            .create_post_request(path, NO_PARAMS, json_body)?
            .bearer_auth(self.bearer_token.as_str())
            .send()
            .await?;

        parse_response(res, false).await
    }

    /// Announces this agent's details to the orchestrator, which forwards them
    /// to the smart contract so network nodes can whitelist the agent.
    pub async fn announce_agent(
        &self,
        body: &AgentAnnounceRequest,
    ) -> Result<AgentAnnounceResponse, HttpClientError> {
        self.post_with_auth(&announce_absolute(), body).await
    }

    /// Asks the orchestrator for the next test run to execute. The `assignment`
    /// field of the response is `None` if no work is currently available.
    pub async fn request_work_assignment(
        &self,
        body: &TestRunAssignmentRequest,
    ) -> Result<TestRunAssignmentResponse, HttpClientError> {
        self.post_with_auth(&request_testrun_absolute(), body).await
    }

    /// Submits the result of a completed test run back to the orchestrator for storage.
    pub async fn submit_test_run_result(
        &self,
        body: &TestRunResultSubmissionRequest,
    ) -> Result<TestRunSubmissionResponse, HttpClientError> {
        self.post_with_auth(&submit_testrun_absolute(), body).await
    }
}
@@ -0,0 +1,91 @@
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
// SPDX-License-Identifier: GPL-3.0-only

pub mod api;
pub mod models;

#[cfg(feature = "client")]
pub mod client;

/// Generates a function that returns the full absolute path for a route
/// by concatenating a parent prefix with a suffix.
macro_rules! absolute_route {
    ( $name:ident, $parent:expr, $suffix:expr ) => {
        pub fn $name() -> String {
            format!("{}{}", $parent, $suffix)
        }
    };
}

/// Route constants and absolute-path helpers for the orchestrator HTTP API.
/// Used by both the orchestrator server (for route registration) and the agent
/// client (for constructing request URLs).
pub mod routes {
    pub const ROOT: &str = "/";
    pub const V1: &str = "/v1";
    pub const SWAGGER: &str = "/swagger";

    pub mod v1 {
        pub const AGENT: &str = "/agent";
        pub const METRICS: &str = "/metrics";
        pub const RESULTS: &str = "/results";

        absolute_route!(agent_absolute, super::V1, AGENT);
        absolute_route!(metrics_absolute, super::V1, METRICS);
        absolute_route!(results_absolute, super::V1, RESULTS);

        pub mod agent {
            use super::*;

            pub const ANNOUNCE: &str = "/announce";
            pub const REQUEST_TESTRUN: &str = "/request-testrun";
            pub const SUBMIT_TESTRUN_RESULT: &str = "/submit-testrun-result";

            absolute_route!(announce_absolute, agent_absolute(), ANNOUNCE);
            absolute_route!(request_testrun_absolute, agent_absolute(), REQUEST_TESTRUN);
            absolute_route!(
                submit_testrun_absolute,
                agent_absolute(),
                SUBMIT_TESTRUN_RESULT
            );
        }

        pub mod metrics {
            use super::*;

            pub const PROMETHEUS: &str = "/prometheus";

            absolute_route!(prometheus_absolute, metrics_absolute(), PROMETHEUS);
        }

        pub mod results {
            use super::*;

            pub const TESTRUN_BY_ID: &str = "/testrun/:id";
            pub const NYM_NODE_BY_NODE_ID: &str = "/nym-node/:node_id";
            pub const NYM_NODE_TESTRUNS: &str = "/nym-node/:node_id/testruns";
            pub const TESTRUNS_IN_PROGRESS: &str = "/testruns-in-progress";
            pub const TESTRUNS: &str = "/testruns";
            pub const NYM_NODES: &str = "/nym-nodes";

            absolute_route!(testrun_by_id_absolute, results_absolute(), TESTRUN_BY_ID);
            absolute_route!(
                nym_node_by_node_id_absolute,
                results_absolute(),
                NYM_NODE_BY_NODE_ID
            );
            absolute_route!(
                nym_node_testruns_absolute,
                results_absolute(),
                NYM_NODE_TESTRUNS
            );
            absolute_route!(
                testruns_in_progress_absolute,
                results_absolute(),
                TESTRUNS_IN_PROGRESS
            );
            absolute_route!(testruns_absolute, results_absolute(), TESTRUNS);
            absolute_route!(nym_nodes_absolute, results_absolute(), NYM_NODES);
        }
    }
}
}
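To see how the nested `absolute_route!` invocations compose, the macro can be exercised on its own. The sketch below flattens the module hierarchy into top-level items for brevity (an assumption made for illustration; the real constants live in the nested `routes` modules) and reuses only the `/v1`, `/agent` and `/announce` segments defined above.

```rust
/// Same shape as the crate's macro: joins a parent prefix and a suffix.
macro_rules! absolute_route {
    ( $name:ident, $parent:expr, $suffix:expr ) => {
        pub fn $name() -> String {
            format!("{}{}", $parent, $suffix)
        }
    };
}

pub const V1: &str = "/v1";
pub const AGENT: &str = "/agent";
pub const ANNOUNCE: &str = "/announce";

// The parent may be a constant...
absolute_route!(agent_absolute, V1, AGENT);
// ...or a call to a previously generated helper, which is how deeper routes are built.
absolute_route!(announce_absolute, agent_absolute(), ANNOUNCE);
```

With these definitions `agent_absolute()` returns `"/v1/agent"` and `announce_absolute()` returns `"/v1/agent/announce"`. Because every helper allocates a fresh `String` on each call, callers on a hot path may prefer to compute the path once and reuse it.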
@@ -0,0 +1,378 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use nym_crypto::asymmetric::ed25519;
|
||||
use nym_crypto::asymmetric::ed25519::serde_helpers::bs58_ed25519_pubkey;
|
||||
use nym_crypto::asymmetric::x25519;
|
||||
use nym_crypto::asymmetric::x25519::serde_helpers::{
|
||||
bs58_x25519_pubkey, option_bs58_x25519_pubkey,
|
||||
};
|
||||
use serde::{Deserialize, Serialize};
|
||||
use std::net::SocketAddr;
|
||||
use std::time::Duration;
|
||||
use time::OffsetDateTime;
|
||||
|
||||
#[cfg_attr(feature = "openapi", derive(utoipa::ToSchema))]
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
/// Body sent by an agent to announce its details to the orchestrator.
|
||||
/// The orchestrator forwards this information to the smart contract so that
|
||||
/// network nodes can whitelist connections from known agents.
|
||||
pub struct AgentAnnounceRequest {
|
||||
/// Egress address of the agent node combined with the previously
|
||||
/// assigned mixnet socket address from the orchestrator
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = String))]
|
||||
pub agent_mix_socket_address: SocketAddr,
|
||||
|
||||
/// Base-58 encoded noise key of the agent.
|
||||
#[serde(with = "bs58_x25519_pubkey")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = String))]
|
||||
pub x25519_noise_key: x25519::PublicKey,
|
||||
|
||||
/// Version of the noise protocol used by the agent.
|
||||
pub noise_version: u8,
|
||||
}
|
||||
|
||||
/// Confirmation returned to an agent after a successful announcement.
|
||||
/// Currently empty — exists to give the response an explicit type rather than
|
||||
/// relying on `Json(())`.
|
||||
#[cfg_attr(feature = "openapi", derive(utoipa::ToSchema))]
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct AgentAnnounceResponse {}
|
||||
|
||||
/// Request sent by an agent to ask the orchestrator for a node to test.
|
||||
/// Identifies the agent so the orchestrator can verify it has been announced.
|
||||
#[cfg_attr(feature = "openapi", derive(utoipa::ToSchema))]
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct TestRunAssignmentRequest {
|
||||
/// Egress address of the agent node combined with the previously
|
||||
/// assigned mixnet socket address from the orchestrator
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = String))]
|
||||
pub agent_mix_socket_address: SocketAddr,
|
||||
|
||||
/// Base-58 encoded noise key of the agent.
|
||||
#[serde(with = "bs58_x25519_pubkey")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = String))]
|
||||
pub x25519_noise_key: x25519::PublicKey,
|
||||
}
|
||||
|
||||
/// Response from the orchestrator when an agent requests work.
|
||||
/// `assignment` is `None` when no nodes are due for testing.
|
||||
#[cfg_attr(feature = "openapi", derive(utoipa::ToSchema))]
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct TestRunAssignmentResponse {
|
||||
pub assignment: Option<TestRunAssignment>,
|
||||
}
|
||||
|
||||
/// Details of a single node assigned to an agent for stress testing.
|
||||
#[cfg_attr(feature = "openapi", derive(utoipa::ToSchema))]
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct TestRunAssignment {
|
||||
pub node_id: u32,
|
||||
|
||||
/// The address of the node that should be tested.
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = String))]
|
||||
pub node_address: SocketAddr,
|
||||
|
||||
#[serde(with = "bs58_x25519_pubkey")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = String))]
|
||||
pub noise_key: x25519::PublicKey,
|
||||
|
||||
#[serde(with = "bs58_x25519_pubkey")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = String))]
|
||||
pub sphinx_key: x25519::PublicKey,
|
||||
|
||||
pub key_rotation_id: u32,
|
||||
}
|
||||
|
||||
/// Latency statistics computed over the set of test packets received or sent during a stress test.
|
||||
#[cfg_attr(feature = "openapi", derive(utoipa::ToSchema))]
|
||||
#[derive(Debug, Copy, Clone, PartialEq, Eq, Serialize, Deserialize)]
|
||||
pub struct LatencyDistribution {
|
||||
/// Minimum latency duration it took to send or receive a test packet.
|
||||
#[serde(with = "humantime_serde")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = String))]
|
||||
pub minimum: Duration,
|
||||
|
||||
/// Average latency duration it took to send or receive a test packet.
|
||||
#[serde(with = "humantime_serde")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = String))]
|
||||
pub mean: Duration,
|
||||
|
||||
/// Median time it took to send or receive a test packet.
|
||||
/// For an even number of samples, this is the arithmetic mean of the two middle values.
|
||||
#[serde(with = "humantime_serde")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = String))]
|
||||
pub median: Duration,
|
||||
|
||||
/// Maximum time it took to send or receive a test packet.
|
||||
#[serde(with = "humantime_serde")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = String))]
|
||||
pub maximum: Duration,
|
||||
|
||||
/// Standard deviation of the time it took to send or receive test packets.
|
||||
#[serde(with = "humantime_serde")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = String))]
|
||||
pub standard_deviation: Duration,
|
||||
}
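The diff only shows the shape of `LatencyDistribution`, not how it is computed. As a rough sketch of what the field docs describe (median as the mean of the two middle values for an even sample count), the statistics could be derived as below. `LatencySummary` and `summarise` are hypothetical names for this illustration, and the use of the population (not sample) standard deviation is an assumption, not something the PR confirms.

```rust
use std::time::Duration;

/// Local stand-in mirroring the fields of `LatencyDistribution` (serde/utoipa attributes omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LatencySummary {
    pub minimum: Duration,
    pub mean: Duration,
    pub median: Duration,
    pub maximum: Duration,
    pub standard_deviation: Duration,
}

/// Hypothetical helper: summarises a set of latency samples.
/// Returns `None` for an empty input, matching the `Option<LatencyDistribution>` fields.
pub fn summarise(samples: &[Duration]) -> Option<LatencySummary> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    let n = sorted.len();

    // Mean computed on integer nanoseconds to avoid float rounding in the result.
    let total_ns: u128 = sorted.iter().map(Duration::as_nanos).sum();
    let mean_ns = total_ns / n as u128;
    let mean = Duration::from_nanos(u64::try_from(mean_ns).unwrap_or(u64::MAX));

    // For an even number of samples, take the arithmetic mean of the two middle values.
    let median = if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2
    };

    // Population standard deviation (assumption), computed in f64 nanoseconds.
    let mean_f = total_ns as f64 / n as f64;
    let variance = sorted
        .iter()
        .map(|d| {
            let diff = d.as_nanos() as f64 - mean_f;
            diff * diff
        })
        .sum::<f64>()
        / n as f64;
    let standard_deviation = Duration::from_nanos(variance.sqrt() as u64);

    Some(LatencySummary {
        minimum: sorted[0],
        mean,
        median,
        maximum: sorted[n - 1],
        standard_deviation,
    })
}
```

Sorting a copy keeps the caller's slice untouched, and the median falls out of the same sorted vector as the minimum and maximum.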
|
||||
|
||||
/// Request sent by an agent to submit test results for a previously assigned node.
|
||||
#[cfg_attr(feature = "openapi", derive(utoipa::ToSchema))]
|
||||
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
|
||||
pub struct TestRunResultSubmissionRequest {
|
||||
pub node_id: u32,
|
||||
pub result: TestRunResult,
|
||||
}
|
||||
|
||||
/// Captures the outcome of a single test run against a nym node.
|
||||
///
|
||||
/// Fields are populated incrementally as the test progresses; absent values (`None`) indicate
|
||||
/// that the corresponding step was not reached or did not produce a result.
|
||||
#[cfg_attr(feature = "openapi", derive(utoipa::ToSchema))]
|
||||
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
|
||||
pub struct TestRunResult {
|
||||
/// Total duration of the test run, including the time it took to establish the connections.
|
||||
#[serde(default, with = "humantime_serde")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = String))]
|
||||
pub time_taken: Duration,
|
||||
|
||||
/// Duration of the Noise handshake on the ingress (responder) side, if completed.
|
||||
#[serde(default, with = "humantime_serde")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = Option<String>))]
|
||||
pub ingress_noise_handshake: Option<Duration>,
|
||||
|
||||
/// Duration of the Noise handshake on the egress (initiator) side, if completed.
|
||||
#[serde(default, with = "humantime_serde")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = Option<String>))]
|
||||
pub egress_noise_handshake: Option<Duration>,
|
||||
|
||||
/// The (constant) delay of the sphinx packet set during the test run.
|
||||
#[serde(default, with = "humantime_serde")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = String))]
|
||||
pub sphinx_packet_delay: Duration,
|
||||
|
||||
/// Number of sphinx packets successfully sent to the node under test.
|
||||
pub packets_sent: usize,
|
||||
|
||||
/// Number of sphinx packets returned by the node and successfully received.
|
||||
pub packets_received: usize,
|
||||
|
||||
/// Round-trip time of the very first probe packet, sent in isolation before any load is applied.
|
||||
/// Because the node is idle at this point, this value approximates the baseline network latency
|
||||
/// to the node without any queuing or processing overhead from the stress test itself.
|
||||
/// `None` if the initial probe did not complete successfully.
|
||||
#[serde(default, with = "humantime_serde")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = Option<String>))]
|
||||
pub approximate_latency: Option<Duration>,
|
||||
|
||||
/// RTT statistics computed over all received packets, or `None` if no packets were received.
|
||||
pub packets_statistics: Option<LatencyDistribution>,
|
||||
|
||||
/// Latency distribution of individual batch send operations recorded during the load test.
|
||||
/// Reflects how long each batch took to flush to the OS socket, giving a rough measure of
|
||||
/// egress throughput. `None` if no batches were sent.
|
||||
pub sending_statistics: Option<LatencyDistribution>,
|
||||
|
||||
/// Whether any packet was received with an ID that had already been seen in this test run.
|
||||
/// Duplicates should never occur under normal operation; their presence may indicate a
|
||||
/// misbehaving or malicious node replaying packets.
|
||||
pub received_duplicates: bool,
|
||||
|
||||
/// Human-readable description of the first error that caused the test to abort, if any.
|
||||
pub error: Option<String>,
|
||||
}
|
||||
|
||||
impl TestRunResult {
|
||||
pub fn received_ratio(&self) -> f64 {
|
||||
if self.packets_sent == 0 {
|
||||
return 0.0;
|
||||
}
|
||||
let received = self.packets_received.min(self.packets_sent);
|
||||
received as f64 / self.packets_sent as f64
|
||||
}
|
||||
}
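To make the edge cases of `received_ratio` concrete, here is a small self-contained sketch. `PacketCounters` is a hypothetical, trimmed-down stand-in for `TestRunResult` that keeps only the two counters; the method body follows the logic shown in the diff.

```rust
/// Hypothetical stand-in for `TestRunResult`, reduced to the two packet counters.
#[derive(Debug, Default, Clone)]
pub struct PacketCounters {
    pub packets_sent: usize,
    pub packets_received: usize,
}

impl PacketCounters {
    /// Fraction of sent packets that came back, in the range `0.0..=1.0`.
    pub fn received_ratio(&self) -> f64 {
        // Nothing sent: report 0.0 rather than dividing by zero.
        if self.packets_sent == 0 {
            return 0.0;
        }
        // Clamp so that duplicates cannot push the ratio above 1.0.
        let received = self.packets_received.min(self.packets_sent);
        received as f64 / self.packets_sent as f64
    }
}
```

The clamp matters because `received_duplicates` can be true: a node replaying packets could otherwise report a ratio above 1.0.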
|
||||
|
||||
/// Confirmation returned to an agent after a successful result submission.
|
||||
/// Currently empty — exists to give the response an explicit type rather than
|
||||
/// relying on `Json(())`.
|
||||
#[cfg_attr(feature = "openapi", derive(utoipa::ToSchema))]
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct TestRunSubmissionResponse {}
|
||||
|
||||
// ------------------------------------------------------------------------
|
||||
// Response shapes for the read-only results API (`/v1/results/*`). These are
|
||||
// the public, serialisation-stable types returned to callers; conversion from
|
||||
// the storage layer's sqlx rows happens in `orchestrator/storage/models.rs`.
|
||||
// ------------------------------------------------------------------------
|
||||
|
||||
pub const PAGINATION_SIZE_DEFAULT: usize = 50;
|
||||
pub const PAGINATION_SIZE_MAX: usize = 200;
|
||||
pub const PAGINATION_PAGE_DEFAULT: usize = 0;
|
||||
|
||||
/// Query parameters for paginated endpoints. `per_page` defaults to
|
||||
/// [`PAGINATION_SIZE_DEFAULT`] and is capped at [`PAGINATION_SIZE_MAX`];
|
||||
/// `page` defaults to [`PAGINATION_PAGE_DEFAULT`].
|
||||
#[cfg_attr(feature = "openapi", derive(utoipa::IntoParams))]
|
||||
#[cfg_attr(feature = "openapi", into_params(parameter_in = Query))]
|
||||
#[derive(Debug, Copy, Clone, Serialize, Deserialize)]
|
||||
pub struct Pagination {
|
||||
pub per_page: Option<usize>,
|
||||
pub page: Option<usize>,
|
||||
}
|
||||
|
||||
impl Default for Pagination {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
per_page: Some(PAGINATION_SIZE_DEFAULT),
|
||||
page: Some(PAGINATION_PAGE_DEFAULT),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl Pagination {
|
||||
pub fn new(per_page: Option<usize>, page: Option<usize>) -> Self {
|
||||
Self { per_page, page }
|
||||
}
|
||||
|
||||
/// Resolved page size — defaults to [`PAGINATION_SIZE_DEFAULT`] when absent
|
||||
/// and is capped at [`PAGINATION_SIZE_MAX`].
|
||||
pub fn per_page(&self) -> usize {
|
||||
self.per_page
|
||||
.unwrap_or(PAGINATION_SIZE_DEFAULT)
|
||||
.min(PAGINATION_SIZE_MAX)
|
||||
}
|
||||
|
||||
/// Resolved page index — defaults to [`PAGINATION_PAGE_DEFAULT`] when absent.
|
||||
pub fn page(&self) -> usize {
|
||||
self.page.unwrap_or(PAGINATION_PAGE_DEFAULT)
|
||||
}
|
||||
|
||||
/// Value to bind to a SQL `LIMIT ?` clause. Equivalent to
|
||||
/// [`Self::per_page`] cast to the `i64` sqlx bind type.
|
||||
pub fn limit(&self) -> i64 {
|
||||
self.per_page() as i64
|
||||
}
|
||||
|
||||
/// Value to bind to a SQL `OFFSET ?` clause, i.e. `page * per_page`.
|
||||
/// Saturating to avoid overflow on absurdly large `page` values from a client.
|
||||
pub fn offset(&self) -> i64 {
|
||||
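// `as i64` would wrap to a negative value for pages above `i64::MAX`, so clamp instead.
i64::try_from(self.page()).unwrap_or(i64::MAX).saturating_mul(self.limit())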
|
||||
}
|
||||
}
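A short sketch of how the resolved values end up as SQL `LIMIT`/`OFFSET` bindings may help reviewers. The types below are a local copy without the serde/utoipa derives. Converting `page` with `i64::try_from` is a choice made for this sketch so that oversized values clamp instead of wrapping.

```rust
pub const PAGINATION_SIZE_DEFAULT: usize = 50;
pub const PAGINATION_SIZE_MAX: usize = 200;
pub const PAGINATION_PAGE_DEFAULT: usize = 0;

/// Local copy of the query parameters, without the serde/utoipa derives.
#[derive(Debug, Copy, Clone)]
pub struct Pagination {
    pub per_page: Option<usize>,
    pub page: Option<usize>,
}

impl Pagination {
    /// Page size: default when absent, capped at the maximum.
    pub fn per_page(&self) -> usize {
        self.per_page
            .unwrap_or(PAGINATION_SIZE_DEFAULT)
            .min(PAGINATION_SIZE_MAX)
    }

    /// Zero-based page index: default when absent.
    pub fn page(&self) -> usize {
        self.page.unwrap_or(PAGINATION_PAGE_DEFAULT)
    }

    /// Value for `LIMIT ?`. Always fits in `i64` because of the cap above.
    pub fn limit(&self) -> i64 {
        self.per_page() as i64
    }

    /// Value for `OFFSET ?`, i.e. `page * per_page`, clamped to `i64::MAX`.
    pub fn offset(&self) -> i64 {
        i64::try_from(self.page())
            .unwrap_or(i64::MAX)
            .saturating_mul(self.limit())
    }
}
```

For example, `?per_page=500&page=3` resolves to `LIMIT 200 OFFSET 600`, while a request with no parameters resolves to `LIMIT 50 OFFSET 0`.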
|
||||
|
||||
/// Generic wrapper for a single page of results.
|
||||
#[cfg_attr(feature = "openapi", derive(utoipa::ToSchema))]
|
||||
#[derive(Debug, Default, Clone, Serialize, Deserialize)]
|
||||
pub struct PagedResult<T> {
|
||||
pub page: usize,
|
||||
pub per_page: usize,
|
||||
pub total: usize,
|
||||
pub items: Vec<T>,
|
||||
}
|
||||
|
||||
/// Discriminator for the type of node targeted by a test run.
|
||||
#[cfg_attr(feature = "openapi", derive(utoipa::ToSchema))]
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
|
||||
#[serde(rename_all = "lowercase")]
|
||||
pub enum TestType {
|
||||
Mixnode,
|
||||
Gateway,
|
||||
}
|
||||
|
||||
/// A completed test run as exposed by the results API.
|
||||
///
|
||||
/// Unlike the agent-facing [`TestRunResult`], this carries the database id,
|
||||
/// the node that was tested, and the timestamp at which the result was
|
||||
/// recorded by the orchestrator.
|
||||
#[cfg_attr(feature = "openapi", derive(utoipa::ToSchema))]
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct TestRunData {
|
||||
/// Database-assigned identifier of the test run.
|
||||
pub id: i64,
|
||||
|
||||
/// Node that was tested.
|
||||
pub node_id: u32,
|
||||
|
||||
/// Kind of node that was tested.
|
||||
pub test_type: TestType,
|
||||
|
||||
/// When the test run completed and was recorded.
|
||||
/// Serialised as an RFC 3339 timestamp string.
|
||||
#[serde(with = "time::serde::rfc3339")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = String))]
|
||||
pub test_timestamp: OffsetDateTime,
|
||||
|
||||
/// The test run result itself.
|
||||
pub result: TestRunResult,
|
||||
}
|
||||
|
||||
/// Public snapshot of a nym-node as tracked by the orchestrator.
|
||||
///
|
||||
/// Built from the on-chain bond plus any details the orchestrator has managed
|
||||
/// to retrieve directly from the node itself. The optional fields
|
||||
/// (`mixnet_socket_address`, `noise_key`, `sphinx_key`, `key_rotation_id`)
|
||||
/// are populated lazily by the node refresher and may be absent either because
|
||||
/// the node is newly observed or because the refresher failed to reach it.
|
||||
#[cfg_attr(feature = "openapi", derive(utoipa::ToSchema))]
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct NymNodeData {
|
||||
pub node_id: u32,
|
||||
|
||||
/// Ed25519 identity key of the node, serialised as a base58 string.
|
||||
#[serde(with = "bs58_ed25519_pubkey")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = String))]
|
||||
pub identity_key: ed25519::PublicKey,
|
||||
|
||||
/// When this node was last observed as bonded in the contract.
|
||||
#[serde(with = "time::serde::rfc3339")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = String))]
|
||||
pub last_seen_bonded: OffsetDateTime,
|
||||
|
||||
/// Mixnet socket address (host:port) at which the node accepts sphinx packets.
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = Option<String>))]
|
||||
pub mixnet_socket_address: Option<SocketAddr>,
|
||||
|
||||
/// X25519 public key used for Noise handshakes.
|
||||
/// `None` if retrieval from the node failed.
|
||||
#[serde(with = "option_bs58_x25519_pubkey")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = Option<String>))]
|
||||
pub noise_key: Option<x25519::PublicKey>,
|
||||
|
||||
/// Sphinx public key used for packet encryption.
|
||||
/// `None` if retrieval from the node failed.
|
||||
/// Always `None`/`Some` together with `key_rotation_id`.
|
||||
#[serde(with = "option_bs58_x25519_pubkey")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = Option<String>))]
|
||||
pub sphinx_key: Option<x25519::PublicKey>,
|
||||
|
||||
/// Key rotation epoch ID that `sphinx_key` belongs to.
|
||||
/// `None` if retrieval from the node failed.
|
||||
/// Always `None`/`Some` together with `sphinx_key`.
|
||||
pub key_rotation_id: Option<i64>,
|
||||
}
|
||||
|
||||
/// Node snapshot paired with its most recent completed test run.
|
||||
///
|
||||
/// `latest_test_run` is `None` when the node has never been tested or when its
|
||||
/// most recent run has been evicted by the stale-result sweeper.
|
||||
#[cfg_attr(feature = "openapi", derive(utoipa::ToSchema))]
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct NymNodeWithTestRun {
|
||||
pub node: NymNodeData,
|
||||
|
||||
pub latest_test_run: Option<TestRunData>,
|
||||
}
|
||||
|
||||
/// Marker for a test run that has been handed out to an agent but whose result
|
||||
/// hasn't been submitted yet. Stripped of test-payload fields because by
|
||||
/// definition none of them exist yet.
|
||||
#[cfg_attr(feature = "openapi", derive(utoipa::ToSchema))]
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct TestRunInProgressData {
|
||||
pub node_id: u32,
|
||||
|
||||
/// When the test run was handed out to an agent. Serialised as an
|
||||
/// RFC 3339 timestamp string.
|
||||
#[serde(with = "time::serde::rfc3339")]
|
||||
#[cfg_attr(feature = "openapi", schema(value_type = String))]
|
||||
pub started_at: OffsetDateTime,
|
||||
}
|
||||
@@ -0,0 +1,64 @@
|
||||
[package]
|
||||
name = "nym-network-monitor-orchestrator"
|
||||
description = "Orchestrator for performing Nym network stress testing"
|
||||
version = "1.0.2"
|
||||
authors.workspace = true
|
||||
edition.workspace = true
|
||||
license.workspace = true
|
||||
repository.workspace = true
|
||||
homepage.workspace = true
|
||||
documentation.workspace = true
|
||||
rust-version.workspace = true
|
||||
readme.workspace = true
|
||||
publish = false
|
||||
|
||||
[dependencies]
|
||||
anyhow = { workspace = true }
|
||||
clap = { workspace = true, features = ["cargo", "env"] }
|
||||
futures = { workspace = true }
|
||||
humantime = { workspace = true }
|
||||
rand = { workspace = true }
|
||||
sqlx = { workspace = true, features = ["runtime-tokio-rustls", "sqlite", "macros", "migrate", "time"] }
|
||||
tokio = { workspace = true, features = ["macros", "sync", "rt-multi-thread"] }
|
||||
strum = { workspace = true }
|
||||
thiserror = { workspace = true }
|
||||
tracing = { workspace = true }
|
||||
time = { workspace = true }
|
||||
url = { workspace = true }
|
||||
zeroize = { workspace = true }
|
||||
|
||||
# http
|
||||
axum = { workspace = true, features = ["tokio", "macros"] }
|
||||
utoipa = { workspace = true, features = ["axum_extras", "time"] }
|
||||
utoipa-swagger-ui = { workspace = true, features = ["axum"] }
|
||||
utoipauto = { workspace = true }
|
||||
|
||||
nym-validator-client = { workspace = true, features = ["http-client"] }
|
||||
nym-bin-common = { workspace = true, features = ["basic_tracing", "output_format"] }
|
||||
nym-crypto = { workspace = true, features = ["asymmetric"] }
|
||||
nym-network-defaults = { workspace = true }
|
||||
nym-task = { workspace = true }
|
||||
nym-node-requests = { workspace = true, features = ["client"] }
|
||||
nym-metrics = { workspace = true }
|
||||
nym-http-api-common = { workspace = true, features = ["utoipa", "middleware"] }
|
||||
nym-api-requests = { workspace = true }
|
||||
|
||||
nym-network-monitor-orchestrator-requests = { path = "../nym-network-monitor-orchestrator-requests", features = ["openapi"] }
|
||||
|
||||
[dev-dependencies]
|
||||
nym-crypto = { workspace = true, features = ["asymmetric", "rand"] }
|
||||
nym-test-utils = { workspace = true }
|
||||
|
||||
[build-dependencies]
|
||||
anyhow = { workspace = true }
|
||||
sqlx = { workspace = true, features = [
|
||||
"runtime-tokio-rustls",
|
||||
"sqlite",
|
||||
"macros",
|
||||
"migrate",
|
||||
] }
|
||||
tokio = { workspace = true, features = ["rt-multi-thread", "macros"] }
|
||||
|
||||
|
||||
[lints]
|
||||
workspace = true
|
||||
@@ -0,0 +1,19 @@
|
||||
# The harbor.nymte.ch images are only reachable over the VPN; without it, remove the `harbor.nymte.ch/dockerhub/` prefix from the FROM lines.
|
||||
FROM harbor.nymte.ch/dockerhub/rust:latest AS builder
|
||||
|
||||
RUN apt-get update && apt-get install -y libdbus-1-dev pkg-config libclang-dev
|
||||
|
||||
COPY ./ /usr/src/nym
|
||||
WORKDIR /usr/src/nym
|
||||
|
||||
RUN cargo build --bin nym-network-monitor-orchestrator --release
|
||||
|
||||
FROM harbor.nymte.ch/dockerhub/ubuntu:24.04
|
||||
|
||||
RUN apt-get update && apt-get install -y ca-certificates && rm -rf /var/lib/apt/lists/*
|
||||
|
||||
WORKDIR /nym
|
||||
|
||||
COPY --from=builder /usr/src/nym/target/release/nym-network-monitor-orchestrator ./
|
||||
|
||||
ENTRYPOINT ["/nym/nym-network-monitor-orchestrator", "run-orchestrator"]
|
||||
@@ -0,0 +1,37 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use anyhow::Context;
|
||||
use sqlx::{Connection, SqliteConnection};
|
||||
use std::env;
|
||||
|
||||
#[tokio::main]
|
||||
async fn main() -> anyhow::Result<()> {
|
||||
let out_dir = env::var("OUT_DIR")?;
|
||||
let database_path = format!("{out_dir}/orchestrator.sqlite");
|
||||
|
||||
// remove the db file if it already existed from previous build
|
||||
// in case it was from a different branch
|
||||
if std::fs::exists(&database_path)? {
|
||||
std::fs::remove_file(&database_path)?;
|
||||
}
|
||||
|
||||
let mut conn = SqliteConnection::connect(&format!("sqlite://{database_path}?mode=rwc"))
|
||||
.await
|
||||
.context("Failed to create SQLx database connection")?;
|
||||
|
||||
sqlx::migrate!("./migrations")
|
||||
.run(&mut conn)
|
||||
.await
|
||||
.context("Failed to perform SQLx migrations")?;
|
||||
|
||||
#[cfg(target_family = "unix")]
|
||||
println!("cargo:rustc-env=DATABASE_URL=sqlite://{}", &database_path);
|
||||
|
||||
#[cfg(target_family = "windows")]
|
||||
// for some strange reason we need to add a leading `/` to the windows path even though it's
|
||||
// not a valid windows path... but hey, it works...
|
||||
println!("cargo:rustc-env=DATABASE_URL=sqlite:///{}", &database_path);
|
||||
|
||||
Ok(())
|
||||
}
|
||||
@@ -0,0 +1,139 @@
|
||||
/*
|
||||
* Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
* SPDX-License-Identifier: GPL-3.0-only
|
||||
*/
|
||||
|
||||
CREATE TABLE metadata
|
||||
(
|
||||
id INTEGER PRIMARY KEY CHECK (id = 0),
|
||||
last_submitted_testrun_id INTEGER
|
||||
);
|
||||
|
||||
CREATE TABLE testrun
|
||||
(
|
||||
-- Surrogate primary key.
|
||||
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
|
||||
|
||||
-- The node under test. References nym_node(node_id); forward reference is allowed in
|
||||
-- SQLite since foreign keys are validated at INSERT time, not at CREATE TABLE time.
|
||||
-- Kept as a full column (rather than relying on nym_node.last_testrun) so that the
|
||||
-- node→testrun link survives once the node gets a newer run.
|
||||
node_id INTEGER NOT NULL REFERENCES nym_node (node_id),
|
||||
|
||||
-- Discriminator for the type of node under test; future-proofs the table for when we start testing gateways.
|
||||
test_type TEXT CHECK ( test_type IN ('mixnode', 'gateway') ) NOT NULL,
|
||||
|
||||
-- When this testrun was performed.
|
||||
test_timestamp TIMESTAMP WITHOUT TIME ZONE NOT NULL,
|
||||
|
||||
-- How long the test took to complete, in microseconds, from the point of view of an agent.
|
||||
time_taken_us INTEGER NOT NULL,
|
||||
|
||||
-- Duration of the Noise handshake on the ingress (responder) side, in microseconds.
|
||||
-- NULL if the handshake did not complete.
|
||||
ingress_noise_handshake_us INTEGER,
|
||||
|
||||
-- Duration of the Noise handshake on the egress (initiator) side, in microseconds.
|
||||
-- NULL if the handshake did not complete.
|
||||
egress_noise_handshake_us INTEGER,
|
||||
|
||||
-- The (constant) per-hop delay applied to sphinx packets during the test run, in microseconds.
|
||||
sphinx_packet_delay_us INTEGER NOT NULL,
|
||||
|
||||
-- Number of sphinx packets sent to the node under test.
|
||||
packets_sent INTEGER NOT NULL DEFAULT 0,
|
||||
|
||||
-- Number of sphinx packets received back from the node under test.
|
||||
packets_received INTEGER NOT NULL DEFAULT 0,
|
||||
|
||||
-- RTT of the initial probe packet in microseconds, approximating baseline latency.
|
||||
-- NULL if the probe did not complete successfully.
|
||||
approximate_latency_us INTEGER,
|
||||
|
||||
-- RTT distribution (in microseconds) computed over all received packets.
|
||||
-- All five columns are NULL together when no packets were received.
|
||||
packets_rtt_min_us INTEGER,
|
||||
packets_rtt_mean_us INTEGER,
|
||||
packets_rtt_median_us INTEGER,
|
||||
packets_rtt_max_us INTEGER,
|
||||
packets_rtt_std_dev_us INTEGER,
|
||||
|
||||
-- Batch send latency distribution (in microseconds) recorded during the load test.
|
||||
-- All five columns are NULL together when no batches were sent.
|
||||
sending_latency_min_us INTEGER,
|
||||
sending_latency_mean_us INTEGER,
|
||||
sending_latency_median_us INTEGER,
|
||||
sending_latency_max_us INTEGER,
|
||||
sending_latency_std_dev_us INTEGER,
|
||||
|
||||
-- Whether any packet was received with a duplicate ID during this test run.
|
||||
received_duplicates BOOLEAN NOT NULL,
|
||||
|
||||
-- Human-readable description of the first error that caused the test to abort.
|
||||
-- NULL if the test completed without error.
|
||||
error TEXT
|
||||
|
||||
);
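The `*_us` columns store durations as integer microseconds, while the API types use `std::time::Duration`. The conversion code lives in the storage layer and is not part of this hunk, so the following is only a sketch of what such a mapping might look like. `duration_to_us`, `us_to_duration` and `optional_duration_to_us` are hypothetical names, and saturating on overflow is an assumption.

```rust
use std::time::Duration;

/// Hypothetical helper: converts a `Duration` to whole microseconds for an
/// `INTEGER` column, saturating at `i64::MAX` instead of overflowing.
pub fn duration_to_us(d: Duration) -> i64 {
    i64::try_from(d.as_micros()).unwrap_or(i64::MAX)
}

/// Hypothetical helper: reads a microsecond column back into a `Duration`.
/// Negative values should never be stored; they are mapped to zero here.
pub fn us_to_duration(us: i64) -> Duration {
    Duration::from_micros(u64::try_from(us).unwrap_or(0))
}

/// Nullable columns (e.g. `approximate_latency_us`) map to `Option<Duration>`.
pub fn optional_duration_to_us(d: Option<Duration>) -> Option<i64> {
    d.map(duration_to_us)
}
```

Microsecond resolution truncates anything below 1µs, so a value read back from the database may differ from the originally measured `Duration` by less than a microsecond.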
|
||||
|
||||
-- Supports efficient "all runs for node X, newest first" lookups.
|
||||
CREATE INDEX idx_testrun_node_id_timestamp ON testrun (node_id, test_timestamp DESC);
|
||||
|
||||
-- Supports efficient "all runs, newest first" lookups (the global testruns pagination endpoint).
|
||||
-- The composite index above cannot serve this query because its leading column is node_id.
|
||||
CREATE INDEX idx_testrun_test_timestamp ON testrun (test_timestamp DESC);
|
||||
|
||||
CREATE TABLE nym_node
|
||||
(
|
||||
-- Node ID as assigned by the mixnet contract.
|
||||
node_id INTEGER PRIMARY KEY NOT NULL,
|
||||
|
||||
-- Ed25519 identity key of the node, base58-encoded.
|
||||
-- A node_id always maps to exactly one identity_key and is never reassigned.
|
||||
-- The inverse is not true: the same identity_key may appear under multiple node_ids
|
||||
-- if the operator unbonds and rebonds, receiving a new contract-assigned node_id.
|
||||
identity_key TEXT NOT NULL,
|
||||
|
||||
-- When this node was last observed as bonded in the contract.
|
||||
last_seen_bonded TIMESTAMP WITHOUT TIME ZONE NOT NULL,
|
||||
|
||||
-- Mixnet socket address (host:port) at which the node accepts sphinx packets.
|
||||
mixnet_socket_address TEXT,
|
||||
|
||||
-- X25519 public key used for Noise handshakes, base58-encoded.
|
||||
-- NULL if retrieval from the node failed.
|
||||
noise_key TEXT,
|
||||
|
||||
-- Sphinx public key used for packet encryption, base58-encoded.
|
||||
-- NULL if retrieval from the node failed.
|
||||
-- Always NULL/non-NULL together with key_rotation_id.
|
||||
sphinx_key TEXT,
|
||||
|
||||
-- Key rotation epoch ID that the sphinx_key belongs to.
|
||||
-- NULL if retrieval from the node failed.
|
||||
-- Always NULL/non-NULL together with sphinx_key.
|
||||
key_rotation_id INTEGER,
|
||||
|
||||
-- Classification of the node based on the roles reported via its self-described endpoint.
|
||||
-- 'unknown' is used both before the node has been successfully queried and when a queried
|
||||
-- node reports no roles. Only nodes with node_type in ('mixnode', 'mixnode_and_gateway')
|
||||
-- are eligible for testruns today.
|
||||
node_type TEXT CHECK ( node_type IN ('unknown', 'mixnode', 'gateway', 'mixnode_and_gateway') ) NOT NULL DEFAULT 'unknown',
|
||||
|
||||
-- The most recent test run performed against this node. NULL if never tested.
|
||||
-- Set to NULL automatically when the referenced testrun row is evicted.
|
||||
last_testrun INTEGER REFERENCES testrun (id) ON DELETE SET NULL,
|
||||
|
||||
CHECK ((sphinx_key IS NULL) = (key_rotation_id IS NULL))
|
||||
);
|
||||
|
||||
-- Tracks nodes that currently have a test run in progress.
|
||||
-- At most one row per node (enforced by the PRIMARY KEY on node_id).
|
||||
-- A row is inserted when a run is dispatched and deleted when it completes or is abandoned.
|
||||
CREATE TABLE testrun_in_progress
|
||||
(
|
||||
-- The node currently being tested.
|
||||
node_id INTEGER PRIMARY KEY REFERENCES nym_node (node_id) NOT NULL,
|
||||
|
||||
-- When the in-progress run was started; used to detect stale/hung runs.
|
||||
started_at TIMESTAMP WITHOUT TIME ZONE NOT NULL
|
||||
);
|
||||
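The `started_at` column is described as the input for detecting stale or hung runs. The sweep itself is not shown in this migration, so the check below is a minimal sketch under stated assumptions: `is_stale` is a hypothetical name, the timeout is the orchestrator's `test_timeout`, and a run counts as stale only once it has been in progress for strictly longer than that timeout.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical check: has an in-progress run exceeded its allowed time?
pub fn is_stale(started_at: SystemTime, now: SystemTime, test_timeout: Duration) -> bool {
    match now.duration_since(started_at) {
        // Strictly greater than the timeout counts as stale (assumption).
        Ok(elapsed) => elapsed > test_timeout,
        // `started_at` lies in the future (clock skew): do not treat it as stale.
        Err(_) => false,
    }
}
```

Passing `now` in explicitly, rather than calling `SystemTime::now()` inside the function, keeps the check deterministic and easy to unit test.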
@@ -0,0 +1,15 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use nym_bin_common::bin_info_owned;
|
||||
use nym_bin_common::output_format::OutputFormat;
|
||||
|
||||
#[derive(clap::Args, Debug)]
|
||||
pub(crate) struct Args {
|
||||
#[clap(short, long, default_value_t = OutputFormat::default())]
|
||||
output: OutputFormat,
|
||||
}
|
||||
|
||||
pub(crate) fn execute(args: Args) {
|
||||
println!("{}", args.output.format(&bin_info_owned!()))
|
||||
}
|
||||
@@ -0,0 +1,40 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
/// Environment variable names used as fallbacks for CLI arguments.
|
||||
/// Each constant matches the `env = ...` attribute on the corresponding clap field.
|
||||
pub mod vars {
|
||||
// run orchestrator args
|
||||
pub const NYM_NETWORK_MONITOR_ORCHESTRATOR_AGENTS_TOKEN_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_ORCHESTRATOR_AGENTS_TOKEN";
|
||||
pub const NYM_NETWORK_MONITOR_ORCHESTRATOR_METRICS_AND_RESULTS_TOKEN_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_ORCHESTRATOR_METRICS_AND_RESULTS_TOKEN";
|
||||
pub const NYM_NETWORK_MONITOR_TEST_INTERVAL_ARG: &str = "NYM_NETWORK_MONITOR_TEST_INTERVAL";
|
||||
pub const NYM_NETWORK_MONITOR_TEST_TIMEOUT_ARG: &str = "NYM_NETWORK_MONITOR_TEST_TIMEOUT";
|
||||
pub const NYM_NETWORK_MONITOR_HTTP_SERVER_BIND_ADDRESS_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_HTTP_SERVER_BIND_ADDRESS";
|
||||
pub const NYM_NETWORK_MONITOR_NYM_API_ENDPOINT_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_NYM_API_ENDPOINT";
|
||||
pub const NYM_NETWORK_MONITOR_MNEMONIC_ARG: &str = "NYM_NETWORK_MONITOR_MNEMONIC";
|
||||
pub const NYM_NETWORK_MONITOR_RPC_URL_ARG: &str = "NYM_NETWORK_MONITOR_RPC_URL";
|
||||
pub const NYM_NETWORK_MONITOR_DATABASE_PATH_ARG: &str = "NYM_NETWORK_MONITOR_DATABASE_PATH";
|
||||
pub const NYM_NETWORK_MONITOR_PRIVATE_KEY_ARG: &str = "NYM_NETWORK_MONITOR_PRIVATE_KEY";
|
||||
pub const NYM_NETWORK_MONITOR_NODE_REFRESH_RATE_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_NODE_REFRESH_RATE";
|
||||
pub const NYM_NETWORK_MONITOR_NODE_INFO_QUERY_TIMEOUT_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_NODE_INFO_QUERY_TIMEOUT";
|
||||
pub const NYM_NETWORK_MONITOR_NETWORK_MONITORS_CONTRACT_ADDRESS_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_NETWORK_MONITORS_CONTRACT_ADDRESS";
|
||||
pub const NYM_NETWORK_MONITOR_MIXNET_CONTRACT_ADDRESS_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_MIXNET_CONTRACT_ADDRESS";
|
||||
pub const NYM_NETWORK_MONITOR_TESTRUN_EVICTION_AGE_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_TESTRUN_EVICTION_AGE";
|
||||
pub const NYM_NETWORK_MONITOR_CONCURRENT_NODE_QUERIES_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_CONCURRENT_NODE_QUERIES";
|
||||
pub const NYM_NETWORK_MONITOR_CHAIN_AUTH_CHECK_MAX_ATTEMPTS_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_CHAIN_AUTH_CHECK_MAX_ATTEMPTS";
|
||||
pub const NYM_NETWORK_MONITOR_CHAIN_AUTH_CHECK_RETRY_DELAY_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_CHAIN_AUTH_CHECK_RETRY_DELAY";
|
||||
pub const NYM_NETWORK_MONITOR_RESULT_SUBMISSION_INTERVAL_ARG: &str =
|
||||
"NYM_NETWORK_MONITOR_RESULT_SUBMISSION_INTERVAL";
|
||||
}
|
||||
@@ -0,0 +1,50 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use clap::{Parser, Subcommand};
|
||||
use nym_bin_common::bin_info;
|
||||
use std::sync::OnceLock;
|
||||
|
||||
mod build_info;
|
||||
mod env;
|
||||
mod run_orchestrator;
|
||||
|
||||
// Helper for passing LONG_VERSION to clap
|
||||
fn pretty_build_info_static() -> &'static str {
|
||||
static PRETTY_BUILD_INFORMATION: OnceLock<String> = OnceLock::new();
|
||||
PRETTY_BUILD_INFORMATION.get_or_init(|| bin_info!().pretty_print())
|
||||
}
|
||||
|
||||
/// Top-level CLI entry point for the network monitor orchestrator.
|
||||
#[derive(Parser, Debug)]
|
||||
#[clap(author = "Nymtech", version, long_version = pretty_build_info_static(), about)]
|
||||
pub(crate) struct Cli {
|
||||
/// Path pointing to an env file that configures the binary.
|
||||
/// Useful in local testing setups against networks different from mainnet.
|
||||
#[clap(short, long)]
|
||||
pub(crate) config_env_file: Option<std::path::PathBuf>,
|
||||
|
||||
#[command(subcommand)]
|
||||
pub(crate) command: Command,
|
||||
}
|
||||
|
||||
impl Cli {
|
||||
/// Dispatches execution to the subcommand selected by the user.
|
||||
pub(crate) async fn execute(self) -> anyhow::Result<()> {
|
||||
match self.command {
|
||||
Command::BuildInfo(args) => build_info::execute(args),
|
||||
Command::RunOrchestrator(args) => run_orchestrator::execute(*args).await?,
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Subcommand, Debug)]
|
||||
pub(crate) enum Command {
|
||||
/// Show build information of this binary
|
||||
BuildInfo(build_info::Args),
|
||||
|
||||
/// Run the network monitor orchestrator which will periodically
|
||||
/// issue work assignments for stress testing mixnodes
|
||||
RunOrchestrator(Box<run_orchestrator::Args>),
|
||||
}
|
||||
@@ -0,0 +1,214 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use super::env::vars::*;
|
||||
use crate::orchestrator::NetworkMonitorOrchestrator;
|
||||
use crate::orchestrator::config::Config;
|
||||
use anyhow::{Context, anyhow, bail};
|
||||
use nym_crypto::asymmetric::ed25519;
|
||||
use nym_validator_client::nyxd::bip39;
|
||||
use std::mem;
|
||||
use std::net::SocketAddr;
|
||||
use std::num::NonZeroU32;
|
||||
use std::path::PathBuf;
|
||||
use std::sync::Arc;
|
||||
use std::time::Duration;
|
||||
use tracing::info;
|
||||
use url::Url;
|
||||
use zeroize::Zeroizing;
|
||||
|
||||
#[derive(clap::Args, Debug)]
|
||||
pub(crate) struct Args {
|
||||
/// Bearer token required by the agents requesting work assignments and submitting results.
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_ORCHESTRATOR_AGENTS_TOKEN_ARG)]
|
||||
agents_token: String,
|
||||
|
||||
/// Bearer token used for accessing the metrics and results endpoints.
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_ORCHESTRATOR_METRICS_AND_RESULTS_TOKEN_ARG)]
|
||||
metrics_and_results_token: String,
|
||||
|
||||
/// How often each node should be stress-tested (e.g. `30m`, `1h`).
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_TEST_INTERVAL_ARG, value_parser = humantime::parse_duration, default_value = "2h")]
|
||||
test_interval: Duration,
|
||||
|
||||
/// Maximum time a single test run is allowed to run before being considered timed out
|
||||
/// (e.g. `5m`).
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_TEST_TIMEOUT_ARG, value_parser = humantime::parse_duration, default_value = "5m")]
|
||||
test_timeout: Duration,
|
||||
|
||||
/// Socket address to bind the HTTP server to (e.g. `0.0.0.0:8080`).
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_HTTP_SERVER_BIND_ADDRESS_ARG, default_value = "0.0.0.0:8080")]
|
||||
http_server_bind_address: SocketAddr,
|
||||
|
||||
/// HTTP endpoint of the nym-api to which test results are submitted.
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_NYM_API_ENDPOINT_ARG)]
|
||||
nym_api_endpoint: Url,
|
||||
|
||||
/// Mnemonic of the account used to authorise network monitor agents in the
|
||||
/// network monitors contract.
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_MNEMONIC_ARG)]
|
||||
mnemonic: bip39::Mnemonic,
|
||||
|
||||
/// HTTPS RPC URL of a Nyx node (e.g. `https://rpc.nymtech.net`).
|
||||
/// If not provided, the default value from the environment will be retrieved (if available).
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_RPC_URL_ARG)]
|
||||
rpc_url: Option<Url>,
|
||||
|
||||
/// Path to the SQLite database file.
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_DATABASE_PATH_ARG)]
|
||||
database_path: PathBuf,
|
||||
|
||||
/// Base58-encoded Ed25519 private key used to authorise result submissions to the nym-api.
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_PRIVATE_KEY_ARG)]
|
||||
private_key: String,
|
||||
|
||||
/// How often the list of bonded nym-nodes is refreshed from the mixnet contract
|
||||
/// (e.g. `10m`, `1h`).
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_NODE_REFRESH_RATE_ARG, value_parser = humantime::parse_duration, default_value = "2h")]
|
||||
node_refresh_rate: Duration,
|
||||
|
||||
/// Timeout for querying a single node for its detailed information (sphinx key, noise key,
|
||||
/// etc.). Queries that exceed this budget leave the corresponding fields as `NULL`
|
||||
/// (e.g. `10s`).
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_NODE_INFO_QUERY_TIMEOUT_ARG, value_parser = humantime::parse_duration, default_value = "10s")]
|
||||
node_info_query_timeout: Duration,
|
||||
|
||||
/// Bech32 address of the network monitors contract used to authorise agents.
|
||||
/// If not provided, the default value from the environment will be retrieved (if available).
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_NETWORK_MONITORS_CONTRACT_ADDRESS_ARG)]
|
||||
network_monitors_contract_address: Option<String>,
|
||||
|
||||
/// Bech32 address of the mixnet contract used to retrieve the list of bonded nodes.
|
||||
/// If not provided, the default value from the environment will be retrieved (if available).
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_MIXNET_CONTRACT_ADDRESS_ARG)]
|
||||
mixnet_contract_address: Option<String>,
|
||||
|
||||
/// Maximum age of a completed test run row before it is evicted from the local database.
|
||||
/// Rows older than this are assumed to have already been submitted to the nym-api
|
||||
/// (e.g. `7d`, `24h`).
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_TESTRUN_EVICTION_AGE_ARG, value_parser = humantime::parse_duration, default_value = "7d")]
|
||||
testrun_eviction_age: Duration,
|
||||
|
||||
/// Maximum number of nodes queried concurrently during a node refresh cycle.
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_CONCURRENT_NODE_QUERIES_ARG, default_value_t = 32)]
|
||||
number_of_concurrent_node_queries: usize,
|
||||
|
||||
/// Maximum number of attempts (including the initial one) made to verify that this
|
||||
/// orchestrator's account is authorised in the network monitors contract before start-up.
|
||||
/// The process exits with an error once the budget is exhausted.
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_CHAIN_AUTH_CHECK_MAX_ATTEMPTS_ARG, default_value = "10")]
|
||||
chain_authorisation_check_max_attempts: NonZeroU32,
|
||||
|
||||
/// Delay between consecutive chain authorisation checks during start-up (e.g. `1m`, `30s`).
|
||||
/// Applied both when the query itself fails and when it succeeds but the orchestrator is not
|
||||
/// (yet) listed.
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_CHAIN_AUTH_CHECK_RETRY_DELAY_ARG, value_parser = humantime::parse_duration, default_value = "1m")]
|
||||
chain_authorisation_check_retry_delay: Duration,
|
||||
|
||||
/// How often the orchestrator flushes accumulated test results to the nym-api as a signed
|
||||
/// batch submission (e.g. `15m`, `1h`).
|
||||
#[clap(long, env = NYM_NETWORK_MONITOR_RESULT_SUBMISSION_INTERVAL_ARG, value_parser = humantime::parse_duration, default_value = "15m")]
|
||||
result_submission_interval: Duration,
|
||||
}
|
||||
|
||||
impl Args {
|
||||
/// Converts the parsed CLI arguments into a [`Config`].
|
||||
///
|
||||
/// Returns an error if `network_monitors_contract_address` or `mixnet_contract_address` is
/// provided but is not a valid bech32 account address.
|
||||
///
|
||||
/// Note: `agents_token`, `metrics_and_results_token`, `mnemonic`, and `private_key` are not
/// part of [`Config`] and must be handled separately by the caller.
|
||||
pub(crate) fn build_orchestrator_config(&self) -> anyhow::Result<Config> {
|
||||
Ok(Config {
|
||||
nyxd_rpc_endpoint: self.rpc_url.clone(),
|
||||
nym_api_endpoint: self.nym_api_endpoint.clone(),
|
||||
http_server_bind_address: self.http_server_bind_address,
|
||||
test_interval: self.test_interval,
|
||||
test_timeout: self.test_timeout,
|
||||
database_path: self.database_path.clone(),
|
||||
node_refresh_rate: self.node_refresh_rate,
|
||||
node_info_query_timeout: self.node_info_query_timeout,
|
||||
network_monitors_contract_address: self
|
||||
.network_monitors_contract_address
|
||||
.as_ref()
|
||||
.map(|addr| addr.parse())
|
||||
.transpose()
|
||||
.map_err(|err| anyhow!("invalid network monitors contract address: {err}"))?,
|
||||
mixnet_contract_address: self
|
||||
.mixnet_contract_address
|
||||
.as_ref()
|
||||
.map(|addr| addr.parse())
|
||||
.transpose()
|
||||
.map_err(|err| anyhow!("invalid mixnet contract address: {err}"))?,
|
||||
testrun_eviction_age: self.testrun_eviction_age,
|
||||
number_of_concurrent_node_queries: self.number_of_concurrent_node_queries,
|
||||
chain_authorisation_check_max_attempts: self.chain_authorisation_check_max_attempts,
|
||||
chain_authorisation_check_retry_delay: self.chain_authorisation_check_retry_delay,
|
||||
result_submission_interval: self.result_submission_interval,
|
||||
})
|
||||
}
|
||||
|
||||
/// Moves the agents token out of `self`, leaving an empty string in its place, and wraps it
/// in [`Zeroizing`] so that it is wiped from memory on drop.
|
||||
///
|
||||
/// Returns an error if the token is empty.
|
||||
pub(crate) fn take_agents_orchestrator_token(&mut self) -> anyhow::Result<Zeroizing<String>> {
|
||||
// we must never accept empty tokens
|
||||
if self.agents_token.is_empty() {
|
||||
bail!("provided agents token is empty, please provide a non-empty value")
|
||||
}
|
||||
let taken = mem::take(&mut self.agents_token);
|
||||
Ok(Zeroizing::new(taken))
|
||||
}
|
||||
|
||||
/// Moves the metrics-and-results token out of `self`, leaving an empty string in its place,
/// and wraps it in [`Zeroizing`] so that it is wiped from memory on drop.
|
||||
///
|
||||
/// Returns an error if the token is empty.
|
||||
pub(crate) fn take_metrics_and_results_orchestrator_token(
|
||||
&mut self,
|
||||
) -> anyhow::Result<Zeroizing<String>> {
|
||||
// we must never accept empty tokens
|
||||
if self.metrics_and_results_token.is_empty() {
|
||||
bail!("provided metrics-and-results token is empty, please provide a non-empty value")
|
||||
}
|
||||
let taken = mem::take(&mut self.metrics_and_results_token);
|
||||
Ok(Zeroizing::new(taken))
|
||||
}
|
||||
|
||||
/// Moves the raw Base58-encoded private key out of `self`, parses it into an Ed25519 key pair,
|
||||
/// and zeroizes the original string.
|
||||
///
|
||||
/// Returns an error if the value is not a valid Base58-encoded Ed25519 private key.
|
||||
pub(crate) fn take_identity_key(&mut self) -> anyhow::Result<Arc<ed25519::KeyPair>> {
|
||||
// whatever happens, we'll zeroize the value
|
||||
let taken = Zeroizing::new(mem::take(&mut self.private_key));
|
||||
|
||||
let private_key = ed25519::PrivateKey::from_base58_string(&taken)
|
||||
.context("malformed identity key provided")?;
|
||||
Ok(Arc::new(private_key.into()))
|
||||
}
|
||||
|
||||
/// Consumes `self` and returns the mnemonic.
|
||||
pub(crate) fn into_mnemonic(self) -> bip39::Mnemonic {
|
||||
self.mnemonic
|
||||
}
|
||||
}
|
||||
|
||||
pub(crate) async fn execute(mut args: Args) -> anyhow::Result<()> {
|
||||
info!("Starting network monitor orchestrator");
|
||||
let config = args.build_orchestrator_config()?;
|
||||
let identity_keys = args.take_identity_key()?;
|
||||
let agents_auth_token = args.take_agents_orchestrator_token()?;
|
||||
let metrics_and_results_auth_token = args.take_metrics_and_results_orchestrator_token()?;
|
||||
let mnemonic = args.into_mnemonic();
|
||||
|
||||
let mut orchestrator = NetworkMonitorOrchestrator::new(
|
||||
config,
|
||||
identity_keys,
|
||||
agents_auth_token,
|
||||
metrics_and_results_auth_token,
|
||||
mnemonic,
|
||||
)
|
||||
.await?;
|
||||
orchestrator.run().await?;
|
||||
Ok(())
|
||||
}
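The `chain_authorisation_check_max_attempts` and `chain_authorisation_check_retry_delay` arguments above describe a bounded retry loop: the attempt budget includes the initial check, and the same delay applies whether the query failed or merely found the orchestrator not yet listed. A minimal sketch of that behaviour, assuming a synchronous `check` callback; `retry_until_authorised` and its `String` error type are hypothetical and are not taken from this PR:

```rust
use std::num::NonZeroU32;
use std::thread;
use std::time::Duration;

/// Hypothetical helper, not part of the orchestrator: calls `check` up to
/// `max_attempts` times (the first call counts) and sleeps `retry_delay`
/// between consecutive attempts.
///
/// `check` returns `Ok(true)` when authorised, `Ok(false)` when the query
/// succeeded but the account is not (yet) listed, and `Err(_)` when the
/// query itself failed. Both non-success outcomes are retried alike.
fn retry_until_authorised<F>(
    max_attempts: NonZeroU32,
    retry_delay: Duration,
    mut check: F,
) -> Result<u32, String>
where
    F: FnMut() -> Result<bool, String>,
{
    let budget = max_attempts.get();
    for attempt in 1..=budget {
        if let Ok(true) = check() {
            // report which attempt succeeded
            return Ok(attempt);
        }
        // no point in sleeping once the budget is exhausted
        if attempt < budget {
            thread::sleep(retry_delay);
        }
    }
    Err(format!("not authorised after {budget} attempts"))
}
```

Typing the budget as `NonZeroU32` means a value of `0`, which would skip the check entirely, is rejected when the arguments are parsed rather than at run time.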
|
||||
@@ -0,0 +1,52 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use axum::Router;
|
||||
use nym_network_monitor_orchestrator_requests::routes;
|
||||
use utoipa::openapi::security::{Http, HttpAuthScheme, SecurityScheme};
|
||||
use utoipa::{Modify, OpenApi};
|
||||
use utoipa_swagger_ui::SwaggerUi;
|
||||
use utoipauto::utoipauto;
|
||||
|
||||
// manually import external structs which are behind feature flags because they
|
||||
// can't be automatically discovered
|
||||
// https://github.com/ProbablyClem/utoipauto/issues/13#issuecomment-1974911829
|
||||
#[utoipauto(
|
||||
paths = "./nym-network-monitor-v3/nym-network-monitor-orchestrator/src",
|
||||
"./nym-network-monitor-v3/nym-network-monitor-orchestrator-requests/src from nym-network-monitor-orchestrator-requests"
|
||||
)]
|
||||
#[derive(OpenApi)]
|
||||
#[openapi(
|
||||
info(title = "Nym Network Monitor Orchestrator API"),
|
||||
tags(),
|
||||
modifiers(&SecurityAddon),
|
||||
)]
|
||||
pub(crate) struct ApiDoc;
|
||||
|
||||
/// OpenAPI modifier that registers bearer-token security schemes for the API docs.
|
||||
struct SecurityAddon;
|
||||
|
||||
impl Modify for SecurityAddon {
|
||||
fn modify(&self, openapi: &mut utoipa::openapi::OpenApi) {
|
||||
if let Some(components) = openapi.components.as_mut() {
|
||||
// token authorising access to prometheus metrics and test-run results
|
||||
components.add_security_scheme(
|
||||
"metrics_and_results_token",
|
||||
SecurityScheme::Http(Http::new(HttpAuthScheme::Bearer)),
|
||||
);
|
||||
|
||||
// token authorising monitor agents
|
||||
components.add_security_scheme(
|
||||
"agents_token",
|
||||
SecurityScheme::Http(Http::new(HttpAuthScheme::Bearer)),
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Returns a router that serves the Swagger UI and the generated OpenAPI JSON spec.
|
||||
pub(crate) fn route<S: Send + Sync + 'static + Clone>() -> Router<S> {
|
||||
SwaggerUi::new(routes::SWAGGER)
|
||||
.url("/api-docs/openapi.json", ApiDoc::openapi())
|
||||
.into()
|
||||
}
|
||||
@@ -0,0 +1,69 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use crate::http::state::AppState;
|
||||
use axum::Router;
|
||||
use axum::response::Redirect;
|
||||
use axum::routing::{MethodRouter, get};
|
||||
use nym_http_api_common::middleware::bearer_auth::AuthLayer;
|
||||
use nym_http_api_common::middleware::logging::log_request_debug;
|
||||
use nym_network_monitor_orchestrator_requests::routes;
|
||||
use nym_task::ShutdownToken;
|
||||
use std::net::SocketAddr;
|
||||
use std::sync::Arc;
|
||||
use tracing::{error, info};
|
||||
use zeroize::Zeroizing;
|
||||
|
||||
pub(crate) mod api_docs;
|
||||
pub(crate) mod v1;
|
||||
|
||||
/// Returns a handler that issues a 303 redirect to the Swagger UI.
|
||||
fn swagger_redirect<S: Clone + Send + Sync + 'static>() -> MethodRouter<S> {
|
||||
// redirects with 303 status code
|
||||
get(|| async { Redirect::to(routes::SWAGGER) })
|
||||
}
|
||||
|
||||
/// Assembles the full orchestrator HTTP router with Swagger UI, v1 API routes,
|
||||
/// bearer-auth middleware, and request logging.
|
||||
pub(crate) fn build_router(
|
||||
state: AppState,
|
||||
agents_auth_token: Arc<Zeroizing<String>>,
|
||||
metrics_and_results_auth_token: Arc<Zeroizing<String>>,
|
||||
) -> Router {
|
||||
let agents_auth = AuthLayer::new(agents_auth_token);
|
||||
let metrics_and_results_auth = AuthLayer::new(metrics_and_results_auth_token);
|
||||
|
||||
Router::new()
|
||||
.route(routes::ROOT, swagger_redirect())
|
||||
.merge(api_docs::route())
|
||||
.nest(
|
||||
routes::V1,
|
||||
v1::routes(agents_auth, metrics_and_results_auth),
|
||||
)
|
||||
.layer(axum::middleware::from_fn(log_request_debug))
|
||||
.with_state(state)
|
||||
}
|
||||
|
||||
/// Binds to `bind_address` and serves the given router until the shutdown token is cancelled.
|
||||
/// The router is converted with `into_make_service_with_connect_info` so handlers can
/// extract the peer [`SocketAddr`].
|
||||
pub(crate) async fn run_http_server(
|
||||
router: Router,
|
||||
bind_address: SocketAddr,
|
||||
shutdown_token: ShutdownToken,
|
||||
) -> anyhow::Result<()> {
|
||||
let listener = tokio::net::TcpListener::bind(bind_address)
|
||||
.await
|
||||
.inspect_err(|err| error!("couldn't bind to address {bind_address}: {err}"))?;
|
||||
|
||||
info!("starting http api server on {bind_address}");
|
||||
|
||||
axum::serve(
|
||||
listener,
|
||||
router.into_make_service_with_connect_info::<SocketAddr>(),
|
||||
)
|
||||
.with_graceful_shutdown(async move { shutdown_token.cancelled().await })
|
||||
.await?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
@@ -0,0 +1,243 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use crate::http::api::v1::error::ApiError;
|
||||
use crate::http::state::AppState;
|
||||
use crate::orchestrator::prometheus::{PROMETHEUS_METRICS, PrometheusMetric};
|
||||
use axum::extract::{ConnectInfo, State};
|
||||
use axum::routing::post;
|
||||
use axum::{Json, Router};
|
||||
use nym_http_api_common::middleware::bearer_auth::AuthLayer;
|
||||
use nym_network_monitor_orchestrator_requests::models::{
|
||||
AgentAnnounceRequest, AgentAnnounceResponse, TestRunAssignmentRequest,
|
||||
TestRunAssignmentResponse, TestRunResultSubmissionRequest, TestRunSubmissionResponse,
|
||||
};
|
||||
use nym_network_monitor_orchestrator_requests::routes;
|
||||
use nym_validator_client::nyxd::contract_traits::NetworkMonitorsSigningClient;
|
||||
use std::net::SocketAddr;
|
||||
use tracing::{error, info};
|
||||
|
||||
#[utoipa::path(
|
||||
operation_id = "v1_agent_announce",
|
||||
tag = "Network Monitor Agent",
|
||||
post,
|
||||
request_body = AgentAnnounceRequest,
|
||||
path = "/announce",
|
||||
context_path = "/v1/agent",
|
||||
security(("agents_token" = [])),
|
||||
responses(
|
||||
(status = 200, content(
|
||||
(AgentAnnounceResponse = "application/json"),
|
||||
)),
|
||||
(status = 500, description = "failed to announce agent to the network monitors contract"),
|
||||
)
|
||||
)]
|
||||
#[tracing::instrument(
|
||||
level = "debug",
|
||||
skip_all,
|
||||
fields(
|
||||
agent_pod = %addr
|
||||
)
|
||||
)]
|
||||
async fn announce_agent(
|
||||
ConnectInfo(addr): ConnectInfo<SocketAddr>,
|
||||
State(state): State<AppState>,
|
||||
Json(body): Json<AgentAnnounceRequest>,
|
||||
) -> Result<Json<AgentAnnounceResponse>, ApiError> {
|
||||
let pod_ip = addr.ip();
|
||||
info!("received announce request from pod at {pod_ip}: {body:?}");
|
||||
|
||||
PROMETHEUS_METRICS.inc(PrometheusMetric::AgentAnnounceRequests);
|
||||
|
||||
// 1. upsert the agent in the cache and learn whether it has already been announced
|
||||
let already_announced = state
|
||||
.agents
|
||||
.try_announce_agent(body.agent_mix_socket_address, body.x25519_noise_key)
|
||||
.await;
|
||||
|
||||
// 2. if the agent was already announced, skip the contract tx
|
||||
if already_announced {
|
||||
info!("agent at {pod_ip} is already announced, skipping contract tx");
|
||||
PROMETHEUS_METRICS.inc(PrometheusMetric::AgentDuplicateAnnouncementRequests);
|
||||
return Ok(Json(AgentAnnounceResponse {}));
|
||||
}
|
||||
|
||||
// 3. attempt to announce the agent to the network monitors contract
|
||||
state
|
||||
.validator_client
|
||||
.write()
|
||||
.await
|
||||
.nyxd
|
||||
.authorise_network_monitor(
|
||||
body.agent_mix_socket_address,
|
||||
body.x25519_noise_key.to_base58_string(),
|
||||
body.noise_version,
|
||||
None,
|
||||
)
|
||||
.await
|
||||
.inspect_err(|err| {
|
||||
PROMETHEUS_METRICS.inc(PrometheusMetric::AgentContractAnnounceFailures);
|
||||
|
||||
error!("failed to announce agent to the network monitors contract: {err}")
|
||||
})
|
||||
.map_err(|_| ApiError::ContractFailure)?;
|
||||
|
||||
PROMETHEUS_METRICS.inc(PrometheusMetric::AgentContractAnnounceSuccesses);
|
||||
|
||||
// 4. mark the agent as announced so subsequent calls are no-ops
|
||||
state
|
||||
.agents
|
||||
.mark_announced(body.agent_mix_socket_address)
|
||||
.await;
|
||||
|
||||
Ok(Json(AgentAnnounceResponse {}))
|
||||
}
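The four numbered steps in `announce_agent` make the endpoint idempotent from the agent's point of view: a repeated announcement is answered from the cache, and a failed contract transaction leaves the agent unannounced so that a later retry reaches the contract again. A simplified, synchronous sketch of that flow follows. `AgentCache`, `announce` and the boolean return value are hypothetical stand-ins, and the assumption that `try_announce_agent` inserts unknown agents as not-yet-announced is inferred from the comments rather than confirmed by this diff:

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical in-memory cache: maps an agent address to its `announced` flag.
#[derive(Default)]
struct AgentCache {
    agents: HashMap<SocketAddr, bool>,
}

impl AgentCache {
    /// Inserts the agent as not-yet-announced if it is unknown and returns
    /// whether it had already been announced (assumed semantics).
    fn try_announce(&mut self, addr: SocketAddr) -> bool {
        *self.agents.entry(addr).or_insert(false)
    }

    /// Marks the agent as announced so later calls become no-ops.
    fn mark_announced(&mut self, addr: SocketAddr) {
        self.agents.insert(addr, true);
    }
}

/// Returns `Ok(true)` if a contract transaction was submitted and
/// `Ok(false)` if the announcement was a duplicate served from the cache.
fn announce(
    cache: &mut AgentCache,
    addr: SocketAddr,
    contract_tx: impl FnOnce() -> Result<(), String>,
) -> Result<bool, String> {
    // steps 1 and 2: upsert into the cache, skip the tx for duplicates
    if cache.try_announce(addr) {
        return Ok(false);
    }
    // step 3: on failure the agent stays cached but unannounced
    contract_tx()?;
    // step 4: only a successful tx flips the flag
    cache.mark_announced(addr);
    Ok(true)
}
```

Because the flag is only set after the transaction succeeds, `request_testrun` can reject agents whose announcement never reached the contract.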
|
||||
|
||||
#[utoipa::path(
|
||||
operation_id = "v1_agent_request_testrun",
|
||||
tag = "Network Monitor Agent",
|
||||
post,
|
||||
request_body = TestRunAssignmentRequest,
|
||||
path = "/request-testrun",
|
||||
context_path = "/v1/agent",
|
||||
security(("agents_token" = [])),
|
||||
responses(
|
||||
(status = 200, content(
|
||||
(TestRunAssignmentResponse = "application/json"),
|
||||
)),
|
||||
(status = 400, description = "agent not found in cache, or agent has not yet been announced to the contract"),
|
||||
(status = 500, description = "failed to read from storage, or a stored field could not be decoded"),
|
||||
)
|
||||
)]
|
||||
#[tracing::instrument(
|
||||
level = "debug",
|
||||
skip_all,
|
||||
fields(
|
||||
agent_pod = %addr
|
||||
)
|
||||
)]
|
||||
async fn request_testrun(
|
||||
ConnectInfo(addr): ConnectInfo<SocketAddr>,
|
||||
State(state): State<AppState>,
|
||||
Json(body): Json<TestRunAssignmentRequest>,
|
||||
) -> Result<Json<TestRunAssignmentResponse>, ApiError> {
|
||||
let pod_ip = addr.ip();
|
||||
info!("received testrun request from pod at {pod_ip}");
|
||||
PROMETHEUS_METRICS.inc(PrometheusMetric::AgentTestrunRequests);
|
||||
|
||||
// 1. ensure the agent is still present in our agent cache,
// in case of a network failure between the announce and testrun-request calls
|
||||
let Some(agent) = state.agents.get_agent(body.agent_mix_socket_address).await else {
|
||||
PROMETHEUS_METRICS.inc(PrometheusMetric::AgentUnknownAgentTestrunRequests);
|
||||
return Err(ApiError::AgentNotFound);
|
||||
};
|
||||
|
||||
if !agent.announced {
|
||||
PROMETHEUS_METRICS.inc(PrometheusMetric::AgentTestrunRequestsWithoutAnnouncement);
|
||||
return Err(ApiError::AgentNotAnnounced);
|
||||
}
|
||||
|
||||
// 2. attempt to assign a testrun to the agent
|
||||
let assignment = state.assign_next_mixnode_testrun().await?;
|
||||
if assignment.is_none() {
|
||||
PROMETHEUS_METRICS.inc(PrometheusMetric::EmptyTestrunAssignments);
|
||||
} else {
|
||||
PROMETHEUS_METRICS.inc(PrometheusMetric::NonEmptyTestrunAssignments);
|
||||
}
|
||||
|
||||
Ok(Json(TestRunAssignmentResponse { assignment }))
|
||||
}
|
||||
|
||||
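/// Records the Prometheus counters and histograms derived from a single submitted test run result.
fn emit_testrun_result_metrics(result: &TestRunResultSubmissionRequest) {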
|
||||
PROMETHEUS_METRICS.inc(PrometheusMetric::TestRunResultSubmissions);
|
||||
|
||||
PROMETHEUS_METRICS.inc_by(
|
||||
PrometheusMetric::TestPacketsSent,
|
||||
result.result.packets_sent as i64,
|
||||
);
|
||||
PROMETHEUS_METRICS.inc_by(
|
||||
PrometheusMetric::TestPacketsReceived,
|
||||
result.result.packets_received as i64,
|
||||
);
|
||||
|
||||
PROMETHEUS_METRICS.observe_histogram(
|
||||
PrometheusMetric::TestDurationSeconds,
|
||||
result.result.time_taken.as_secs_f64(),
|
||||
);
|
||||
if let Some(latency) = result.result.approximate_latency {
|
||||
PROMETHEUS_METRICS.observe_histogram(
|
||||
PrometheusMetric::ApproximateNodeLatencyMs,
|
||||
latency.as_millis() as f64,
|
||||
);
|
||||
}
|
||||
PROMETHEUS_METRICS.observe_histogram(
|
||||
PrometheusMetric::TestrunReceivedPacketsRatio,
|
||||
result.result.received_ratio(),
|
||||
);
|
||||
|
||||
if let Some(packets_stats) = result.result.packets_statistics {
|
||||
PROMETHEUS_METRICS.observe_histogram(
|
||||
PrometheusMetric::AverageTestPacketRTTMs,
|
||||
packets_stats.mean.as_millis() as f64,
|
||||
);
|
||||
}
|
||||
|
||||
if result.result.error.is_some() {
|
||||
PROMETHEUS_METRICS.inc(PrometheusMetric::TestrunsErrors)
|
||||
}
|
||||
}
|
||||
|
||||
#[utoipa::path(
|
||||
operation_id = "v1_agent_submit_testrun_result",
|
||||
tag = "Network Monitor Agent",
|
||||
post,
|
||||
request_body = TestRunResultSubmissionRequest,
|
||||
path = "/submit-testrun-result",
|
||||
context_path = "/v1/agent",
|
||||
security(("agents_token" = [])),
|
||||
responses(
|
||||
(status = 200, content(
|
||||
(TestRunSubmissionResponse = "application/json"),
|
||||
)),
|
||||
(status = 500, description = "failed to persist the test run result to storage"),
|
||||
)
|
||||
)]
|
||||
#[tracing::instrument(
|
||||
level = "debug",
|
||||
skip_all,
|
||||
fields(
|
||||
agent_pod = %addr
|
||||
)
|
||||
)]
|
||||
async fn submit_testrun_result(
|
||||
ConnectInfo(addr): ConnectInfo<SocketAddr>,
|
||||
State(state): State<AppState>,
|
||||
Json(body): Json<TestRunResultSubmissionRequest>,
|
||||
) -> Result<Json<TestRunSubmissionResponse>, ApiError> {
|
||||
let pod_ip = addr.ip();
|
||||
|
||||
emit_testrun_result_metrics(&body);
|
||||
|
||||
info!(
|
||||
"received testrun result for node {} from pod at {pod_ip}",
|
||||
body.node_id
|
||||
);
|
||||
|
||||
state
|
||||
.submit_testrun_result(body.result, body.node_id)
|
||||
.await?;
|
||||
|
||||
Ok(Json(TestRunSubmissionResponse {}))
|
||||
}
|
||||
|
||||
/// Builds the agent sub-router with all agent endpoints behind bearer-token auth.
|
||||
pub(super) fn routes(auth_layer: AuthLayer) -> Router<AppState> {
|
||||
Router::new()
|
||||
.route(routes::v1::agent::ANNOUNCE, post(announce_agent))
|
||||
.route(routes::v1::agent::REQUEST_TESTRUN, post(request_testrun))
|
||||
.route(
|
||||
routes::v1::agent::SUBMIT_TESTRUN_RESULT,
|
||||
post(submit_testrun_result),
|
||||
)
|
||||
.route_layer(auth_layer)
|
||||
}
|
||||
@@ -0,0 +1,51 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use axum::http::StatusCode;
|
||||
use axum::response::{IntoResponse, Response};
|
||||
|
||||
/// Unified error type for all v1 API endpoints.
|
||||
/// The `Display` message from each variant is used as the HTTP response body.
|
||||
#[derive(Debug, thiserror::Error)]
|
||||
pub(crate) enum ApiError {
|
||||
#[error("agent information not found")]
|
||||
AgentNotFound,
|
||||
|
||||
#[error("failed to announce agent to the network monitors contract")]
|
||||
ContractFailure,
|
||||
|
||||
#[error("failed to read from or write to the database")]
|
||||
StorageFailure,
|
||||
|
||||
#[error("some of the stored data is malformed and could not be parsed")]
|
||||
MalformedStoredData,
|
||||
|
||||
#[error("agent hasn't been announced to the contract - can't assign testruns")]
|
||||
AgentNotAnnounced,
|
||||
|
||||
#[error("no test run found with the requested id")]
|
||||
TestRunNotFound,
|
||||
|
||||
#[error("no nym-node found with the requested node id")]
|
||||
NymNodeNotFound,
|
||||
}
|
||||
|
||||
impl ApiError {
|
||||
fn status_code(&self) -> StatusCode {
|
||||
use ApiError::*;
|
||||
|
||||
match self {
|
||||
AgentNotFound | AgentNotAnnounced => StatusCode::BAD_REQUEST,
|
||||
TestRunNotFound | NymNodeNotFound => StatusCode::NOT_FOUND,
|
||||
ContractFailure | StorageFailure | MalformedStoredData => {
|
||||
StatusCode::INTERNAL_SERVER_ERROR
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl IntoResponse for ApiError {
|
||||
fn into_response(self) -> Response {
|
||||
(self.status_code(), self.to_string()).into_response()
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,30 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use crate::http::state::AppState;
|
||||
use crate::orchestrator::prometheus::PROMETHEUS_METRICS;
|
||||
use axum::Router;
|
||||
use axum::routing::get;
|
||||
use nym_network_monitor_orchestrator_requests::routes;
|
||||
|
||||
/// Returns Prometheus-compatible metrics.
|
||||
#[utoipa::path(
|
||||
get,
|
||||
path = "/prometheus",
|
||||
context_path = "/v1/metrics",
|
||||
tag = "Metrics",
|
||||
responses(
|
||||
(status = 200, body = String),
|
||||
(status = 400, description = "`Authorization` header was missing"),
|
||||
(status = 401, description = "Access token is missing or invalid"),
|
||||
),
|
||||
security(("metrics_and_results_token" = []))
|
||||
)]
|
||||
// the AuthLayer is protecting access to this endpoint
|
||||
pub(crate) async fn prometheus_metrics() -> String {
|
||||
PROMETHEUS_METRICS.metrics()
|
||||
}
|
||||
|
||||
/// Builds the metrics sub-router. Bearer-token auth is attached by the parent v1 router.
pub(super) fn routes() -> Router<AppState> {
|
||||
Router::new().route(routes::v1::metrics::PROMETHEUS, get(prometheus_metrics))
|
||||
}
|
||||
@@ -0,0 +1,29 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use crate::http::state::AppState;
|
||||
use axum::Router;
|
||||
use nym_http_api_common::middleware::bearer_auth::AuthLayer;
|
||||
use nym_network_monitor_orchestrator_requests::routes;
|
||||
|
||||
pub(crate) mod agent;
|
||||
pub(crate) mod error;
|
||||
pub(crate) mod metrics;
|
||||
pub(crate) mod results;
|
||||
|
||||
/// Assembles the v1 API router, nesting agent, metrics, and results sub-routers
|
||||
/// under their respective path prefixes. Metrics and results share the same
|
||||
/// bearer-auth layer.
|
||||
pub(crate) fn routes(
|
||||
agents_auth: AuthLayer,
|
||||
metrics_and_results_auth: AuthLayer,
|
||||
) -> Router<AppState> {
|
||||
Router::new()
|
||||
.nest(routes::v1::AGENT, agent::routes(agents_auth))
|
||||
.merge(
|
||||
Router::new()
|
||||
.nest(routes::v1::METRICS, metrics::routes())
|
||||
.nest(routes::v1::RESULTS, results::routes())
|
||||
.route_layer(metrics_and_results_auth),
|
||||
)
|
||||
}
|
||||
@@ -0,0 +1,230 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
//! Read-only HTTP endpoints that expose the orchestrator's local database:
|
||||
//! the set of nym-nodes it is tracking, the test runs it has recorded, and
|
||||
//! the runs currently in flight.
|
||||
//!
|
||||
//! All handlers are thin wrappers that extract path/query parameters and
|
||||
//! delegate the actual work to [`AppState`]; conversion from storage rows to
|
||||
//! the public response shapes lives in [`crate::storage::models`]. Every
|
||||
//! route in this module is protected by the shared `metrics_and_results`
|
||||
//! bearer token applied one level up in [`crate::http::api::v1::routes`].
|
||||
|
||||
use crate::http::api::v1::error::ApiError;
|
||||
use crate::http::state::AppState;
|
||||
use axum::extract::{Path, Query, State};
|
||||
use axum::routing::get;
|
||||
use axum::{Json, Router};
|
||||
use nym_network_monitor_orchestrator_requests::models::{
|
||||
NymNodeData, NymNodeWithTestRun, PagedResult, Pagination, TestRunData, TestRunInProgressData,
|
||||
};
|
||||
use nym_network_monitor_orchestrator_requests::routes;
|
||||
use nym_validator_client::client::NodeId;
|
||||
|
||||
/// Fetches a single completed test run by its database-assigned id.
|
||||
/// Returns `404` with [`ApiError::TestRunNotFound`] if no such row exists — for
|
||||
/// example because the run has already been evicted by the stale-result sweeper.
|
||||
#[utoipa::path(
|
||||
operation_id = "v1_results_testrun_by_id",
|
||||
tag = "Network Monitor Results",
|
||||
get,
|
||||
params(("id" = i64, Path, description = "Database-assigned test-run id")),
|
||||
path = "/testrun/{id}",
|
||||
context_path = "/v1/results",
|
||||
security(("metrics_and_results_token" = [])),
|
||||
responses(
|
||||
(status = 200, content(
|
||||
(TestRunData = "application/json"),
|
||||
)),
|
||||
(status = 404, description = "no test run found with the requested id"),
|
||||
(status = 500, description = "failed to read the test run from storage"),
|
||||
)
|
||||
)]
|
||||
async fn get_testrun_by_id(
|
||||
Path(id): Path<i64>,
|
||||
State(state): State<AppState>,
|
||||
) -> Result<Json<TestRunData>, ApiError> {
|
||||
state
|
||||
.get_testrun_by_id(id)
|
||||
.await?
|
||||
.map(Json)
|
||||
.ok_or(ApiError::TestRunNotFound)
|
||||
}
|
||||
|
||||
/// Fetches a single node along with its most recent completed test run.
|
||||
///
|
||||
/// The `latest_test_run` field is `None` if the node has never been tested or
|
||||
/// if its most recent run has been evicted. Returns `404` with
|
||||
/// [`ApiError::NymNodeNotFound`] if the orchestrator has never observed a bond
|
||||
/// for this `node_id`.
|
||||
#[utoipa::path(
|
||||
operation_id = "v1_results_nym_node_by_node_id",
|
||||
tag = "Network Monitor Results",
|
||||
get,
|
||||
params(("node_id" = u32, Path, description = "Mixnet-contract node id")),
|
||||
path = "/nym-node/{node_id}",
|
||||
context_path = "/v1/results",
|
||||
security(("metrics_and_results_token" = [])),
|
||||
responses(
|
||||
(status = 200, content(
|
||||
(NymNodeWithTestRun = "application/json"),
|
||||
)),
|
||||
(status = 404, description = "no nym-node found with the requested node id"),
|
||||
(status = 500, description = "failed to read the node from storage, or a stored field could not be decoded"),
|
||||
)
|
||||
)]
|
||||
async fn get_nym_node_by_id(
|
||||
Path(node_id): Path<NodeId>,
|
||||
State(state): State<AppState>,
|
||||
) -> Result<Json<NymNodeWithTestRun>, ApiError> {
|
||||
state
|
||||
.get_nym_node_by_id(node_id)
|
||||
.await?
|
||||
.map(Json)
|
||||
.ok_or(ApiError::NymNodeNotFound)
|
||||
}
|
||||
|
||||
/// Paginated list of test runs currently dispatched to agents and awaiting results.
|
||||
///
|
||||
/// Ordered oldest-started first, so stale or hung runs surface at the top. In
|
||||
/// normal operation the underlying table holds roughly one entry per active
|
||||
/// agent. See [`Pagination`] for the page-size/page-number contract and default
|
||||
/// caps.
|
||||
#[utoipa::path(
|
||||
operation_id = "v1_results_testruns_in_progress",
|
||||
tag = "Network Monitor Results",
|
||||
get,
|
||||
params(Pagination),
|
||||
path = "/testruns-in-progress",
|
||||
context_path = "/v1/results",
|
||||
security(("metrics_and_results_token" = [])),
|
||||
responses(
|
||||
(status = 200, content(
|
||||
(PagedResult<TestRunInProgressData> = "application/json"),
|
||||
)),
|
||||
(status = 500, description = "failed to read in-progress test runs from storage"),
|
||||
)
|
||||
)]
|
||||
async fn get_testruns_in_progress(
|
||||
Query(pagination): Query<Pagination>,
|
||||
State(state): State<AppState>,
|
||||
) -> Result<Json<PagedResult<TestRunInProgressData>>, ApiError> {
|
||||
state
|
||||
.get_testruns_in_progress_paginated(pagination)
|
||||
.await
|
||||
.map(Json)
|
||||
}
|
||||
|
||||
/// Paginated list of all completed test runs, newest first.
|
||||
///
|
||||
/// See [`Pagination`] for the page-size/page-number contract and default caps.
|
||||
/// `total` reflects the row count at the moment the page was read; it is
|
||||
/// fetched in the same transaction as the page itself to guarantee consistency.
|
||||
#[utoipa::path(
|
||||
operation_id = "v1_results_testruns",
|
||||
tag = "Network Monitor Results",
|
||||
get,
|
||||
params(Pagination),
|
||||
path = "/testruns",
|
||||
context_path = "/v1/results",
|
||||
security(("metrics_and_results_token" = [])),
|
||||
responses(
|
||||
(status = 200, content(
|
||||
(PagedResult<TestRunData> = "application/json"),
|
||||
)),
|
||||
(status = 500, description = "failed to read test runs from storage"),
|
||||
)
|
||||
)]
|
||||
async fn get_testruns(
|
||||
Query(pagination): Query<Pagination>,
|
||||
State(state): State<AppState>,
|
||||
) -> Result<Json<PagedResult<TestRunData>>, ApiError> {
|
||||
state.get_testruns_paginated(pagination).await.map(Json)
|
||||
}
|
||||
|
||||
/// Paginated list of every node the orchestrator has ever observed as bonded,
|
||||
/// ordered by `node_id` ascending.
|
||||
///
|
||||
/// Nodes are only removed from this table if they are explicitly deleted; a
|
||||
/// node that has unbonded remains visible with its last-known `last_seen_bonded`
|
||||
/// timestamp.
|
||||
#[utoipa::path(
|
||||
operation_id = "v1_results_nym_nodes",
|
||||
tag = "Network Monitor Results",
|
||||
get,
|
||||
params(Pagination),
|
||||
path = "/nym-nodes",
|
||||
context_path = "/v1/results",
|
||||
security(("metrics_and_results_token" = [])),
|
||||
responses(
|
||||
(status = 200, content(
|
||||
(PagedResult<NymNodeData> = "application/json"),
|
||||
)),
|
||||
(status = 500, description = "failed to read nodes from storage, or a stored field could not be decoded"),
|
||||
)
|
||||
)]
|
||||
async fn get_nym_nodes(
|
||||
Query(pagination): Query<Pagination>,
|
||||
State(state): State<AppState>,
|
||||
) -> Result<Json<PagedResult<NymNodeData>>, ApiError> {
|
||||
state.get_nym_nodes_paginated(pagination).await.map(Json)
|
||||
}
|
||||
|
||||
/// Paginated history of test runs for a single node, newest first.
|
||||
///
|
||||
/// If `node_id` is unknown or has never been tested, the response is a valid
/// empty page (`items: []`, `total: 0`). There is no 404 here because the
/// orchestrator can't tell from a zero-row result whether the node is unknown
/// or simply has no runs yet. Backed by the `idx_testrun_node_id_timestamp`
/// index for efficient per-node lookups.
|
||||
#[utoipa::path(
|
||||
operation_id = "v1_results_nym_node_testruns",
|
||||
tag = "Network Monitor Results",
|
||||
get,
|
||||
params(
|
||||
("node_id" = u32, Path, description = "Mixnet-contract node id"),
|
||||
Pagination,
|
||||
),
|
||||
path = "/nym-node/{node_id}/testruns",
|
||||
context_path = "/v1/results",
|
||||
security(("metrics_and_results_token" = [])),
|
||||
responses(
|
||||
(status = 200, content(
|
||||
(PagedResult<TestRunData> = "application/json"),
|
||||
)),
|
||||
(status = 500, description = "failed to read test runs from storage"),
|
||||
)
|
||||
)]
|
||||
async fn get_nym_node_testruns(
|
||||
Path(node_id): Path<NodeId>,
|
||||
Query(pagination): Query<Pagination>,
|
||||
State(state): State<AppState>,
|
||||
) -> Result<Json<PagedResult<TestRunData>>, ApiError> {
|
||||
state
|
||||
.get_testruns_for_node_paginated(node_id, pagination)
|
||||
.await
|
||||
.map(Json)
|
||||
}
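A minimal sketch of the empty-page contract described for the per-node endpoint, using an in-memory slice instead of the SQLite storage. `Page`, `paginate_for_node`, and the zero-based `page` index are assumptions made for illustration; the real `Pagination` and `PagedResult` types live in `nym_network_monitor_orchestrator_requests` and may behave differently.

```rust
/// Hypothetical stand-in for `PagedResult<T>`; field names mirror the ones used above.
#[derive(Debug, PartialEq)]
pub struct Page<T> {
    pub page: usize,
    pub per_page: usize,
    pub total: usize,
    pub items: Vec<T>,
}

/// Returns one page of run ids for `node_id`, newest (highest run id) first.
/// Rows are `(node_id, run_id)` pairs. An unknown node simply matches zero rows,
/// so the caller gets an empty page rather than an error.
/// `page` is assumed to be zero-based in this sketch.
pub fn paginate_for_node(
    rows: &[(u32, i64)],
    node_id: u32,
    page: usize,
    page_size: usize,
) -> Page<i64> {
    let mut matching: Vec<i64> = rows
        .iter()
        .filter(|(id, _)| *id == node_id)
        .map(|(_, run)| *run)
        .collect();
    matching.sort_unstable_by(|a, b| b.cmp(a));

    // total is computed before slicing, so it always describes the full result set
    let total = matching.len();
    let items: Vec<i64> = matching
        .into_iter()
        .skip(page.saturating_mul(page_size))
        .take(page_size)
        .collect();

    Page {
        page,
        // mirrors the handlers above, which report the number of items actually returned
        per_page: items.len(),
        total,
        items,
    }
}
```

A client can therefore treat `total == 0` as "nothing to show" without special-casing a 404 for this route.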
|
||||
|
||||
/// Builds the router for the `/v1/results` sub-tree. The caller is expected to
|
||||
/// nest this under [`routes::v1::RESULTS`] and to attach the shared
|
||||
/// metrics-and-results bearer-auth layer at the parent level.
|
||||
pub(super) fn routes() -> Router<AppState> {
|
||||
Router::new()
|
||||
.route(routes::v1::results::TESTRUN_BY_ID, get(get_testrun_by_id))
|
||||
.route(
|
||||
routes::v1::results::NYM_NODE_BY_NODE_ID,
|
||||
get(get_nym_node_by_id),
|
||||
)
|
||||
.route(
|
||||
routes::v1::results::NYM_NODE_TESTRUNS,
|
||||
get(get_nym_node_testruns),
|
||||
)
|
||||
.route(
|
||||
routes::v1::results::TESTRUNS_IN_PROGRESS,
|
||||
get(get_testruns_in_progress),
|
||||
)
|
||||
.route(routes::v1::results::TESTRUNS, get(get_testruns))
|
||||
.route(routes::v1::results::NYM_NODES, get(get_nym_nodes))
|
||||
}
|
||||
@@ -0,0 +1,5 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
// SPDX-License-Identifier: GPL-3.0-only

pub(crate) mod api;
pub(crate) mod state;
|
||||
@@ -0,0 +1,471 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use crate::http::api::v1::error::ApiError;
|
||||
use crate::orchestrator::prometheus::{PROMETHEUS_METRICS, PrometheusMetric};
|
||||
use crate::storage::NetworkMonitorStorage;
|
||||
use crate::storage::models::NewTestRun;
|
||||
use axum::extract::FromRef;
|
||||
use nym_crypto::asymmetric::x25519;
|
||||
use nym_network_monitor_orchestrator_requests::models::{
|
||||
NymNodeData, NymNodeWithTestRun, PagedResult, Pagination, TestRunAssignment, TestRunData,
|
||||
TestRunInProgressData, TestRunResult,
|
||||
};
|
||||
use nym_validator_client::DirectSigningHttpRpcValidatorClient;
|
||||
use nym_validator_client::client::NodeId;
|
||||
use nym_validator_client::nyxd::nym_network_monitors_contract_common::AuthorisedNetworkMonitor;
|
||||
use std::collections::HashMap;
|
||||
use std::net::{IpAddr, SocketAddr};
|
||||
use std::sync::Arc;
|
||||
use std::time::Duration;
|
||||
use time::OffsetDateTime;
|
||||
use tokio::sync::{Mutex, RwLock};
|
||||
use tracing::error;
|
||||
|
||||
/// Thread-safe cache of all agents known to this orchestrator, keyed by host IP.
|
||||
/// Used to short-circuit the contract tx for already-announced agents.
|
||||
#[derive(Clone, Default)]
|
||||
pub(crate) struct KnownAgents {
|
||||
inner: Arc<Mutex<KnownAgentsInner>>,
|
||||
}
|
||||
|
||||
impl KnownAgents {
|
||||
/// Looks up an agent by its full mixnet socket address (host IP + port).
|
||||
/// Returns `None` if no agent is registered at that address.
|
||||
pub(crate) async fn get_agent(&self, address: SocketAddr) -> Option<KnownAgent> {
|
||||
let guard = self.inner.lock().await;
|
||||
let host_agents = guard.agents.get(&address.ip())?;
|
||||
|
||||
host_agents
|
||||
.iter()
|
||||
.find(|a| a.mixnet_port == address.port())
|
||||
.copied()
|
||||
}
|
||||
|
||||
/// Records an announcement from the agent at `mix_listener`. The cache entry
|
||||
/// is upserted: a missing entry is inserted, and if the cached noise key differs
|
||||
/// from the announced one it is overwritten and the agent is treated as
|
||||
/// not-yet-announced so the caller re-runs the contract tx with the new key.
|
||||
///
|
||||
/// Returns the current `announced` flag: `true` means the agent was already
|
||||
/// announced to the contract and the caller should skip the contract tx;
|
||||
/// `false` means the caller should submit the tx and call [`Self::mark_announced`]
|
||||
/// on success.
|
||||
pub(crate) async fn try_announce_agent(
|
||||
&self,
|
||||
mix_listener: SocketAddr,
|
||||
noise_key: x25519::PublicKey,
|
||||
) -> bool {
|
||||
let mut guard = self.inner.lock().await;
|
||||
let host_agents = guard.agents.entry(mix_listener.ip()).or_default();
|
||||
|
||||
if let Some(agent) = host_agents
|
||||
.iter_mut()
|
||||
.find(|agent| agent.mixnet_port == mix_listener.port())
|
||||
{
|
||||
agent.last_active_at = OffsetDateTime::now_utc();
|
||||
if agent.noise_key == noise_key {
|
||||
return agent.announced;
|
||||
}
|
||||
agent.noise_key = noise_key;
|
||||
agent.announced = false;
|
||||
guard.publish_gauges();
|
||||
return false;
|
||||
}
|
||||
|
||||
host_agents.push(KnownAgent {
|
||||
mixnet_port: mix_listener.port(),
|
||||
last_active_at: OffsetDateTime::now_utc(),
|
||||
noise_key,
|
||||
announced: false,
|
||||
});
|
||||
guard.publish_gauges();
|
||||
false
|
||||
}
|
||||
|
||||
/// Marks the agent at `mix_listener` as announced. Should be called after a
|
||||
/// successful contract transaction.
|
||||
pub(crate) async fn mark_announced(&self, mix_listener: SocketAddr) {
|
||||
let mut guard = self.inner.lock().await;
|
||||
let Some(host_agents) = guard.agents.get_mut(&mix_listener.ip()) else {
|
||||
return;
|
||||
};
|
||||
if let Some(agent) = host_agents
|
||||
.iter_mut()
|
||||
.find(|a| a.mixnet_port == mix_listener.port())
|
||||
{
|
||||
agent.announced = true;
|
||||
}
|
||||
guard.publish_gauges();
|
||||
}
|
||||
}
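The announce/mark flow above can be sketched without the async mutex, timestamps, or Prometheus gauges. This is a simplified, single-threaded illustration only: `AgentCache`, `Agent`, and the `[u8; 32]` stand-in for `x25519::PublicKey` are hypothetical names, not part of the orchestrator.

```rust
use std::collections::HashMap;
use std::net::{IpAddr, SocketAddr};

/// Simplified agent record; `[u8; 32]` stands in for the x25519 noise key.
#[derive(Clone, Copy, Debug)]
pub struct Agent {
    pub port: u16,
    pub noise_key: [u8; 32],
    pub announced: bool,
}

/// Hypothetical single-threaded version of the cache, keyed by host IP.
#[derive(Default)]
pub struct AgentCache {
    agents: HashMap<IpAddr, Vec<Agent>>,
}

impl AgentCache {
    /// Upserts the agent and returns `true` only if the contract tx can be skipped.
    pub fn try_announce(&mut self, addr: SocketAddr, noise_key: [u8; 32]) -> bool {
        let host_agents = self.agents.entry(addr.ip()).or_default();

        if let Some(agent) = host_agents.iter_mut().find(|a| a.port == addr.port()) {
            if agent.noise_key == noise_key {
                return agent.announced;
            }
            // the key changed: forget the previous announcement
            agent.noise_key = noise_key;
            agent.announced = false;
            return false;
        }

        // first time this host/port pair is seen
        host_agents.push(Agent {
            port: addr.port(),
            noise_key,
            announced: false,
        });
        false
    }

    /// Flags the agent as announced after a successful contract transaction.
    pub fn mark_announced(&mut self, addr: SocketAddr) {
        let Some(host_agents) = self.agents.get_mut(&addr.ip()) else {
            return;
        };
        if let Some(agent) = host_agents.iter_mut().find(|a| a.port == addr.port()) {
            agent.announced = true;
        }
    }
}
```

The caller is expected to submit the contract transaction whenever `try_announce` returns `false` and to call `mark_announced` only after that transaction succeeds, so a failed transaction leaves the agent eligible for another attempt.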
|
||||
|
||||
/// Rebuilds the agent cache from on-chain data. Used at orchestrator startup to
|
||||
/// restore state for agents that were authorised before a restart.
|
||||
impl TryFrom<Vec<AuthorisedNetworkMonitor>> for KnownAgents {
|
||||
type Error = anyhow::Error;
|
||||
|
||||
fn try_from(agents: Vec<AuthorisedNetworkMonitor>) -> Result<Self, Self::Error> {
|
||||
let mut agents_map = HashMap::new();
|
||||
|
||||
for agent in agents {
|
||||
let host_ip = agent.mixnet_address.ip();
|
||||
let noise_key = x25519::PublicKey::from_base58_string(&agent.bs58_x25519_noise)?;
|
||||
agents_map
|
||||
.entry(host_ip)
|
||||
.or_insert_with(Vec::new)
|
||||
.push(KnownAgent {
|
||||
mixnet_port: agent.mixnet_address.port(),
|
||||
// or should we use the authorisation ts?
|
||||
last_active_at: OffsetDateTime::now_utc(),
|
||||
noise_key,
|
||||
announced: true,
|
||||
});
|
||||
}
|
||||
|
||||
let inner = KnownAgentsInner { agents: agents_map };
|
||||
inner.publish_gauges();
|
||||
Ok(KnownAgents {
|
||||
inner: Arc::new(Mutex::new(inner)),
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
/// Inner state behind the [`KnownAgents`] mutex.
|
||||
#[derive(Default)]
|
||||
struct KnownAgentsInner {
|
||||
/// Map from host IP to the list of agents running on that host.
|
||||
agents: HashMap<IpAddr, Vec<KnownAgent>>,
|
||||
}
|
||||
|
||||
impl KnownAgentsInner {
|
||||
/// Recomputes and publishes the `known_agents_*` gauges. Called from every mutation of
|
||||
/// the inner map — we recount rather than incrementally adjust so the gauges stay correct
|
||||
/// even if a future code path mutates state without going through a dedicated helper.
|
||||
fn publish_gauges(&self) {
|
||||
let (total, announced) =
|
||||
self.agents
|
||||
.values()
|
||||
.fold((0i64, 0i64), |(total, announced), agents| {
|
||||
let t = total + agents.len() as i64;
|
||||
let a = announced + agents.iter().filter(|a| a.announced).count() as i64;
|
||||
(t, a)
|
||||
});
|
||||
PROMETHEUS_METRICS.set(PrometheusMetric::KnownAgentsTotal, total);
|
||||
PROMETHEUS_METRICS.set(PrometheusMetric::KnownAgentsAnnounced, announced);
|
||||
}
|
||||
}
|
||||
|
||||
/// Cached state of a single known agent on a particular host.
|
||||
#[derive(Clone, Copy, Debug)]
|
||||
pub(crate) struct KnownAgent {
|
||||
pub(crate) mixnet_port: u16,
|
||||
pub(crate) last_active_at: OffsetDateTime,
|
||||
pub(crate) noise_key: x25519::PublicKey,
|
||||
|
||||
/// Whether this agent has been successfully registered in the smart contract.
|
||||
/// Set to `true` when restored from the chain at startup, or after a successful
|
||||
/// `/announce` contract transaction.
|
||||
pub(crate) announced: bool,
|
||||
}
|
||||
|
||||
/// Coordinates test run assignment and result storage.
|
||||
///
|
||||
/// Wraps the underlying [`NetworkMonitorStorage`] and applies the configured
|
||||
/// `testrun_staleness_age` when deciding which nodes are eligible for testing.
|
||||
#[derive(Clone)]
|
||||
pub(crate) struct TestrunManager {
|
||||
/// Minimum time that must elapse after a node's last test before it becomes
|
||||
/// eligible for another one. Passed to the storage layer as a staleness gate.
|
||||
testrun_staleness_age: Duration,
|
||||
}
|
||||
|
||||
impl TestrunManager {
|
||||
/// Selects the most stale idle mixnode and atomically marks it as having a test
|
||||
/// in progress. Returns `None` if no mixnode is currently eligible.
|
||||
async fn assign_next_mixnode_testrun(
|
||||
&self,
|
||||
storage: &NetworkMonitorStorage,
|
||||
) -> Result<Option<TestRunAssignment>, ApiError> {
|
||||
let node_to_test = match storage
|
||||
.assign_next_mixnode_testrun(self.testrun_staleness_age)
|
||||
.await
|
||||
{
|
||||
Ok(node) => node,
|
||||
Err(err) => {
|
||||
error!("testrun assignment storage failure: {err}");
|
||||
return Err(ApiError::StorageFailure);
|
||||
}
|
||||
};
|
||||
|
||||
let Some(node) = node_to_test.map(|n| n.inner) else {
|
||||
return Ok(None);
|
||||
};
|
||||
|
||||
let (Some(address), Some(noise_key), Some(sphinx_key), Some(key_rotation)) = (
|
||||
node.mixnet_socket_address,
|
||||
node.noise_key,
|
||||
node.sphinx_key,
|
||||
node.key_rotation_id,
|
||||
) else {
|
||||
// this should never happen as the db query should ignore entries where those fields are set to NULL
|
||||
error!(
|
||||
"database inconsistency - attempted to assign node {} for stress testing, but we don't have its complete data",
|
||||
node.node_id
|
||||
);
|
||||
return Err(ApiError::StorageFailure);
|
||||
};
|
||||
|
||||
let Ok(node_address) = address.parse() else {
|
||||
return Err(ApiError::MalformedStoredData);
|
||||
};
|
||||
|
||||
let Ok(noise_key) = noise_key.parse() else {
|
||||
return Err(ApiError::MalformedStoredData);
|
||||
};
|
||||
|
||||
let Ok(sphinx_key) = sphinx_key.parse() else {
|
||||
return Err(ApiError::MalformedStoredData);
|
||||
};
|
||||
|
||||
Ok(Some(TestRunAssignment {
|
||||
node_id: node.node_id as u32,
|
||||
node_address,
|
||||
noise_key,
|
||||
sphinx_key,
|
||||
key_rotation_id: key_rotation as u32,
|
||||
}))
|
||||
}
|
||||
|
||||
/// Persists a completed test run result to the database and updates the
|
||||
/// node's `last_testrun` pointer.
|
||||
async fn submit_testrun_result(
|
||||
&self,
|
||||
storage: &NetworkMonitorStorage,
|
||||
result: TestRunResult,
|
||||
node_id: NodeId,
|
||||
) -> Result<(), ApiError> {
|
||||
// currently all testruns are mixnode results
|
||||
let testrun = NewTestRun::from_mixnode_result(node_id, result);
|
||||
if let Err(err) = storage.insert_test_run(&testrun).await {
|
||||
error!("testrun result storage failure: {err}");
|
||||
return Err(ApiError::StorageFailure);
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
}
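The staleness gate is applied inside the storage layer, whose query is not shown here. As a rough in-memory illustration of the rule the doc comments describe (idle nodes only, oldest test first, nothing retested before `testrun_staleness_age` has elapsed), one could write the following. `NodeRow`, `pick_most_stale`, and the choice to prioritise never-tested nodes are assumptions, not the actual query.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical in-memory view of a node row; the real data lives in SQLite.
pub struct NodeRow {
    pub node_id: u32,
    /// `None` means the node has never been tested.
    pub last_tested: Option<SystemTime>,
    pub in_progress: bool,
}

/// Picks the idle node whose last test is oldest, provided that test is at least
/// `staleness_age` old. Never-tested nodes are assumed to take priority.
pub fn pick_most_stale(
    rows: &[NodeRow],
    now: SystemTime,
    staleness_age: Duration,
) -> Option<u32> {
    rows.iter()
        .filter(|row| !row.in_progress)
        .filter(|row| match row.last_tested {
            None => true,
            // a timestamp in the future yields Err and is treated as not stale
            Some(ts) => now
                .duration_since(ts)
                .map(|age| age >= staleness_age)
                .unwrap_or(false),
        })
        // `None < Some(_)`, so never-tested nodes sort first, then the oldest timestamp
        .min_by_key(|row| row.last_tested)
        .map(|row| row.node_id)
}
```

In the orchestrator the selection and the "in progress" marking happen atomically in the database; this sketch covers only the selection half.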
|
||||
|
||||
/// Shared application state available to all axum request handlers.
|
||||
#[derive(Clone, FromRef)]
|
||||
pub(crate) struct AppState {
|
||||
pub(crate) agents: KnownAgents,
|
||||
|
||||
pub(crate) testrun_manager: TestrunManager,
|
||||
|
||||
pub(crate) storage: NetworkMonitorStorage,
|
||||
|
||||
pub(crate) validator_client: Arc<RwLock<DirectSigningHttpRpcValidatorClient>>,
|
||||
}
|
||||
|
||||
impl AppState {
|
||||
pub(crate) fn new(
|
||||
agents: KnownAgents,
|
||||
storage: NetworkMonitorStorage,
|
||||
testrun_staleness_age: Duration,
|
||||
validator_client: Arc<RwLock<DirectSigningHttpRpcValidatorClient>>,
|
||||
) -> Self {
|
||||
AppState {
|
||||
agents,
|
||||
storage,
|
||||
testrun_manager: TestrunManager {
|
||||
testrun_staleness_age,
|
||||
},
|
||||
validator_client,
|
||||
}
|
||||
}
|
||||
|
||||
/// Selects the most stale idle mixnode and atomically marks it as having a test
|
||||
/// in progress. Returns `None` if no mixnode is currently eligible.
|
||||
pub(crate) async fn assign_next_mixnode_testrun(
|
||||
&self,
|
||||
) -> Result<Option<TestRunAssignment>, ApiError> {
|
||||
self.testrun_manager
|
||||
.assign_next_mixnode_testrun(&self.storage)
|
||||
.await
|
||||
}
|
||||
|
||||
/// Persists a completed test run result to the database and updates the
|
||||
/// node's `last_testrun` pointer.
|
||||
pub(crate) async fn submit_testrun_result(
|
||||
&self,
|
||||
result: TestRunResult,
|
||||
node_id: NodeId,
|
||||
) -> Result<(), ApiError> {
|
||||
self.testrun_manager
|
||||
.submit_testrun_result(&self.storage, result, node_id)
|
||||
.await
|
||||
}
|
||||
|
||||
/// Backs `GET /v1/results/testrun/{id}`. `Ok(None)` means the row doesn't
|
||||
/// exist (the handler maps this to a 404); storage errors are logged and
|
||||
/// collapsed to [`ApiError::StorageFailure`].
|
||||
pub(crate) async fn get_testrun_by_id(&self, id: i64) -> Result<Option<TestRunData>, ApiError> {
|
||||
let result = match self.storage.get_testrun_by_id(id).await {
|
||||
Err(err) => {
|
||||
error!("get_testrun_by_id storage failure: {err}");
|
||||
return Err(ApiError::StorageFailure);
|
||||
}
|
||||
Ok(None) => return Ok(None),
|
||||
Ok(Some(testrun)) => testrun,
|
||||
};
|
||||
|
||||
Ok(Some(result.into()))
|
||||
}
|
||||
|
||||
/// Backs `GET /v1/results/nym-node/{node_id}`. If the node is known, its
|
||||
/// snapshot is returned along with the most recent completed test run
|
||||
/// (fetched in a second query via [`Self::get_testrun_by_id`]);
|
||||
/// `latest_test_run` is `None` when no such run exists.
|
||||
///
|
||||
/// Malformed stored data (e.g. an unparsable base58 key) is surfaced as
|
||||
/// [`ApiError::MalformedStoredData`]; this should never happen in practice
|
||||
/// because the orchestrator writes these fields itself.
|
||||
pub(crate) async fn get_nym_node_by_id(
|
||||
&self,
|
||||
node_id: NodeId,
|
||||
) -> Result<Option<NymNodeWithTestRun>, ApiError> {
|
||||
let nym_node = match self.storage.get_nym_node_by_id(node_id).await {
|
||||
Err(err) => {
|
||||
error!("get_nym_node_by_id storage failure: {err}");
|
||||
return Err(ApiError::StorageFailure);
|
||||
}
|
||||
Ok(None) => return Ok(None),
|
||||
Ok(Some(nym_node)) => nym_node,
|
||||
};
|
||||
|
||||
let latest_test_run = match nym_node.last_testrun {
|
||||
None => None,
|
||||
Some(testrun_id) => self.get_testrun_by_id(testrun_id).await?,
|
||||
};
|
||||
|
||||
Ok(Some(NymNodeWithTestRun {
|
||||
node: nym_node.try_into().map_err(|err| {
|
||||
error!("get_nym_node_by_id malformed stored data: {err}");
|
||||
ApiError::MalformedStoredData
|
||||
})?,
|
||||
latest_test_run,
|
||||
}))
|
||||
}
|
||||
|
||||
/// Backs `GET /v1/results/testruns-in-progress`. Returns a page of rows
|
||||
/// from `testrun_in_progress` ordered oldest `started_at` first so stale
|
||||
/// runs surface at the top.
|
||||
pub(crate) async fn get_testruns_in_progress_paginated(
|
||||
&self,
|
||||
pagination: Pagination,
|
||||
) -> Result<PagedResult<TestRunInProgressData>, ApiError> {
|
||||
let (in_progress, total) = match self
|
||||
.storage
|
||||
.get_testruns_in_progress_paginated(pagination)
|
||||
.await
|
||||
{
|
||||
Err(err) => {
|
||||
error!("get_testruns_in_progress_paginated storage failure: {err}");
|
||||
return Err(ApiError::StorageFailure);
|
||||
}
|
||||
Ok(result) => result,
|
||||
};
|
||||
|
||||
Ok(PagedResult {
|
||||
page: pagination.page(),
|
||||
per_page: in_progress.len(),
|
||||
total,
|
||||
items: in_progress.into_iter().map(Into::into).collect(),
|
||||
})
|
||||
}
|
||||
|
||||
/// Backs `GET /v1/results/testruns`. Returns a single page of completed
|
||||
/// runs ordered newest first, together with the total row count at the
|
||||
/// time the page was read (fetched in the same transaction as the page
|
||||
/// itself for consistency).
|
||||
pub(crate) async fn get_testruns_paginated(
|
||||
&self,
|
||||
pagination: Pagination,
|
||||
) -> Result<PagedResult<TestRunData>, ApiError> {
|
||||
let (testruns, total) = match self.storage.get_testruns_paginated(pagination).await {
|
||||
Err(err) => {
|
||||
error!("get_testruns_paginated storage failure: {err}");
|
||||
return Err(ApiError::StorageFailure);
|
||||
}
|
||||
Ok(testruns) => testruns,
|
||||
};
|
||||
|
||||
Ok(PagedResult {
|
||||
page: pagination.page(),
|
||||
per_page: testruns.len(),
|
||||
total,
|
||||
items: testruns.into_iter().map(Into::into).collect(),
|
||||
})
|
||||
}
|
||||
|
||||
/// Backs `GET /v1/results/nym-nodes`. Returns a page of nodes ordered by
|
||||
/// `node_id` ascending. Each row is converted to [`NymNodeData`] via the
|
||||
/// fallible `TryFrom` impl that decodes stored base58 keys; a failure
|
||||
/// anywhere in the page produces [`ApiError::MalformedStoredData`].
|
||||
pub(crate) async fn get_nym_nodes_paginated(
|
||||
&self,
|
||||
pagination: Pagination,
|
||||
) -> Result<PagedResult<NymNodeData>, ApiError> {
|
||||
let (nym_nodes, total) = match self.storage.get_nym_nodes_paginated(pagination).await {
|
||||
Err(err) => {
|
||||
error!("get_nym_nodes_paginated storage failure: {err}");
|
||||
return Err(ApiError::StorageFailure);
|
||||
}
|
||||
Ok((nym_nodes, total)) => (nym_nodes, total),
|
||||
};
|
||||
|
||||
let mut items = Vec::with_capacity(nym_nodes.len());
|
||||
for node in nym_nodes {
|
||||
items.push(node.try_into().map_err(|err| {
|
||||
error!("get_nym_nodes_paginated malformed stored data: {err}");
|
||||
ApiError::MalformedStoredData
|
||||
})?);
|
||||
}
|
||||
|
||||
Ok(PagedResult {
|
||||
page: pagination.page(),
|
||||
per_page: items.len(),
|
||||
total,
|
||||
items,
|
||||
})
|
||||
}
|
||||
|
||||
/// Backs `GET /v1/results/nym-node/{node_id}/testruns`. Returns a page of
|
||||
/// completed runs for a single node ordered newest first. Unknown or
|
||||
/// never-tested nodes produce a valid empty page (`total: 0`) rather than
|
||||
/// a 404.
|
||||
pub(crate) async fn get_testruns_for_node_paginated(
|
||||
&self,
|
||||
node_id: NodeId,
|
||||
pagination: Pagination,
|
||||
) -> Result<PagedResult<TestRunData>, ApiError> {
|
||||
let (testruns, total) = match self
|
||||
.storage
|
||||
.get_testruns_for_node_paginated(node_id, pagination)
|
||||
.await
|
||||
{
|
||||
Err(err) => {
|
||||
error!("get_testruns_for_node_paginated storage failure: {err}");
|
||||
return Err(ApiError::StorageFailure);
|
||||
}
|
||||
Ok((testruns, total)) => (testruns, total),
|
||||
};
|
||||
|
||||
Ok(PagedResult {
|
||||
page: pagination.page(),
|
||||
per_page: testruns.len(),
|
||||
total,
|
||||
items: testruns.into_iter().map(Into::into).collect(),
|
||||
})
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,50 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
// SPDX-License-Identifier: GPL-3.0-only

use crate::cli::Cli;
use clap::Parser;
use nym_bin_common::bin_info_owned;
use nym_bin_common::logging::tracing_subscriber::layer::SubscriberExt;
use nym_bin_common::logging::tracing_subscriber::util::SubscriberInitExt;
use nym_bin_common::logging::{
    default_tracing_env_filter, default_tracing_fmt_layer, tracing_subscriber,
};
use nym_network_defaults::setup_env;
use tracing::info;

pub(crate) mod cli;
mod http;
pub(crate) mod orchestrator;
mod storage;

fn setup_logger() -> anyhow::Result<()> {
    // crates that are more granularly filtered, regardless of default `RUST_LOG` value
    let filter_crates = ["reqwest", "hyper"];

    let mut env_filter = default_tracing_env_filter();
    for crate_name in filter_crates {
        env_filter = env_filter.add_directive(format!("{crate_name}=warn").parse()?);
    }

    tracing_subscriber::registry()
        .with(default_tracing_fmt_layer(std::io::stderr))
        .with(env_filter)
        .init();

    Ok(())
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    setup_logger()?;
    let cli = Cli::parse();
    setup_env(cli.config_env_file.as_ref());

    let bin_info = bin_info_owned!();
    info!("using the following version: {bin_info}");

    cli.execute().await?;

    info!("network monitor orchestrator is done - quitting");
    Ok(())
}
|
||||
@@ -0,0 +1,123 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use anyhow::Context;
|
||||
use nym_network_defaults::{NymNetworkDetails, env_configured};
|
||||
use nym_validator_client::nyxd::AccountId;
|
||||
use nym_validator_client::{client, nyxd};
|
||||
use std::net::SocketAddr;
|
||||
use std::num::NonZeroU32;
|
||||
use std::path::PathBuf;
|
||||
use std::time::Duration;
|
||||
use tracing::info;
|
||||
use url::Url;
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub(crate) struct Config {
|
||||
/// HTTPS RPC URL of a Nyx node (e.g. `https://rpc.nymtech.net`).
|
||||
/// If not provided, the default value from the environment will be retrieved (if available).
|
||||
pub(crate) nyxd_rpc_endpoint: Option<Url>,
|
||||
|
||||
/// HTTP endpoint of the nym-api to which test results are submitted.
|
||||
pub(crate) nym_api_endpoint: Url,
|
||||
|
||||
/// Socket address to bind the HTTP server to (e.g. `0.0.0.0:8080`).
|
||||
pub(crate) http_server_bind_address: SocketAddr,
|
||||
|
||||
/// How often each node should be stress-tested (e.g. `30m`, `1h`).
|
||||
pub(crate) test_interval: Duration,
|
||||
|
||||
/// Maximum time a single test run is allowed to run before being considered timed out
|
||||
/// (e.g. `5m`).
|
||||
pub(crate) test_timeout: Duration,
|
||||
|
||||
/// Path to the SQLite database file.
|
||||
pub(crate) database_path: PathBuf,
|
||||
|
||||
/// How often the list of bonded nym-nodes is refreshed from the mixnet contract
|
||||
/// (e.g. `10m`, `1h`).
|
||||
pub(crate) node_refresh_rate: Duration,
|
||||
|
||||
/// Timeout for querying a single node for its detailed information (sphinx key, noise key,
/// etc.), e.g. `10s`. Queries that exceed this budget leave the corresponding fields as `NULL`.
|
||||
pub(crate) node_info_query_timeout: Duration,
|
||||
|
||||
/// Bech32 address of the mixnet contract used to retrieve the list of bonded nodes.
|
||||
/// If not provided, the default value from the environment will be retrieved (if available).
|
||||
pub(crate) mixnet_contract_address: Option<AccountId>,
|
||||
|
||||
/// Bech32 address of the network monitors contract used to authorise agents.
|
||||
/// If not provided, the default value from the environment will be retrieved (if available).
|
||||
pub(crate) network_monitors_contract_address: Option<AccountId>,
|
||||
|
||||
/// Maximum age of a completed test run row before it is evicted from the local database
/// (e.g. `7d`, `24h`). Rows older than this are assumed to have already been submitted to
/// the nym-api.
|
||||
pub(crate) testrun_eviction_age: Duration,
|
||||
|
||||
/// Maximum number of nodes queried concurrently during a node refresh cycle.
|
||||
pub(crate) number_of_concurrent_node_queries: usize,
|
||||
|
||||
/// Maximum number of attempts (including the initial one) made to verify that the
|
||||
/// orchestrator's account is authorised in the network monitors contract before start-up.
|
||||
/// The process exits with an error once the budget is exhausted.
|
||||
pub(crate) chain_authorisation_check_max_attempts: NonZeroU32,
|
||||
|
||||
/// Delay between consecutive chain authorisation checks during start-up. Applied both when
|
||||
/// the query itself fails and when it succeeds but the orchestrator is not (yet) listed.
|
||||
pub(crate) chain_authorisation_check_retry_delay: Duration,
|
||||
|
||||
/// How often the orchestrator flushes accumulated test results to the nym-api as a signed
|
||||
/// batch submission (e.g. `15m`, `1h`).
|
||||
pub(crate) result_submission_interval: Duration,
|
||||
}
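The interaction between `chain_authorisation_check_max_attempts` and `chain_authorisation_check_retry_delay` can be illustrated with a small synchronous retry helper. This is a sketch under assumptions: `retry_until_authorised`, the `String` error type, and the decision not to sleep after the final attempt are hypothetical, and the orchestrator itself runs an async loop with its own error handling.

```rust
use std::num::NonZeroU32;
use std::thread::sleep;
use std::time::Duration;

/// Runs `check` up to `max_attempts` times (the first call counts as attempt 1).
/// `Ok(Some(v))` ends the loop; `Ok(None)` ("not listed yet") and `Err(_)` ("query failed")
/// both wait `retry_delay` and try again. Returns the last failure reason once exhausted.
pub fn retry_until_authorised<T>(
    max_attempts: NonZeroU32,
    retry_delay: Duration,
    mut check: impl FnMut(u32) -> Result<Option<T>, String>,
) -> Result<T, String> {
    // NonZeroU32 guarantees at least one attempt, so the loop body always runs
    let max = max_attempts.get();
    let mut last_failure = String::new();

    for attempt in 1..=max {
        match check(attempt) {
            Ok(Some(value)) => return Ok(value),
            Ok(None) => last_failure = "not authorised yet".to_string(),
            Err(err) => last_failure = format!("query failed: {err}"),
        }
        // assumption of this sketch: no sleep after the final attempt
        if attempt < max {
            sleep(retry_delay);
        }
    }

    Err(format!("gave up after {max} attempts: {last_failure}"))
}
```

Using `NonZeroU32` for the attempt budget rules out a configuration of `0`, which would otherwise skip the check entirely and fail start-up unconditionally.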
|
||||
|
||||
impl Config {
|
||||
/// Builds the validator client configuration from the orchestrator config.
|
||||
/// Falls back to environment-provided network details when RPC endpoint or
|
||||
/// contract addresses are not explicitly set.
|
||||
pub(crate) fn try_build_validator_client_config(&self) -> anyhow::Result<client::Config> {
|
||||
let mut base_network_details = if env_configured() {
|
||||
info!("using base NymNetworkDetails from env vars");
|
||||
NymNetworkDetails::new_from_env()
|
||||
} else {
|
||||
info!("using mainnet as base for NymNetworkDetails");
|
||||
NymNetworkDetails::new_mainnet()
|
||||
};
|
||||
|
||||
let nyxd_endpoint = if let Some(provided) = &self.nyxd_rpc_endpoint {
|
||||
provided.clone()
|
||||
} else {
|
||||
base_network_details
|
||||
.endpoints
|
||||
.first()
|
||||
.context("no nyxd endpoints provided")?
|
||||
.nyxd_url
|
||||
.parse()?
|
||||
};
|
||||
|
||||
if let Some(mixnet_contract_address) = &self.mixnet_contract_address {
|
||||
info!("overwriting mixnet contract address with {mixnet_contract_address}");
|
||||
base_network_details.contracts.mixnet_contract_address =
|
||||
Some(mixnet_contract_address.to_string());
|
||||
}
|
||||
|
||||
if let Some(network_monitors_contract_address) = &self.network_monitors_contract_address {
|
||||
info!(
|
||||
"overwriting network monitors contract address with {network_monitors_contract_address}"
|
||||
);
|
||||
base_network_details
|
||||
.contracts
|
||||
.network_monitors_contract_address =
|
||||
Some(network_monitors_contract_address.to_string());
|
||||
}
|
||||
|
||||
let nyxd_config = nyxd::Config::try_from_nym_network_details(&base_network_details)?;
|
||||
let client_config =
|
||||
client::Config::new(nyxd_endpoint, self.nym_api_endpoint.clone(), nyxd_config);
|
||||
|
||||
info!("using the following config: {client_config:#?}");
|
||||
Ok(client_config)
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,336 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
|
||||
// SPDX-License-Identifier: GPL-3.0-only
|
||||
|
||||
use crate::http::api::{build_router, run_http_server};
|
||||
use crate::http::state::{AppState, KnownAgents};
|
||||
use crate::orchestrator::config::Config;
|
||||
use crate::orchestrator::node_refresher::NodeRefresher;
|
||||
use crate::orchestrator::result_submitter::ResultSubmitter;
|
||||
use crate::orchestrator::stale_results_eviction::StaleResultsEviction;
|
||||
use crate::storage::NetworkMonitorStorage;
|
||||
use anyhow::{Context, bail};
|
||||
use nym_crypto::asymmetric::ed25519;
|
||||
use nym_task::ShutdownManager;
|
||||
use nym_validator_client::DirectSigningHttpRpcValidatorClient;
|
||||
use nym_validator_client::client::NymApiClientExt;
|
||||
use nym_validator_client::nyxd::contract_traits::{
|
||||
NetworkMonitorsQueryClient, NetworkMonitorsSigningClient, PagedNetworkMonitorsQueryClient,
|
||||
};
|
||||
use nym_validator_client::nyxd::{AccountId, CosmWasmClient, bip39};
|
||||
use std::sync::Arc;
|
||||
use std::time::Duration;
|
||||
use tokio::sync::RwLock;
|
||||
use tokio::time::sleep;
|
||||
use tracing::{info, warn};
|
||||
use zeroize::Zeroizing;
|
||||
|
||||
pub(crate) mod config;
|
||||
mod node_refresher;
|
||||
pub(crate) mod prometheus;
|
||||
mod result_submitter;
|
||||
mod stale_results_eviction;
|
||||
pub(crate) mod testruns;
|
||||
|
||||
pub(crate) struct NetworkMonitorOrchestrator {
|
||||
/// Runtime configuration for the orchestrator.
|
||||
pub(crate) config: Config,
|
||||
|
||||
/// Validator client used to:
|
||||
/// - submit test results to the nym-api
|
||||
/// - query node information from the chain
|
||||
/// - send authorisation transactions to the network monitors contract
|
||||
pub(crate) client: Arc<RwLock<DirectSigningHttpRpcValidatorClient>>,
|
||||
|
||||
/// Ed25519 key pair used to sign result submissions to the nym-api.
|
||||
pub(crate) identity_keys: Arc<ed25519::KeyPair>,
|
||||
|
||||
/// Bearer token presented by agents when requesting work assignments and submitting results.
|
||||
pub(crate) agents_http_auth_token: Arc<Zeroizing<String>>,
|
||||
|
||||
/// Bearer token required when attempting to access the metrics or results endpoints.
|
||||
pub(crate) metrics_and_results_http_auth_token: Arc<Zeroizing<String>>,
|
||||
|
||||
/// Handle to the local SQLite database used to track nodes and test runs.
|
||||
pub(crate) storage: NetworkMonitorStorage,
|
||||
|
||||
/// Manages graceful shutdown signalling across all orchestrator tasks.
|
||||
pub(crate) shutdown_manager: ShutdownManager,
|
||||
}
|
||||
|
||||
impl NetworkMonitorOrchestrator {
|
||||
/// Initialises the orchestrator: connects to the database, builds the validator client,
/// checks the on-chain balance, and verifies that the orchestrator is authorised on both
/// the chain and the nym-api, reconciling the announced identity key along the way.
|
||||
pub(crate) async fn new(
|
||||
config: Config,
|
||||
identity_keys: Arc<ed25519::KeyPair>,
|
||||
agents_http_auth_token: Zeroizing<String>,
|
||||
metrics_and_results_http_auth_token: Zeroizing<String>,
|
||||
mnemonic: bip39::Mnemonic,
|
||||
) -> anyhow::Result<Self> {
|
||||
let storage = NetworkMonitorStorage::init(&config.database_path).await?;
|
||||
|
||||
let client_config = config.try_build_validator_client_config()?;
|
||||
let client = Arc::new(RwLock::new(
|
||||
DirectSigningHttpRpcValidatorClient::new_signing(client_config, mnemonic)?,
|
||||
));
|
||||
|
||||
let this = NetworkMonitorOrchestrator {
|
||||
config,
|
||||
client,
|
||||
identity_keys,
|
||||
agents_http_auth_token: Arc::new(agents_http_auth_token),
|
||||
metrics_and_results_http_auth_token: Arc::new(metrics_and_results_http_auth_token),
|
||||
storage,
|
||||
shutdown_manager: ShutdownManager::build_new_default()?,
|
||||
};
|
||||
this.verify_on_chain_balance().await?;
|
||||
|
||||
let announced_identity_key = this.verify_orchestrator_chain_authorisation().await?;
|
||||
this.reconcile_announced_identity_key(announced_identity_key)
|
||||
.await?;
|
||||
this.verify_orchestrator_nym_api_authorisation().await?;
|
||||
|
||||
Ok(this)
|
||||
}
|
||||
|
||||
    /// Returns the on-chain bech32 address of the orchestrator's signing account.
    async fn address(&self) -> AccountId {
        self.client.read().await.nyxd.address()
    }

    /// Ensures the orchestrator has sufficient balance for transaction fees.
    async fn verify_on_chain_balance(&self) -> anyhow::Result<()> {
        let address = self.address().await;
        let Some(balance) = self
            .client
            .read()
            .await
            .nyxd
            .get_balance(&address, "unym".to_string())
            .await?
        else {
            bail!("the orchestrator does not hold any unym balance");
        };
        if balance.amount < 1_000_000 {
            bail!(
                "the orchestrator does not hold a sufficient amount of tokens; its current balance is {balance}"
            )
        }
        Ok(())
    }

    /// Verifies that the orchestrator's account is authorised to send transactions to the network
    /// monitors contract (i.e. to authorise agents on-chain) and returns the identity key it has
    /// previously announced on-chain, if any.
    ///
    /// Retries both on query failures (RPC flakiness) and on successful queries that don't list
    /// this orchestrator - the latter happens routinely when the admin has scheduled an
    /// authorisation transaction that hasn't landed yet, so giving it a bounded window to appear
    /// avoids crash-looping the process in that race. The total budget is
    /// `chain_authorisation_check_max_attempts` attempts spaced by
    /// `chain_authorisation_check_retry_delay`; once exhausted the function returns an error and
    /// `new()` aborts before any background tasks are spawned.
    async fn verify_orchestrator_chain_authorisation(&self) -> anyhow::Result<Option<String>> {
        let query_client = self.client.read().await.nyxd.clone_query_client();
        let address = self.address().await;
        let max_attempts = self.config.chain_authorisation_check_max_attempts.get();
        let retry_delay = self.config.chain_authorisation_check_retry_delay;

        for attempt in 1..=max_attempts {
            match query_client.get_network_monitor_orchestrators().await {
                Ok(res) => {
                    if let Some(entry) = res
                        .authorised
                        .into_iter()
                        .find(|o| o.address.as_str() == address.as_ref())
                    {
                        info!(
                            "orchestrator {address} is authorised in the network monitors contract"
                        );
                        return Ok(entry.identity_key);
                    }
                    warn!(
                        attempt,
                        max_attempts,
                        "orchestrator {address} is not (yet) listed in the network monitors contract"
                    );
                }
                Err(err) => {
                    warn!(
                        attempt,
                        max_attempts,
                        "failed to query network monitors contract for orchestrator authorisation: {err}"
                    );
                }
            }

            if attempt < max_attempts {
                sleep(retry_delay).await;
            }
        }

        Err(anyhow::anyhow!(
            "orchestrator {address} failed to confirm its authorisation in the network monitors contract after {max_attempts} attempts"
        ))
    }

    /// Ensures the identity key announced on-chain matches the key the orchestrator is running
    /// with. If the on-chain key is missing or stale, an update transaction is submitted so that
    /// agents and the nym-api can verify signatures against the correct key.
    async fn reconcile_announced_identity_key(
        &self,
        announced: Option<String>,
    ) -> anyhow::Result<()> {
        let current = self.identity_keys.public_key().to_base58_string();

        if announced.as_deref() == Some(current.as_str()) {
            info!("on-chain announced identity key matches the local identity key");
            return Ok(());
        }

        match &announced {
            Some(stale) => info!(
                "on-chain announced identity key ({stale}) does not match the local identity key ({current}); submitting an update"
            ),
            None => info!(
                "no identity key currently announced on-chain for this orchestrator; submitting the local one ({current})"
            ),
        }

        self.client
            .write()
            .await
            .nyxd
            .update_orchestrator_identity_key(current, None)
            .await
            .context(
                "failed to announce the orchestrator identity key to the network monitors contract",
            )?;

        info!("waiting for the key information to propagate");
        sleep(Duration::from_secs(30)).await;

        Ok(())
    }

    /// Verifies that the orchestrator's identity key is authorised to submit
    /// test results to the nym-api.
    async fn verify_orchestrator_nym_api_authorisation(&self) -> anyhow::Result<()> {
        // ensure our key is authorised to submit test results to the nym-api
        let auth_result = self
            .client
            .read()
            .await
            .nym_api
            .get_known_network_monitor(&self.identity_keys.public_key().to_base58_string())
            .await?;
        if !auth_result.authorised {
            bail!(
                "orchestrator identity key is not authorised to submit test results to the nym-api"
            );
        }
        Ok(())
    }

    /// Starts all orchestrator background tasks (HTTP server, node refresher, etc.)
    /// and blocks until a shutdown signal is received.
    pub(crate) async fn run(&mut self) -> anyhow::Result<()> {
        // this shouldn't fail as we have no tasks using this client yet
        let query_client = self
            .client
            .try_read()
            .context("failed to acquire read lock on client")?
            .nyxd
            .clone_query_client();

        // 1. build the shared state
        // 1.1. retrieve all registered agents (by this orchestrator) from the contract
        // (we assume the orchestrator has restarted and the agents are still out there as authorised)
        let address = self.address().await;
        let agents = query_client
            .get_all_network_monitor_agents()
            .await?
            .into_iter()
            .filter(|a| a.authorised_by.as_str() == address.as_ref())
            .collect::<Vec<_>>();
        let agents_state = KnownAgents::try_from(agents)?;
        let app_state = AppState::new(
            agents_state,
            self.storage.clone(),
            self.config.test_interval,
            self.client.clone(),
        );

        // 2. build node information refresher
        let node_refresher = NodeRefresher::new(
            &self.config,
            query_client,
            self.storage.clone(),
            self.shutdown_manager.clone_shutdown_token(),
        );

        // 3. build the http server
        let http_router = build_router(
            app_state,
            self.agents_http_auth_token.clone(),
            self.metrics_and_results_http_auth_token.clone(),
        );

        // 4. build task for evicting stale test run results
        let stale_results_eviction = StaleResultsEviction::new(
            self.storage.clone(),
            self.config.testrun_eviction_age,
            self.config.test_timeout,
            self.shutdown_manager.clone_shutdown_token(),
        );

        // 5. build task for submitting accumulated results to the nym-api
        let result_submitter = ResultSubmitter::new(
            &self.config,
            self.client.read().await.nym_api.clone(),
            self.storage.clone(),
            self.identity_keys.clone(),
            self.shutdown_manager.clone_shutdown_token(),
        );

        // 6. evict stale data before starting anything else so any test runs
        // left "in progress" by a prior crashed/restarted orchestrator are
        // freed up before agents start polling for work. Note: this is a
        // blocking call, so a hung DB at start-up will prevent the
        // orchestrator from serving, which is the desired fail-fast here.
        stale_results_eviction
            .evict_stale_results()
            .await
            .context("failed to evict stale data")?;

        // 7. start all the tasks
        // http server
        let http_server_fut = run_http_server(
            http_router,
            self.config.http_server_bind_address,
            self.shutdown_manager.clone_shutdown_token(),
        );
        self.shutdown_manager
            .try_spawn_named(http_server_fut, "http-server");
        // node refresher
        self.shutdown_manager.try_spawn_named(
            async move {
                node_refresher.run().await;
            },
            "node-refresher",
        );
        // stale results eviction
        self.shutdown_manager.try_spawn_named(
            async move { stale_results_eviction.run().await },
            "stale-data-eviction",
        );
        // nym-api result submitter
        self.shutdown_manager.try_spawn_named(
            async move { result_submitter.run().await },
            "result-submitter",
        );

        self.shutdown_manager.run_until_shutdown().await;
        Ok(())
    }
}
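The bounded retry described in the doc comment of `verify_orchestrator_chain_authorisation` can be sketched in isolation. This is a minimal illustration under stated assumptions, not the orchestrator's code: `Check` and `retry_bounded` are hypothetical names, a blocking `std::thread::sleep` stands in for the async `sleep`, and `eprintln!` stands in for the `tracing` macros.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Outcome of a single check (hypothetical type, used only in this sketch).
enum Check<T> {
    /// The query succeeded and the entry was found.
    Found(T),
    /// The query succeeded but the entry is not listed (yet).
    NotListed,
    /// The query itself failed, e.g. because of a flaky RPC endpoint.
    QueryFailed(String),
}

/// Calls `check` up to `max_attempts` times and sleeps `retry_delay` between
/// attempts, but not after the final one. Both "not listed" and "query failed"
/// consume an attempt. With `max_attempts == 0` the loop never runs and the
/// function fails immediately, which is why the original reads the value from
/// a non-zero type via `.get()`.
fn retry_bounded<T>(
    max_attempts: u32,
    retry_delay: Duration,
    mut check: impl FnMut(u32) -> Check<T>,
) -> Result<T, String> {
    for attempt in 1..=max_attempts {
        match check(attempt) {
            Check::Found(value) => return Ok(value),
            Check::NotListed => {
                eprintln!("attempt {attempt}/{max_attempts}: not (yet) listed")
            }
            Check::QueryFailed(err) => {
                eprintln!("attempt {attempt}/{max_attempts}: query failed: {err}")
            }
        }

        if attempt < max_attempts {
            sleep(retry_delay);
        }
    }

    Err(format!("not confirmed after {max_attempts} attempts"))
}
```

Skipping the sleep after the last attempt means the worst-case start-up delay is `(max_attempts - 1) * retry_delay` plus the time spent in the queries themselves.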
+243
@@ -0,0 +1,243 @@
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
// SPDX-License-Identifier: GPL-3.0-only

use crate::orchestrator::config::Config;
use crate::orchestrator::prometheus::{PROMETHEUS_METRICS, PrometheusMetric};
use crate::storage::NetworkMonitorStorage;
use crate::storage::models::{NewNymNode, NodeType};
use anyhow::Context;
use futures::{StreamExt, stream};
use nym_bin_common::bin_info;
use nym_crypto::asymmetric::x25519;
use nym_network_defaults::DEFAULT_MIX_LISTENING_PORT;
use nym_node_requests::api::client::NymNodeApiClientExt;
use nym_node_requests::api::helpers::NymNodeApiClientRetriever;
use nym_node_requests::api::v1::node::models::NodeRoles;
use nym_task::ShutdownToken;
use nym_validator_client::QueryHttpRpcNyxdClient;
use nym_validator_client::models::KeyRotationId;
use nym_validator_client::nyxd::contract_traits::PagedMixnetQueryClient;
use nym_validator_client::nyxd::nym_mixnet_contract_common::NymNodeBond;
use rand::prelude::SliceRandom;
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::Duration;
use tokio::time::{Instant, interval};
use tracing::{error, info, warn};

pub(crate) struct NodeRefresher {
    pub(crate) client: QueryHttpRpcNyxdClient,

    pub(crate) storage: NetworkMonitorStorage,

    /// How often the list of bonded nym-nodes is refreshed from the mixnet contract
    /// (e.g. `10m`, `1h`).
    pub(crate) node_refresh_rate: Duration,

    /// Timeout (e.g. `10s`) for querying a single node for its detailed information (sphinx
    /// key, noise key, etc.). Queries that exceed this budget leave the corresponding fields
    /// as `NULL`.
    pub(crate) node_info_query_timeout: Duration,

    /// Maximum number of nodes queried concurrently during a node refresh cycle.
    pub(crate) number_of_concurrent_node_queries: usize,

    pub(crate) shutdown_token: ShutdownToken,
}

/// Information about the node retrieved from the node directly.
struct SelfDescribedData {
    /// Mixnet socket address (host:port) at which the node accepts sphinx packets.
    mixnet_socket_address: SocketAddr,

    /// X25519 public key used for Noise handshakes.
    noise_key: x25519::PublicKey,

    /// Sphinx public key used for packet encryption.
    sphinx_key: x25519::PublicKey,

    /// Key rotation epoch ID that `sphinx_key` belongs to.
    key_rotation_id: KeyRotationId,

    /// The supported roles of the node in the network.
    roles: NodeRoles,
}

impl NodeRefresher {
    pub(crate) fn new(
        config: &Config,
        client: QueryHttpRpcNyxdClient,
        storage: NetworkMonitorStorage,
        shutdown_token: ShutdownToken,
    ) -> Self {
        NodeRefresher {
            client,
            storage,
            node_refresh_rate: config.node_refresh_rate,
            node_info_query_timeout: config.node_info_query_timeout,
            number_of_concurrent_node_queries: config.number_of_concurrent_node_queries,
            shutdown_token,
        }
    }

    async fn get_node_details_inner(&self, bond: NymNodeBond) -> anyhow::Result<SelfDescribedData> {
        let node_id = bond.node_id;

        let client = NymNodeApiClientRetriever::new(bin_info!())
            .with_expected_identity(Some(bond.node.identity_key))
            .with_verify_host_information()
            .with_custom_port(bond.node.custom_http_port)
            .get_client(&bond.node.host, node_id)
            .await?;

        let api_client = client.client;
        let host_info = client
            .host_information
            .context("failed to query node host information")?;

        // retrieve information on the announced ports in case a non-default mixnet port
        // is being used
        let aux = api_client.get_auxiliary_details().await?;

        // if the noise key is missing, it means the node is outdated,
        // so it does not support stress testing anyway
        let noise_key = host_info
            .keys
            .x25519_versioned_noise
            .context("missing noise key")?
            .x25519_pubkey;
        let sphinx_key = host_info.keys.primary_x25519_sphinx_key.public_key;
        let key_rotation_id = host_info.keys.primary_x25519_sphinx_key.rotation_id;

        // pseudorandomly choose which ip address to use - each announced address should work!
        let ip_address = host_info
            .ip_address
            .choose(&mut rand::thread_rng())
            .context("node hasn't announced any IPs")?;
        let mix_port = aux
            .announce_ports
            .mix_port
            .unwrap_or(DEFAULT_MIX_LISTENING_PORT);

        // retrieve information about the node roles so that we can classify the node
        // (we're not testing gateways yet, but we still store them for completeness)
        let roles = api_client
            .get_roles()
            .await
            .context("failed to retrieve node roles")?;

        Ok(SelfDescribedData {
            mixnet_socket_address: SocketAddr::new(*ip_address, mix_port),
            noise_key,
            sphinx_key,
            key_rotation_id,
            roles,
        })
    }

    async fn get_node_details(&self, bond: NymNodeBond, timeout: Duration) -> NewNymNode {
        let mut node_update = NewNymNode::from_bond(&bond);

        let node_id = bond.node_id;
        let self_described = match tokio::time::timeout(timeout, self.get_node_details_inner(bond))
            .await
        {
            Err(_timeout) => {
                warn!(
                    "timed out while attempting to retrieve self-described node details for node {node_id}"
                );
                return node_update;
            }
            Ok(Err(err)) => {
                error!("failed to retrieve self-described node details for node {node_id}: {err}");
                return node_update;
            }
            Ok(Ok(info)) => info,
        };

        node_update.mixnet_socket_address = Some(self_described.mixnet_socket_address.to_string());
        node_update.noise_key = Some(self_described.noise_key.to_base58_string());
        node_update.sphinx_key = Some(self_described.sphinx_key.to_base58_string());
        node_update.key_rotation_id = Some(self_described.key_rotation_id as i64);
        node_update.node_type = NodeType::from_roles(&self_described.roles);

        node_update
    }

    async fn refresh_bonded_nodes(&self) -> anyhow::Result<()> {
        let start = Instant::now();

        // 1. retrieve all nodes from the contract
        let nodes = self.client.get_all_nymnode_bonds().await?;
        let num_nodes = nodes.len();
        info!("retrieved {num_nodes} bonded nodes from the contract");

        // 2. retrieve detailed information from the self-described endpoints
        let timeout = self.node_info_query_timeout;
        let refreshed_nodes: Vec<_> = stream::iter(nodes)
            .map(|b| self.get_node_details(b, timeout))
            .buffer_unordered(self.number_of_concurrent_node_queries)
            .collect()
            .await;

        let mut per_type: HashMap<NodeType, i64> = HashMap::new();
        for node in &refreshed_nodes {
            *per_type.entry(node.node_type).or_insert(0) += 1;
        }
        let count_of = |t: NodeType| per_type.get(&t).copied().unwrap_or(0);
        let unknown = count_of(NodeType::Unknown);
        let successful = (refreshed_nodes.len() as i64) - unknown;
        info!("managed to retrieve full node information on {successful} nodes ({unknown} failed)");

        PROMETHEUS_METRICS.set(
            PrometheusMetric::BondedMixnodeNymNodes,
            count_of(NodeType::Mixnode),
        );
        PROMETHEUS_METRICS.set(
            PrometheusMetric::BondedGatewayNymNodes,
            count_of(NodeType::Gateway),
        );
        PROMETHEUS_METRICS.set(
            PrometheusMetric::BondedMixnodeAndGatewayNymNodes,
            count_of(NodeType::MixnodeAndGateway),
        );
        PROMETHEUS_METRICS.set(PrometheusMetric::BondedUnknownNymNodes, unknown);
        PROMETHEUS_METRICS.set(PrometheusMetric::SuccessfulNymNodeDataRetrieval, successful);
        PROMETHEUS_METRICS.set(PrometheusMetric::FailedNymNodeDataRetrieval, unknown);

        // 3. persist every node (including unreachable ones so we keep their
        // previously-learned keys around for the next refresh). The testrun
        // assignment query filters out non-mixnode / unknown entries.
        self.storage
            .batch_insert_or_update_nym_nodes(&refreshed_nodes)
            .await?;

        // Observe the cycle duration last so it reflects the full refresh path
        // (contract query + per-node queries + storage write).
        PROMETHEUS_METRICS.observe_histogram(
            PrometheusMetric::NodeRefreshCycleSeconds,
            start.elapsed().as_secs_f64(),
        );
        Ok(())
    }

    pub(crate) async fn run(&self) {
        let mut interval = interval(self.node_refresh_rate);
        interval.set_missed_tick_behavior(tokio::time::MissedTickBehavior::Delay);

        loop {
            tokio::select! {
                biased;
                _ = self.shutdown_token.cancelled() => {
                    break
                }
                _ = interval.tick() => {
                    if let Err(err) = self.refresh_bonded_nodes().await {
                        error!("failed to refresh bonded nodes: {err}");
                    }
                }
            }
        }

        info!("node refresher stopped");
    }
}
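The per-type tally in `refresh_bonded_nodes` treats every node still classified as unknown as a failed retrieval, so the "successful" and "failed" gauges are derived from the classification rather than counted from query results. A minimal standalone sketch of that bookkeeping follows; `NodeKind` and `tally` are hypothetical stand-ins for `NodeType` and the inline logic, not part of the crate.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the refresher's node classification.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum NodeKind {
    Mixnode,
    Gateway,
    MixnodeAndGateway,
    Unknown,
}

/// Counts nodes per kind and returns `(per_kind, successful, unknown)`, where
/// `successful` is simply the total minus the nodes left as `Unknown`.
fn tally(kinds: &[NodeKind]) -> (HashMap<NodeKind, i64>, i64, i64) {
    let mut per_kind: HashMap<NodeKind, i64> = HashMap::new();
    for kind in kinds {
        *per_kind.entry(*kind).or_insert(0) += 1;
    }

    // A kind that never occurred has no map entry, so default it to zero.
    let unknown = per_kind.get(&NodeKind::Unknown).copied().unwrap_or(0);
    let successful = kinds.len() as i64 - unknown;

    (per_kind, successful, unknown)
}
```

One consequence of this design is that a node which answered every query but reported no recognisable role would, if it is classified as unknown, be counted as a failed retrieval as well.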
+437
@@ -0,0 +1,437 @@
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
// SPDX-License-Identifier: GPL-3.0-only

//! Prometheus metrics exposed by the orchestrator.
//!
//! Every series this service emits is declared up-front as a variant of [`PrometheusMetric`].
//! Each variant carries its Prometheus help string via strum's `EnumProperty` attribute and is
//! mapped to a concrete counter / gauge / histogram in [`PrometheusMetric::to_registrable_metric`].
//! Call sites emit values through the process-wide [`PROMETHEUS_METRICS`] handle, which forwards
//! to the underlying [`nym_metrics`] registry.
//!
//! The registry is pre-populated in [`NetworkMonitorPrometheusMetrics::initialise`] so that every
//! series is present (with a zero value) from the very first scrape. This avoids dashboards and
//! alerts interpreting the first observation as a reset.

use nym_metrics::{HistogramTimer, Metric, metrics_registry};
use std::sync::LazyLock;
use strum::{Display, EnumCount, EnumIter, EnumProperty, IntoEnumIterator};

/// Process-wide handle to the orchestrator's Prometheus metrics. Lazily initialised on first
/// access; the initialisation pre-registers every [`PrometheusMetric`] variant so that scrapes
/// observe a complete set of zeroed series even before any event has fired.
pub static PROMETHEUS_METRICS: LazyLock<NetworkMonitorPrometheusMetrics> =
    LazyLock::new(NetworkMonitorPrometheusMetrics::initialise);

/// Histogram buckets (upper bounds, in seconds) for [`PrometheusMetric::TestDurationSeconds`].
/// Densely spaced in the 40–60 s range because most completed runs cluster there and small
/// shifts in that band are the most interesting signal.
const TESTRUN_DURATION: &[f64] = &[
    // sub 5s (implicitly)
    5.,   // 5s - 15s
    15.,  // 15s - 30s
    30.,  // 30s - 40s
    40.,  // 40s - 45s
    45.,  // 45s - 50s
    50.,  // 50s - 55s
    55.,  // 55s - 60s
    60.,  // 1min - 2min
    120., // 2min - 5min
    300., // 5min+ (implicitly)
];

/// Histogram buckets (upper bounds, in milliseconds) for
/// [`PrometheusMetric::ApproximateNodeLatencyMs`]. Log-ish spacing from 1 ms up to 1 s: typical
/// mixnet latencies are well under 500 ms and anything past 1 s lands in the overflow bucket.
const NODE_LATENCY: &[f64] = &[
    // sub 1ms (implicitly)
    1.,    // 1ms - 5ms
    5.,    // 5ms - 10ms
    10.,   // 10ms - 20ms
    20.,   // 20ms - 50ms
    50.,   // 50ms - 100ms
    100.,  // 100ms - 200ms
    200.,  // 200ms - 500ms
    500.,  // 500ms - 1s
    1000., // 1s+ (implicitly)
];

/// Histogram buckets for [`PrometheusMetric::TestrunReceivedPacketsRatio`], i.e. `received / sent`,
/// so values live in `[0, 1]`. The dedicated `<= 0.0` bucket isolates the "got nothing" case from
/// "got a few", which otherwise would all collapse into a single low bucket; upper buckets are
/// dense near 1.0 because the difference between 99% and 95% delivery is operationally significant.
const RECEIVED_PACKETS_RATIO: &[f64] = &[
    0.,   // 0 - 0.1
    0.1,  // 0.1 - 0.2
    0.2,  // 0.2 - 0.3
    0.3,  // 0.3 - 0.4
    0.4,  // 0.4 - 0.5
    0.5,  // 0.5 - 0.6
    0.6,  // 0.6 - 0.7
    0.7,  // 0.7 - 0.8
    0.8,  // 0.8 - 0.9
    0.9,  // 0.9 - 0.95
    0.95, // 0.95 - 0.98
    0.98, // 0.98 - 0.99
    0.99, // 0.99+ (implicitly)
];

/// Histogram buckets (upper bounds, in seconds) for
/// [`PrometheusMetric::NodeRefreshCycleSeconds`]. Shape targets the expected range of a single
/// refresh sweep: under a minute when everything's healthy, up to ~10 min in degraded cases
/// (large topology × per-node timeouts × limited concurrency).
const NODE_REFRESH_CYCLE: &[f64] = &[
    // sub 1s (implicitly)
    1.,   // 1s - 5s
    5.,   // 5s - 10s
    10.,  // 10s - 30s
    30.,  // 30s - 60s
    60.,  // 1min - 2min
    120., // 2min - 5min
    300., // 5min - 10min
    600., // 10min+ (implicitly)
];

/// Histogram buckets (upper bounds, in milliseconds) for
/// [`PrometheusMetric::AverageTestPacketRTTMs`]. Same shape as [`NODE_LATENCY`]; this is the
/// mean per-packet round trip over a single testrun, not the approximation used for node latency.
const AVG_PACKET_RTT: &[f64] = &[
    // sub 1ms (implicitly)
    1.,    // 1ms - 5ms
    5.,    // 5ms - 10ms
    10.,   // 10ms - 20ms
    20.,   // 20ms - 50ms
    50.,   // 50ms - 100ms
    100.,  // 100ms - 200ms
    200.,  // 200ms - 500ms
    500.,  // 500ms - 1s
    1000., // 1s+ (implicitly)
];

/// Every Prometheus series emitted by the orchestrator. Each variant maps to exactly one metric
/// and must carry a `help` strum property; this is verified by the
/// `every_variant_has_help_property` test.
///
/// The Prometheus metric name is derived from the variant name via strum: `serialize_all =
/// "snake_case"` + the `nym_network_monitor_` prefix. So `AgentTestrunRequests` becomes
/// `nym_network_monitor_agent_testrun_requests`. The concrete metric kind (counter / gauge /
/// histogram, plus bucket bounds) is chosen in [`PrometheusMetric::to_registrable_metric`].
#[derive(Clone, Debug, EnumIter, Display, EnumProperty, EnumCount, Eq, Hash, PartialEq)]
#[strum(serialize_all = "snake_case", prefix = "nym_network_monitor_")]
pub enum PrometheusMetric {
    #[strum(props(
        help = "The number of requests to announce an agent to the network monitors contract"
    ))]
    AgentAnnounceRequests,

    #[strum(props(
        help = "The number of duplicate requests to announce an agent to the network monitors contract (agent has already been announced before)"
    ))]
    AgentDuplicateAnnouncementRequests,

    #[strum(props(
        help = "The number of successful announcements of an agent to the network monitors contract"
    ))]
    AgentContractAnnounceSuccesses,

    #[strum(props(
        help = "The number of failed announcements of an agent to the network monitors contract"
    ))]
    AgentContractAnnounceFailures,

    #[strum(props(help = "The number of requests to assign a test run to an agent"))]
    AgentTestrunRequests,

    #[strum(props(
        help = "The number of requests to assign a test run to an agent that was not known to the orchestrator"
    ))]
    AgentUnknownAgentTestrunRequests,

    #[strum(props(
        help = "The number of requests to assign a test run to an agent that was not announced to the network monitors contract"
    ))]
    AgentTestrunRequestsWithoutAnnouncement,

    #[strum(props(
        help = "The number of testrun requests that resulted in no work being assigned"
    ))]
    EmptyTestrunAssignments,

    #[strum(props(help = "The number of testrun requests that resulted in work being assigned"))]
    NonEmptyTestrunAssignments,

    #[strum(props(help = "The number of testrun results that were submitted by agents"))]
    TestRunResultSubmissions,

    #[strum(props(help = "The number of stale testruns that were evicted from the storage"))]
    StaleTestrunsEvicted,

    #[strum(props(
        help = "The number of testruns in progress that timed out and were evicted from the queue and the storage"
    ))]
    TimedOutTestrunsEvicted,

    #[strum(props(help = "The duration of a test run, in seconds"))]
    TestDurationSeconds,

    #[strum(props(help = "The number of testruns that resulted in errors"))]
    TestrunsErrors,

    #[strum(props(help = "The approximate latency to a node in milliseconds"))]
    ApproximateNodeLatencyMs,

    #[strum(props(
        help = "Ratio of packets received to packets sent in a testrun (received / sent)"
    ))]
    TestrunReceivedPacketsRatio,

    #[strum(props(
        help = "The average time it took to receive a test packet back from a node under test, in milliseconds"
    ))]
    AverageTestPacketRTTMs,

    #[strum(props(
        help = "The number of bonded nodes classified as mixnode-only from their self-described roles"
    ))]
    BondedMixnodeNymNodes,

    #[strum(props(
        help = "The number of bonded nodes classified as gateway-only from their self-described roles"
    ))]
    BondedGatewayNymNodes,

    #[strum(props(help = "The number of bonded nodes advertising both mixnode and gateway roles"))]
    BondedMixnodeAndGatewayNymNodes,

    #[strum(props(
        help = "The number of bonded nodes whose self-described role could not be determined (unreachable or no roles reported)"
    ))]
    BondedUnknownNymNodes,

    #[strum(props(
        help = "The number of successful Nym node data retrievals from self-described endpoints"
    ))]
    SuccessfulNymNodeDataRetrieval,

    #[strum(props(
        help = "The number of failed Nym node data retrievals from self-described endpoints"
    ))]
    FailedNymNodeDataRetrieval,

    #[strum(props(help = "The duration of a full bonded-node refresh cycle, in seconds"))]
    NodeRefreshCycleSeconds,

    #[strum(props(
        help = "The number of test runs currently in progress (rows in testrun_in_progress)"
    ))]
    TestrunsInProgress,

    #[strum(props(help = "The total number of agents known to the orchestrator"))]
    KnownAgentsTotal,

    #[strum(props(
        help = "The number of known agents that have been announced to the network monitors contract"
    ))]
    KnownAgentsAnnounced,

    #[strum(props(
        help = "The total number of test packets dispatched across all submitted testruns"
    ))]
    TestPacketsSent,

    #[strum(props(
        help = "The total number of test packets received back across all submitted testruns"
    ))]
    TestPacketsReceived,
}

impl PrometheusMetric {
    fn name(&self) -> String {
        self.to_string()
    }

    fn help(&self) -> &'static str {
        // SAFETY: every variant has a `help` prop defined (and a unit test checks for that)
        #[allow(clippy::unwrap_used)]
        self.get_str("help").unwrap()
    }

    /// Builds the concrete [`Metric`] this variant should register as (counter / gauge / histogram
    /// with the right bucket bounds). Called from [`NetworkMonitorPrometheusMetrics::initialise`]
    /// to pre-populate the registry, and from the `set` / `observe_histogram` fallback paths to
    /// lazily register a metric that somehow wasn't set up yet.
    fn to_registrable_metric(&self) -> Option<Metric> {
        let name = self.name();
        let help = self.help();

        match self {
            PrometheusMetric::AgentAnnounceRequests => Metric::new_int_counter(&name, help),
            PrometheusMetric::AgentDuplicateAnnouncementRequests => {
                Metric::new_int_counter(&name, help)
            }
            PrometheusMetric::AgentContractAnnounceSuccesses => {
                Metric::new_int_counter(&name, help)
            }
            PrometheusMetric::AgentContractAnnounceFailures => Metric::new_int_counter(&name, help),
            PrometheusMetric::AgentTestrunRequests => Metric::new_int_counter(&name, help),
            PrometheusMetric::AgentUnknownAgentTestrunRequests => {
                Metric::new_int_counter(&name, help)
            }
            PrometheusMetric::AgentTestrunRequestsWithoutAnnouncement => {
                Metric::new_int_counter(&name, help)
            }
            PrometheusMetric::EmptyTestrunAssignments => Metric::new_int_counter(&name, help),
            PrometheusMetric::NonEmptyTestrunAssignments => Metric::new_int_counter(&name, help),
            PrometheusMetric::TestRunResultSubmissions => Metric::new_int_counter(&name, help),
            PrometheusMetric::StaleTestrunsEvicted => Metric::new_int_counter(&name, help),
            PrometheusMetric::TimedOutTestrunsEvicted => Metric::new_int_counter(&name, help),
            PrometheusMetric::TestDurationSeconds => {
                Metric::new_histogram(&name, help, Some(TESTRUN_DURATION))
            }
            PrometheusMetric::TestrunsErrors => Metric::new_int_counter(&name, help),
            PrometheusMetric::ApproximateNodeLatencyMs => {
                Metric::new_histogram(&name, help, Some(NODE_LATENCY))
            }
            PrometheusMetric::TestrunReceivedPacketsRatio => {
                Metric::new_histogram(&name, help, Some(RECEIVED_PACKETS_RATIO))
            }
            PrometheusMetric::AverageTestPacketRTTMs => {
                Metric::new_histogram(&name, help, Some(AVG_PACKET_RTT))
            }
            PrometheusMetric::BondedMixnodeNymNodes => Metric::new_int_gauge(&name, help),
            PrometheusMetric::BondedGatewayNymNodes => Metric::new_int_gauge(&name, help),
            PrometheusMetric::BondedMixnodeAndGatewayNymNodes => Metric::new_int_gauge(&name, help),
            PrometheusMetric::BondedUnknownNymNodes => Metric::new_int_gauge(&name, help),
            PrometheusMetric::SuccessfulNymNodeDataRetrieval => Metric::new_int_gauge(&name, help),
            PrometheusMetric::FailedNymNodeDataRetrieval => Metric::new_int_gauge(&name, help),
            PrometheusMetric::NodeRefreshCycleSeconds => {
                Metric::new_histogram(&name, help, Some(NODE_REFRESH_CYCLE))
            }
            PrometheusMetric::TestrunsInProgress => Metric::new_int_gauge(&name, help),
            PrometheusMetric::KnownAgentsTotal => Metric::new_int_gauge(&name, help),
            PrometheusMetric::KnownAgentsAnnounced => Metric::new_int_gauge(&name, help),
            PrometheusMetric::TestPacketsSent => Metric::new_int_counter(&name, help),
            PrometheusMetric::TestPacketsReceived => Metric::new_int_counter(&name, help),
        }
    }

    /// Sets the gauge to `value`. If the metric has not yet been registered (shouldn't happen after
    /// `initialise`, but we're defensive), falls back to registering it first and retrying.
    fn set(&self, value: i64) {
        let reg = metrics_registry();
        if !reg.set(&self.name(), value)
            && let Some(registrable) = self.to_registrable_metric()
        {
            reg.register_metric(registrable);
            reg.set(&self.name(), value);
        }
    }

    fn set_float(&self, value: f64) {
        metrics_registry().set_float(&self.name(), value);
    }

    fn inc(&self) {
        metrics_registry().inc(&self.name());
    }

    fn inc_by(&self, value: i64) {
        metrics_registry().inc_by(&self.name(), value);
    }

    /// Records `value` into the histogram. Same register-on-miss fallback as [`Self::set`].
    fn observe_histogram(&self, value: f64) {
        let reg = metrics_registry();
        if !reg.add_to_histogram(&self.name(), value)
            && let Some(registrable) = self.to_registrable_metric()
        {
            reg.register_metric(registrable);
            reg.add_to_histogram(&self.name(), value);
        }
    }

    fn start_timer(&self) -> Option<HistogramTimer> {
        metrics_registry().start_timer(&self.name())
    }
|
||||
}
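The register-on-miss fallback used by `set` and `observe_histogram` can be illustrated in isolation. The following is a minimal sketch only: `ToyRegistry`, its methods, and `set_with_fallback` are hypothetical stand-ins built on a `HashMap`, not the API of the real metrics registry. The only behaviour assumed from the code above is that a write returns `false` when the metric is unknown.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the shared metrics registry (NOT the real registry API).
#[derive(Default)]
struct ToyRegistry {
    gauges: HashMap<String, i64>,
}

impl ToyRegistry {
    /// Registers a gauge with an initial value of zero, keeping any existing value.
    fn register(&mut self, name: &str) {
        self.gauges.entry(name.to_string()).or_insert(0);
    }

    /// Writes `value` and returns `true`, or returns `false` when the gauge is unknown.
    fn set(&mut self, name: &str, value: i64) -> bool {
        match self.gauges.get_mut(name) {
            Some(slot) => {
                *slot = value;
                true
            }
            None => false,
        }
    }
}

/// Try the write first; on a miss, register the metric and retry once.
fn set_with_fallback(reg: &mut ToyRegistry, name: &str, value: i64) {
    if !reg.set(name, value) {
        reg.register(name);
        reg.set(name, value);
    }
}
```

The fast path stays a single lookup; the registration cost is only paid for a metric that was somehow missed during start-up.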
|
||||
|
||||
/// Orchestrator-side handle to the process-wide Prometheus registry. Constructed once via
/// [`Self::initialise`] (held in the [`PROMETHEUS_METRICS`] static) and used from call sites to
/// emit values against the [`PrometheusMetric`] enum. All mutating methods are thin wrappers
/// around the corresponding methods on [`PrometheusMetric`].
#[non_exhaustive]
pub struct NetworkMonitorPrometheusMetrics {
    _private: (),
}
|
||||
|
||||
impl NetworkMonitorPrometheusMetrics {
    /// Pre-registers every [`PrometheusMetric`] variant that can be registered up front in the
    /// shared registry, so that the very first scrape after startup already returns those series
    /// with zero values. Without this, series only appear after their first observation, which
    /// can make dashboards and alerting rules misbehave (a missing series and a zeroed series
    /// are not the same signal).
    pub(crate) fn initialise() -> Self {
        let registry = metrics_registry();

        // complex metrics are skipped here: their names are only fully known at runtime,
        // in which case `to_registrable_metric` returns `None`
        for kind in PrometheusMetric::iter() {
            if let Some(metric) = kind.to_registrable_metric() {
                registry.register_metric(metric);
            }
        }

        NetworkMonitorPrometheusMetrics { _private: () }
    }

    /// Renders the full registry in the Prometheus text exposition format; this is what the
    /// `/v1/metrics/prometheus` scrape endpoint returns.
    pub fn metrics(&self) -> String {
        metrics_registry().to_string()
    }

    pub fn set(&self, metric: PrometheusMetric, value: i64) {
        metric.set(value)
    }

    #[allow(dead_code)]
    pub fn set_float(&self, metric: PrometheusMetric, value: f64) {
        metric.set_float(value)
    }

    pub fn inc(&self, metric: PrometheusMetric) {
        metric.inc()
    }

    pub fn inc_by(&self, metric: PrometheusMetric, value: i64) {
        metric.inc_by(value)
    }

    pub fn observe_histogram(&self, metric: PrometheusMetric, value: f64) {
        metric.observe_histogram(value)
    }

    #[allow(dead_code)]
    pub fn start_timer(&self, metric: PrometheusMetric) -> Option<HistogramTimer> {
        metric.start_timer()
    }
}
|
||||
|
||||
#[cfg(test)]
mod tests {
    use super::*;
    use strum::IntoEnumIterator;

    #[test]
    fn prometheus_metrics() {
        // a sanity check for anyone adding new metrics. if this test fails,
        // make sure any methods on `PrometheusMetric` enum don't need updating
        // or require custom Display impl
        assert_eq!(29, PrometheusMetric::COUNT)
    }

    #[test]
    fn every_variant_has_help_property() {
        for variant in PrometheusMetric::iter() {
            assert!(variant.get_str("help").is_some())
        }
    }
}
|
||||
@@ -0,0 +1,141 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
// SPDX-License-Identifier: GPL-3.0-only

use crate::orchestrator::config::Config;
use crate::storage::NetworkMonitorStorage;
use anyhow::Context;
use nym_crypto::asymmetric::ed25519;
use nym_node_requests::api::Client;
use nym_task::ShutdownToken;
use nym_validator_client::models::{StressTestBatchSubmissionContent, StressTestResult};
use nym_validator_client::nym_api::NymApiClientExt;
use nym_validator_client::signable::SignableMessageBody;
use std::sync::Arc;
use std::time::Duration;
use tokio::time::{Instant, MissedTickBehavior, interval_at};
use tracing::{debug, info};
|
||||
|
||||
/// Background task that periodically drains freshly-completed test run results from the local
/// storage, wraps them into a signed [`StressTestBatchSubmission`][batch], and POSTs the batch to
/// the nym-api.
///
/// Results are kept in local storage (and subject to the `testrun_eviction_age` retention window)
/// so that a transient nym-api outage or a crashed orchestrator doesn't silently lose
/// measurements - the next successful submission sweep will pick up anything that was missed.
///
/// [batch]: nym_api_requests::models::network_monitor::StressTestBatchSubmission
pub(crate) struct ResultSubmitter {
    /// Nym-api client used to reach the api endpoint that accepts stress-test batches.
    client: Client,

    /// Handle to the local SQLite database from which pending results are drained.
    storage: NetworkMonitorStorage,

    /// Ed25519 key pair whose private half signs each batch submission and whose public half
    /// is the `signer` nym-api validates against the authorised-monitors set.
    identity_keys: Arc<ed25519::KeyPair>,

    /// Cadence at which [`Self::run`] attempts a submission sweep.
    submission_interval: Duration,

    shutdown_token: ShutdownToken,
}
|
||||
|
||||
impl ResultSubmitter {
    pub(crate) fn new(
        config: &Config,
        client: Client,
        storage: NetworkMonitorStorage,
        identity_keys: Arc<ed25519::KeyPair>,
        shutdown_token: ShutdownToken,
    ) -> Self {
        ResultSubmitter {
            client,
            storage,
            identity_keys,
            submission_interval: config.result_submission_interval,
            shutdown_token,
        }
    }
|
||||
|
||||
    /// Perform a single submission sweep: read every `testrun` row produced since the last
    /// acknowledged batch, wrap them into a signed [`StressTestBatchSubmission`][batch], POST the
    /// batch to the nym-api, and - only on success - advance the `last_submitted_testrun_id`
    /// watermark.
    ///
    /// No-ops silently when there is nothing new to submit.
    ///
    /// The watermark is intentionally advanced **after** the POST returns `Ok`. A crash or
    /// network failure between these two steps re-sends the same rows under a fresh batch
    /// timestamp on the next sweep - harmless because nym-api's replay protection is batch-level
    /// (it rejects stale/duplicate batches, not re-seen row contents) and duplicate inserts at
    /// the row level are rare and tolerable. This bias towards at-least-once delivery is
    /// deliberate: losing measurements is worse than occasionally duplicating them.
    ///
    /// [batch]: nym_api_requests::models::network_monitor::StressTestBatchSubmission
    async fn submit_pending_results(&self) -> anyhow::Result<()> {
        info!("submitting stress-test results to nym-api");
        let last_submitted = self.storage.get_last_submitted_testrun_id().await?;
        // `None` means "never submitted" - treat as 0, which pulls everything currently in the
        // table (testrun.id is AUTOINCREMENT, so always >= 1).
        let after_id = last_submitted.unwrap_or(0);

        let pending = self.storage.get_testruns_after(after_id).await?;
        if pending.is_empty() {
            debug!("stress-test result submission sweep: no new results");
            return Ok(());
        }

        // `get_testruns_after` returns rows ordered by id ASC, so the last row carries the
        // highest id and is what we advance the watermark to once the batch is accepted.
        #[allow(clippy::expect_used)]
        let max_id = pending.last().expect("pending is non-empty").id;
        let batch_size = pending.len();

        let results: Vec<StressTestResult> = pending.into_iter().map(Into::into).collect();

        let signer = *self.identity_keys.public_key();
        let body = StressTestBatchSubmissionContent::new(signer, results);
        let signed = body.sign(self.identity_keys.private_key());

        self.client
            .submit_stress_testing_results(&signed)
            .await
            .context("failed to POST stress-test batch submission to nym-api")?;

        self.storage.set_last_submitted_testrun_id(max_id).await?;
        info!("submitted {batch_size} stress-test results to nym-api (testrun ids up to {max_id})");
        Ok(())
    }
|
||||
|
||||
    /// Run the submission loop until the shutdown token is cancelled.
    ///
    /// The first tick is deliberately offset by `submission_interval` so the orchestrator has
    /// time to finish start-up reconciliation (chain authorisation check, etc.) before the first
    /// submission is attempted. `MissedTickBehavior::Delay` avoids burst catch-up ticks if a
    /// sweep runs long under DB or network pressure.
    pub(crate) async fn run(&self) {
        let mut interval = interval_at(
            Instant::now() + self.submission_interval,
            self.submission_interval,
        );
        interval.set_missed_tick_behavior(MissedTickBehavior::Delay);

        loop {
            tokio::select! {
                biased;
                _ = self.shutdown_token.cancelled() => break,
                _ = interval.tick() => {
                    if let Err(err) = self.submit_pending_results().await {
                        // Submission errors shouldn't kill the task - local storage retains the
                        // pending rows until the retention window expires, so the next tick will
                        // retry and eventually catch up once the nym-api is reachable again.
                        tracing::error!("failed to submit stress-test results: {err}");
                    }
                }
            }
        }

        info!("stress-test result submitter stopped");
    }
|
||||
}
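The watermark discipline described for `submit_pending_results` (advance only after a successful POST, so a failure simply causes a resend) can be sketched without any I/O. This is an illustration under stated assumptions, not the orchestrator's code: `ToyStore`, `rows_after`, and `sweep` are hypothetical names, the "table" is a plain `Vec` of ids, and the POST is modelled as a closure.

```rust
/// Hypothetical in-memory stand-in for the `testrun` table plus the stored watermark.
struct ToyStore {
    /// Row ids in ascending order, mimicking AUTOINCREMENT ids that start at 1.
    rows: Vec<i64>,
    last_submitted: Option<i64>,
}

impl ToyStore {
    /// Returns every row id strictly greater than `after_id`, in ascending order.
    fn rows_after(&self, after_id: i64) -> Vec<i64> {
        self.rows.iter().copied().filter(|id| *id > after_id).collect()
    }
}

/// One sweep. Returns the batch that was submitted (empty when there was nothing new).
/// The watermark only moves when `post` succeeds, which gives at-least-once delivery.
fn sweep(
    store: &mut ToyStore,
    post: impl Fn(&[i64]) -> Result<(), String>,
) -> Result<Vec<i64>, String> {
    // `None` means "never submitted"; 0 is below every real id.
    let after_id = store.last_submitted.unwrap_or(0);
    let pending = store.rows_after(after_id);
    if pending.is_empty() {
        return Ok(pending);
    }
    // Rows are ascending, so the last one carries the highest id.
    let max_id = pending[pending.len() - 1];

    // On failure we return early and leave the watermark untouched.
    post(&pending)?;
    store.last_submitted = Some(max_id);
    Ok(pending)
}
```

A failed sweep leaves `last_submitted` as it was, so the next sweep picks up the same rows plus anything produced in the meantime.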
|
||||
@@ -0,0 +1,146 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
// SPDX-License-Identifier: GPL-3.0-only

use crate::orchestrator::prometheus::{PROMETHEUS_METRICS, PrometheusMetric};
use crate::storage::NetworkMonitorStorage;
use nym_task::ShutdownToken;
use std::time::Duration;
use tokio::time::{Instant, MissedTickBehavior, interval_at};
use tracing::{debug, error, info};
|
||||
|
||||
/// Background task that periodically purges stale data from the storage.
///
/// Two distinct kinds of staleness are handled:
/// - in-progress test runs whose assigned agent has gone silent past
///   `test_timeout` (freed so they can be reassigned),
/// - finalised test runs older than `testrun_eviction_age` (dropped to keep
///   the results table bounded).
///
/// The two deletions are deliberately issued as separate statements rather
/// than wrapped in a transaction: they touch disjoint tables, a partial
/// failure is self-healing on the next tick, and keeping them independent
/// avoids holding a write lock across both for the whole sweep.
pub(crate) struct StaleResultsEviction {
    storage: NetworkMonitorStorage,

    /// Age past which a finalised test run is considered stale and removed.
    /// Mirrors `Config::testrun_eviction_age`.
    testrun_eviction_age: Duration,

    /// Maximum time a test run may remain "in progress" before we assume the
    /// assigned agent has died and free the slot for reassignment.
    /// Mirrors `Config::test_timeout`.
    test_timeout: Duration,

    /// Cadence at which [`Self::run`] performs an eviction sweep.
    check_interval: Duration,

    shutdown_token: ShutdownToken,
}

/// Lower bound on the sweep cadence to avoid hammering the DB (or panicking
/// `interval_at`) when either timeout is configured to an unrealistically
/// small value.
const MIN_CHECK_INTERVAL: Duration = Duration::from_secs(60);
|
||||
|
||||
impl StaleResultsEviction {
    pub(crate) fn new(
        storage: NetworkMonitorStorage,
        testrun_eviction_age: Duration,
        test_timeout: Duration,
        shutdown_token: ShutdownToken,
    ) -> Self {
        // Sweep twice per shortest timeout window so that an item's age at
        // eviction is bounded by roughly 1.5x that timeout rather than 2x
        // (i.e. it lingers for at most half a window after going stale).
        // Floored at `MIN_CHECK_INTERVAL` to stay safe under degenerate
        // configs, in which case the bound is looser.
        let check_interval = Duration::max(
            MIN_CHECK_INTERVAL,
            Duration::min(testrun_eviction_age, test_timeout) / 2,
        );

        Self {
            storage,
            testrun_eviction_age,
            test_timeout,
            check_interval,
            shutdown_token,
        }
    }
|
||||
|
||||
    /// Performs a single eviction sweep: clears timed-out in-progress test
    /// runs and deletes results older than the configured retention window.
    /// Logs how many rows were affected so ops can confirm the task is doing
    /// real work (and spot unexpected spikes).
    pub(crate) async fn evict_stale_results(&self) -> anyhow::Result<()> {
        let cleared_in_progress = self
            .storage
            .clear_timed_out_testruns_in_progress(self.test_timeout)
            .await?;
        let evicted_old = self
            .storage
            .evict_old_testruns(self.testrun_eviction_age)
            .await?;

        if cleared_in_progress > 0 || evicted_old > 0 {
            PROMETHEUS_METRICS.inc_by(
                PrometheusMetric::TimedOutTestrunsEvicted,
                cleared_in_progress as i64,
            );
            PROMETHEUS_METRICS.inc_by(PrometheusMetric::StaleTestrunsEvicted, evicted_old as i64);

            info!(
                cleared_in_progress,
                evicted_old, "stale data eviction sweep completed"
            );
        } else {
            debug!("stale data eviction sweep completed: nothing to evict");
        }

        // Reconcile the in-flight gauge against the authoritative row count. The gauge is
        // primarily maintained live via inc/dec at assign/submit/timeout paths; this sweep is
        // a safety net that corrects any drift (e.g. from a future code path that forgets to
        // update the gauge) and bounds the worst-case staleness to one sweep interval.
        match self.storage.count_testruns_in_progress().await {
            Ok(count) => PROMETHEUS_METRICS.set(PrometheusMetric::TestrunsInProgress, count),
            Err(err) => error!("failed to count in-flight testruns for metric: {err}"),
        }
        Ok(())
    }
|
||||
|
||||
    /// Runs the eviction loop until the shutdown token is cancelled.
    ///
    /// Cancellation is cooperative: it is only observed between sweeps, so a
    /// sweep already in flight is allowed to finish. This keeps partial
    /// deletions from being left on the floor at shutdown.
    ///
    /// The first tick is deliberately offset by `check_interval` because the
    /// orchestrator invokes [`Self::evict_stale_results`] once during start-up
    /// (to reap anything left behind by a prior crash or restart), so an
    /// immediate tick here would redo that work.
    ///
    /// `MissedTickBehavior::Delay` prevents burst catch-up ticks when a sweep
    /// runs long under DB load. Otherwise a slow sweep would queue multiple
    /// back-to-back ticks and amplify the pressure that made it slow in the
    /// first place.
    pub(crate) async fn run(&self) {
        let mut interval = interval_at(Instant::now() + self.check_interval, self.check_interval);
        interval.set_missed_tick_behavior(MissedTickBehavior::Delay);

        loop {
            tokio::select! {
                biased;
                _ = self.shutdown_token.cancelled() => break,
                _ = interval.tick() => {
                    if let Err(err) = self.evict_stale_results().await {
                        // Transient storage errors shouldn't kill the task: the next
                        // tick will retry and any missed items simply age a bit longer.
                        error!("failed to evict stale results: {err}");
                    }
                }
            }
        }

        info!("stale results eviction stopped");
    }
|
||||
}
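The sweep cadence chosen in `StaleResultsEviction::new` is easy to check by hand. The sketch below reproduces only the arithmetic as a free function; `check_interval` as a standalone function is an assumption made for illustration, and the durations used to exercise it are example values rather than the project's defaults.

```rust
use std::time::Duration;

/// Same floor as the `MIN_CHECK_INTERVAL` constant defined for the eviction task.
const MIN_CHECK_INTERVAL: Duration = Duration::from_secs(60);

/// Half of the shorter of the two timeouts, floored at `MIN_CHECK_INTERVAL`.
fn check_interval(testrun_eviction_age: Duration, test_timeout: Duration) -> Duration {
    Duration::max(
        MIN_CHECK_INTERVAL,
        Duration::min(testrun_eviction_age, test_timeout) / 2,
    )
}
```

With a one-week eviction age and a 10-minute test timeout this gives a 5-minute cadence, so a hung run is reaped at most about 15 minutes after it started. With a 30-second test timeout the floor applies and the cadence stays at 60 seconds.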
|
||||
@@ -0,0 +1,2 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
// SPDX-License-Identifier: GPL-3.0-only
|
||||
File diff suppressed because it is too large
@@ -0,0 +1,279 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
// SPDX-License-Identifier: GPL-3.0-only

use crate::orchestrator::prometheus::{PROMETHEUS_METRICS, PrometheusMetric};
use crate::storage::manager::StorageManager;
use crate::storage::models::{NewNymNode, NewTestRun, NymNode, TestRun, TestRunInProgress};
use anyhow::Context;
use nym_network_monitor_orchestrator_requests::models::Pagination;
use nym_validator_client::client::NodeId;
use sqlx::ConnectOptions;
use sqlx::sqlite::{SqliteAutoVacuum, SqliteSynchronous};
use std::path::Path;
use std::time::Duration;
use time::OffsetDateTime;
use tracing::log::{LevelFilter, debug};

mod manager;
pub(crate) mod models;
|
||||
|
||||
/// High-level handle to the orchestrator's local SQLite database.
///
/// Wraps a [`StorageManager`] and translates between the orchestrator-level
/// types (e.g. [`NodeId`], [`Pagination`], [`Duration`]) used by callers and
/// the raw SQL-friendly primitives (`i64` ids, `limit`/`offset`, absolute
/// timestamps) understood by the manager. The handle is cheap to [`Clone`]
/// because [`sqlx::SqlitePool`] is internally reference-counted, so every
/// clone shares the same connection pool.
#[derive(Clone)]
pub(crate) struct NetworkMonitorStorage {
    pub(crate) storage_manager: StorageManager,
}
|
||||
|
||||
impl NetworkMonitorStorage {
    /// Opens (or creates) the SQLite database at `database_path`, configures
    /// WAL journaling, `NORMAL` synchronous mode and incremental auto-vacuum,
    /// and runs the embedded migrations. All statements are logged at `TRACE`;
    /// slow statements (>50ms) are logged at `WARN`.
    pub(crate) async fn init<P: AsRef<Path>>(database_path: P) -> anyhow::Result<Self> {
        debug!(
            "attempting to connect to database {}",
            database_path.as_ref().display()
        );

        let connect_opts = sqlx::sqlite::SqliteConnectOptions::new()
            .journal_mode(sqlx::sqlite::SqliteJournalMode::Wal)
            .synchronous(SqliteSynchronous::Normal)
            .auto_vacuum(SqliteAutoVacuum::Incremental)
            .filename(database_path)
            .create_if_missing(true)
            .log_statements(LevelFilter::Trace)
            .log_slow_statements(LevelFilter::Warn, Duration::from_millis(50));

        let connection_pool = sqlx::SqlitePool::connect_with(connect_opts)
            .await
            .context("Failed to connect to SQLx database")?;

        sqlx::migrate!("./migrations")
            .run(&connection_pool)
            .await
            .context("Failed to run database migrations")?;

        Ok(Self {
            storage_manager: StorageManager { connection_pool },
        })
    }
|
||||
|
||||
    /// Inserts or updates multiple node records in a single transaction.
    ///
    /// For each node, if a row with the same `node_id` already exists, all fields except
    /// `identity_key` are updated. The entire batch shares one transaction for efficiency.
    pub(crate) async fn batch_insert_or_update_nym_nodes(
        &self,
        nodes: &[NewNymNode],
    ) -> anyhow::Result<()> {
        self.storage_manager
            .batch_insert_or_update_nym_nodes(nodes)
            .await
    }
|
||||
|
||||
    /// Inserts a completed test run, updates the node's `last_testrun` pointer and
    /// clears the corresponding `testrun_in_progress` marker. The target node is
    /// taken from [`NewTestRun::node_id`].
    ///
    /// Decrements the `TestrunsInProgress` gauge iff a row was actually cleared: a late
    /// submission whose in-progress row was already reaped by the timeout sweep must not
    /// double-decrement the gauge.
    pub(crate) async fn insert_test_run(&self, run: &NewTestRun) -> anyhow::Result<()> {
        let node_id = run.node_id;
        let run_id = self.storage_manager.insert_test_run(run).await?;
        self.storage_manager
            .set_node_last_testrun(node_id, run_id)
            .await?;
        let cleared = self
            .storage_manager
            .clear_testrun_in_progress(node_id)
            .await?;
        if cleared > 0 {
            PROMETHEUS_METRICS.inc_by(PrometheusMetric::TestrunsInProgress, -(cleared as i64));
        }
        Ok(())
    }
|
||||
|
||||
    /// Returns the number of rows currently in `testrun_in_progress`.
    pub(crate) async fn count_testruns_in_progress(&self) -> anyhow::Result<i64> {
        self.storage_manager.count_testruns_in_progress().await
    }

    /// Removes all in-progress markers whose `started_at` lies more than `timeout` in the
    /// past, on the assumption that those runs have timed out and will never complete.
    /// Decrements the `TestrunsInProgress` gauge by the number of rows actually cleared.
    pub(crate) async fn clear_timed_out_testruns_in_progress(
        &self,
        timeout: Duration,
    ) -> anyhow::Result<u64> {
        let cutoff = OffsetDateTime::now_utc() - timeout;
        let cleared = self
            .storage_manager
            .clear_timed_out_testruns_in_progress(cutoff)
            .await?;
        if cleared > 0 {
            PROMETHEUS_METRICS.inc_by(PrometheusMetric::TestrunsInProgress, -(cleared as i64));
        }
        Ok(cleared)
    }
|
||||
|
||||
    /// Atomically selects the most stale idle mixnode and marks it as having a test run in
    /// progress.
    ///
    /// "Most stale" is defined as: nodes that have never been tested come first, followed by
    /// nodes whose last test run has the oldest timestamp.
    ///
    /// `staleness_age` acts as a minimum-staleness gate: a node that has already been tested
    /// is only eligible if its last test run completed more than `staleness_age` ago. Nodes
    /// that have never been tested are always eligible regardless of this value.
    ///
    /// The current time is used as the `started_at` timestamp on the resulting
    /// `testrun_in_progress` row.
    ///
    /// Nodes with a row in `testrun_in_progress` are excluded entirely. Only nodes classified
    /// as `mixnode` or `mixnode_and_gateway` are eligible.
    ///
    /// Returns `None` if no eligible idle mixnode exists.
    pub(crate) async fn assign_next_mixnode_testrun(
        &self,
        staleness_age: Duration,
    ) -> anyhow::Result<Option<NymNode>> {
        let now = OffsetDateTime::now_utc();
        let last_tested_before = now - staleness_age;
        let assigned = self
            .storage_manager
            .assign_next_mixnode_testrun(now, last_tested_before)
            .await?;
        if assigned.is_some() {
            PROMETHEUS_METRICS.inc(PrometheusMetric::TestrunsInProgress);
        }
        Ok(assigned)
    }
|
||||
|
||||
    /// Fetches a single completed test run by its row id, or `None` if it has
    /// been evicted or never existed.
    pub(crate) async fn get_testrun_by_id(&self, id: i64) -> anyhow::Result<Option<TestRun>> {
        self.storage_manager.get_testrun_by_id(id).await
    }

    /// Fetches a node by its contract-assigned `node_id`, or `None` if the
    /// orchestrator has never observed a bond for it.
    pub(crate) async fn get_nym_node_by_id(
        &self,
        node_id: NodeId,
    ) -> anyhow::Result<Option<NymNode>> {
        self.storage_manager
            .get_nym_node_by_id(node_id as i64)
            .await
    }
|
||||
|
||||
    /// Paginated list of outstanding `testrun_in_progress` rows, oldest `started_at`
    /// first so stale/hung runs surface at the top, with the snapshot-consistent
    /// total row count.
    pub(crate) async fn get_testruns_in_progress_paginated(
        &self,
        pagination: Pagination,
    ) -> anyhow::Result<(Vec<TestRunInProgress>, usize)> {
        let (rows, total) = self
            .storage_manager
            .get_testruns_in_progress_paginated(pagination.limit(), pagination.offset())
            .await?;

        Ok((rows, total as usize))
    }

    /// Paginated list of nodes ordered by `node_id` ascending, with the
    /// snapshot-consistent total row count. [`Pagination`] is resolved to
    /// `limit`/`offset` here so the manager never sees the public contract.
    pub(crate) async fn get_nym_nodes_paginated(
        &self,
        pagination: Pagination,
    ) -> anyhow::Result<(Vec<NymNode>, usize)> {
        let (nodes, total) = self
            .storage_manager
            .get_nym_nodes_paginated(pagination.limit(), pagination.offset())
            .await?;

        Ok((nodes, total as usize))
    }
|
||||
|
||||
    /// Paginated list of completed test runs ordered by `test_timestamp`
    /// descending (newest first), with the snapshot-consistent total row count.
    pub(crate) async fn get_testruns_paginated(
        &self,
        pagination: Pagination,
    ) -> anyhow::Result<(Vec<TestRun>, usize)> {
        let (test_results, total) = self
            .storage_manager
            .get_testruns_paginated(pagination.limit(), pagination.offset())
            .await?;

        Ok((test_results, total as usize))
    }

    /// Paginated list of completed test runs for a single node, ordered newest
    /// first, with the snapshot-consistent total row count. Backed by the
    /// `idx_testrun_node_id_timestamp` index. An unknown or never-tested
    /// `node_id` produces `(vec![], 0)` rather than an error.
    pub(crate) async fn get_testruns_for_node_paginated(
        &self,
        node_id: NodeId,
        pagination: Pagination,
    ) -> anyhow::Result<(Vec<TestRun>, usize)> {
        let (test_results, total) = self
            .storage_manager
            .get_testruns_for_node_paginated(
                node_id as i64,
                pagination.limit(),
                pagination.offset(),
            )
            .await?;

        Ok((test_results, total as usize))
    }
|
||||
|
||||
    /// Returns the id of the newest `testrun` already submitted to the nym-api, or `None` if no
    /// batch has been submitted yet. Callers treat `None` as "submit everything currently in
    /// storage".
    pub(crate) async fn get_last_submitted_testrun_id(&self) -> anyhow::Result<Option<i64>> {
        self.storage_manager.get_last_submitted_testrun_id().await
    }

    /// Persists the id of the newest `testrun` whose batch submission to the nym-api has
    /// succeeded. Subsequent [`Self::get_testruns_after`] calls use this value to avoid
    /// resubmitting already-acknowledged rows.
    pub(crate) async fn set_last_submitted_testrun_id(
        &self,
        testrun_id: i64,
    ) -> anyhow::Result<()> {
        self.storage_manager
            .set_last_submitted_testrun_id(testrun_id)
            .await
    }

    /// Fetches every `testrun` row with `id > after_id`, ordered by id ascending.
    ///
    /// Used by the nym-api submission task to build the next batch of pending results. Ascending
    /// ordering lets the caller record the highest-id row as the new submission watermark once
    /// the batch is acknowledged.
    pub(crate) async fn get_testruns_after(&self, after_id: i64) -> anyhow::Result<Vec<TestRun>> {
        self.storage_manager.get_testruns_after(after_id).await
    }
|
||||
|
||||
    /// Deletes all `testrun` rows older than `eviction_age` relative to the current time.
    ///
    /// Intended to be called periodically to keep the local database from growing without
    /// bound. Evicted rows are assumed to have already been submitted to the nym-api for
    /// persistent storage.
    ///
    /// Any `nym_node.last_testrun` foreign key that pointed at an evicted row is automatically
    /// set to `NULL` by the database (`ON DELETE SET NULL`).
    pub(crate) async fn evict_old_testruns(&self, eviction_age: Duration) -> anyhow::Result<u64> {
        let cutoff = OffsetDateTime::now_utc() - eviction_age;
        self.storage_manager.evict_old_testruns(cutoff).await
    }
|
||||
}
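The selection rule documented on `assign_next_mixnode_testrun` (idle nodes only, never-tested nodes first, then the oldest last test, gated by `staleness_age`) can be modelled in memory. This is a rough sketch under loud assumptions: `Node` and `pick_next` are hypothetical, timestamps are plain seconds, the node-type filter is omitted, and nothing here is atomic, whereas the real query runs inside the database.

```rust
/// Hypothetical, simplified view of a node row: id, whether a run is in progress, and the
/// timestamp (in seconds) of its last completed test run, if any.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    id: u32,
    in_progress: bool,
    last_tested_at: Option<u64>,
}

/// Picks the most stale idle node, or `None` if no node is eligible.
fn pick_next(nodes: &[Node], now: u64, staleness_age: u64) -> Option<u32> {
    let last_tested_before = now.saturating_sub(staleness_age);
    nodes
        .iter()
        // Nodes with a run in progress are excluded entirely.
        .filter(|n| !n.in_progress)
        // Never-tested nodes always pass the gate; tested nodes must be stale enough.
        .filter(|n| match n.last_tested_at {
            None => true,
            Some(t) => t < last_tested_before,
        })
        // `None` orders before `Some(_)`, so never-tested nodes come first,
        // followed by the oldest timestamps.
        .min_by_key(|n| n.last_tested_at)
        .map(|n| n.id)
}
```

Relying on the ordering of `Option<u64>` keeps the "never tested first" rule out of the comparison logic; in SQL the same effect would typically need explicit handling of `NULL` ordering.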
|
||||
@@ -0,0 +1,406 @@
|
||||
// Copyright 2026 - Nym Technologies SA <contact@nymtech.net>
// SPDX-License-Identifier: GPL-3.0-only

use anyhow::Context;
use nym_api_requests::models::network_monitor::StressTestResult;
use nym_crypto::asymmetric::{ed25519, x25519};
use nym_network_monitor_orchestrator_requests::models::{
    self as api, LatencyDistribution, NymNodeData, TestRunData, TestRunInProgressData,
    TestRunResult,
};
use nym_node_requests::api::v1::node::models::NodeRoles;
use nym_validator_client::client::NodeId;
use nym_validator_client::nyxd::nym_mixnet_contract_common::NymNodeBond;
use std::time::Duration;
use time::OffsetDateTime;
|
||||
|
||||
/// Discriminator for the type of node targeted by a [`TestRun`].
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq, sqlx::Type)]
#[sqlx(type_name = "TEXT", rename_all = "lowercase")]
pub(crate) enum TestType {
    #[default]
    Mixnode,
    Gateway,
}
|
||||
|
||||
/// Classification of a node based on the roles reported via its self-described endpoint.
/// [`NodeType::Unknown`] is used both as the initial value before the node is successfully
/// queried and when a queried node reports no roles at all.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq, Hash, sqlx::Type)]
#[sqlx(type_name = "TEXT", rename_all = "snake_case")]
pub(crate) enum NodeType {
    #[default]
    Unknown,
    Mixnode,
    Gateway,
    MixnodeAndGateway,
}
|
||||
|
||||
impl NodeType {
    /// Classifies a node from the `NodeRoles` reported by its self-described endpoint.
    /// We key off `gateway_enabled` (entry-gateway capability) only; the `exit` property is
    /// not a useful distinction for test-target selection. A node reporting neither role maps
    /// to [`NodeType::Unknown`] and will be ignored by the mixnode testrun assignment query.
    pub(crate) fn from_roles(roles: &NodeRoles) -> Self {
        match (roles.mixnode_enabled, roles.gateway_enabled) {
            (true, true) => NodeType::MixnodeAndGateway,
            (true, false) => NodeType::Mixnode,
            (false, true) => NodeType::Gateway,
            (false, false) => NodeType::Unknown,
        }
    }
|
||||
}
|
||||
|
||||
/// The data required to insert a new row into `testrun`. Does not carry an `id` since that
/// is assigned by the database on insertion.
#[derive(Debug, Clone, sqlx::FromRow)]
pub(crate) struct NewTestRun {
    /// Contract-assigned node id of the node under test.
    pub(crate) node_id: i64,

    pub(crate) test_type: TestType,
    pub(crate) test_timestamp: OffsetDateTime,

    /// How long the test took, in microseconds.
    pub(crate) time_taken_us: i64,

    /// Noise handshake duration on the ingress (responder) side, in microseconds.
    pub(crate) ingress_noise_handshake_us: Option<i64>,

    /// Noise handshake duration on the egress (initiator) side, in microseconds.
    pub(crate) egress_noise_handshake_us: Option<i64>,

    /// Constant per-hop sphinx packet delay used during the test run, in microseconds.
    pub(crate) sphinx_packet_delay_us: i64,

    pub(crate) packets_sent: i64,
    pub(crate) packets_received: i64,

    /// RTT of the initial probe packet in microseconds. `None` if the probe did not complete.
    pub(crate) approximate_latency_us: Option<i64>,

    // RTT distribution over received packets (all NULL when no packets were received).
    pub(crate) packets_rtt_min_us: Option<i64>,
    pub(crate) packets_rtt_mean_us: Option<i64>,
    pub(crate) packets_rtt_median_us: Option<i64>,
    pub(crate) packets_rtt_max_us: Option<i64>,
    pub(crate) packets_rtt_std_dev_us: Option<i64>,

    // Batch send latency distribution (all NULL when no batches were sent).
    pub(crate) sending_latency_min_us: Option<i64>,
    pub(crate) sending_latency_mean_us: Option<i64>,
    pub(crate) sending_latency_median_us: Option<i64>,
    pub(crate) sending_latency_max_us: Option<i64>,
    pub(crate) sending_latency_std_dev_us: Option<i64>,

    pub(crate) received_duplicates: bool,

    /// First error that caused the test to abort. `None` if the run completed without error.
    pub(crate) error: Option<String>,
}

/// Converts a [`Duration`] into whole microseconds for storage in an `i64` column.
fn duration_to_us(d: Duration) -> i64 {
    d.as_micros() as i64
}

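// A minimal sketch, not part of this change: `Duration::as_micros` returns a `u128`, so the
// `as i64` cast in `duration_to_us` silently truncates a value that does not fit. Assuming
// that saturating is acceptable for a stored column, a hypothetical
// `duration_to_us_saturating` helper (the name is illustrative only) could look like this:

```rust
use std::time::Duration;

/// Hypothetical variant of `duration_to_us` that saturates at `i64::MAX`
/// instead of truncating when the microsecond count does not fit in an `i64`.
fn duration_to_us_saturating(d: Duration) -> i64 {
    // `i64::try_from(u128)` fails only when the value exceeds `i64::MAX`.
    i64::try_from(d.as_micros()).unwrap_or(i64::MAX)
}
```

// For realistic test durations both helpers return the same value; they differ only for
// durations longer than `i64::MAX` microseconds.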
impl NewTestRun {
    /// Converts an API-level [`TestRunResult`] into a database-ready row, flattening
    /// [`LatencyDistribution`] fields into individual microsecond columns and recording
    /// the current UTC time as the test timestamp.
    fn from_result(test_type: TestType, node_id: NodeId, result: TestRunResult) -> Self {
        NewTestRun {
            node_id: node_id as i64,
            test_type,
            test_timestamp: OffsetDateTime::now_utc(),
            time_taken_us: duration_to_us(result.time_taken),
            ingress_noise_handshake_us: result.ingress_noise_handshake.map(duration_to_us),
            egress_noise_handshake_us: result.egress_noise_handshake.map(duration_to_us),
            sphinx_packet_delay_us: duration_to_us(result.sphinx_packet_delay),
            packets_sent: result.packets_sent as i64,
            packets_received: result.packets_received as i64,
            approximate_latency_us: result.approximate_latency.map(duration_to_us),
            packets_rtt_min_us: result.packets_statistics.map(|s| duration_to_us(s.minimum)),
            packets_rtt_mean_us: result.packets_statistics.map(|s| duration_to_us(s.mean)),
            packets_rtt_median_us: result.packets_statistics.map(|s| duration_to_us(s.median)),
            packets_rtt_max_us: result.packets_statistics.map(|s| duration_to_us(s.maximum)),
            packets_rtt_std_dev_us: result
                .packets_statistics
                .map(|s| duration_to_us(s.standard_deviation)),
            sending_latency_min_us: result.sending_statistics.map(|s| duration_to_us(s.minimum)),
            sending_latency_mean_us: result.sending_statistics.map(|s| duration_to_us(s.mean)),
            sending_latency_median_us: result.sending_statistics.map(|s| duration_to_us(s.median)),
            sending_latency_max_us: result.sending_statistics.map(|s| duration_to_us(s.maximum)),
            sending_latency_std_dev_us: result
                .sending_statistics
                .map(|s| duration_to_us(s.standard_deviation)),
            received_duplicates: result.received_duplicates,
            error: result.error,
        }
    }

    /// Creates a new test run row for a mixnode stress test result.
    pub(crate) fn from_mixnode_result(node_id: NodeId, result: TestRunResult) -> Self {
        Self::from_result(TestType::Mixnode, node_id, result)
    }

    /// Creates a new test run row for a gateway stress test result.
    #[allow(dead_code)]
    pub(crate) fn from_gateway_result(node_id: NodeId, result: TestRunResult) -> Self {
        Self::from_result(TestType::Gateway, node_id, result)
    }
}

/// A row from the `testrun` table, as returned by a SELECT.
#[derive(Debug, Clone, sqlx::FromRow)]
pub(crate) struct TestRun {
    pub(crate) id: i64,

    #[sqlx(flatten)]
    pub(crate) inner: NewTestRun,
}

fn us_to_duration(us: i64) -> Duration {
    Duration::from_micros(us as u64)
}

/// Reassembles a [`LatencyDistribution`] from its five flattened microsecond columns.
/// Returns `None` if any column is `NULL`; the five columns are always all-set or all-NULL
/// together (see [`NewTestRun::from_result`]).
fn latency_distribution(
    min_us: Option<i64>,
    mean_us: Option<i64>,
    median_us: Option<i64>,
    max_us: Option<i64>,
    std_dev_us: Option<i64>,
) -> Option<LatencyDistribution> {
    match (min_us, mean_us, median_us, max_us, std_dev_us) {
        (Some(min), Some(mean), Some(median), Some(max), Some(std_dev)) => {
            Some(LatencyDistribution {
                minimum: us_to_duration(min),
                mean: us_to_duration(mean),
                median: us_to_duration(median),
                maximum: us_to_duration(max),
                standard_deviation: us_to_duration(std_dev),
            })
        }
        _ => None,
    }
}

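// The all-or-nothing behaviour described above can be sketched as a round trip. This is an
// illustration only: the `LatencyDistribution` below is a local stand-in for the real type
// (field names copied from the code above), and `flatten` / `reassemble` are hypothetical
// helper names that do not exist in this crate.

```rust
use std::time::Duration;

/// Stand-in for the real `LatencyDistribution`; field names mirror the ones used above.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LatencyDistribution {
    minimum: Duration,
    mean: Duration,
    median: Duration,
    maximum: Duration,
    standard_deviation: Duration,
}

/// Flattens one optional distribution into five optional microsecond columns.
/// Either every column is `Some` or every column is `None`.
fn flatten(d: Option<LatencyDistribution>) -> [Option<i64>; 5] {
    match d {
        Some(s) => [s.minimum, s.mean, s.median, s.maximum, s.standard_deviation]
            .map(|v| Some(v.as_micros() as i64)),
        None => [None; 5],
    }
}

/// Rebuilds the distribution; a single missing column yields `None` for the whole value.
fn reassemble(cols: [Option<i64>; 5]) -> Option<LatencyDistribution> {
    let to_duration = |us: i64| Duration::from_micros(us as u64);
    match cols {
        [Some(min), Some(mean), Some(median), Some(max), Some(std_dev)] => {
            Some(LatencyDistribution {
                minimum: to_duration(min),
                mean: to_duration(mean),
                median: to_duration(median),
                maximum: to_duration(max),
                standard_deviation: to_duration(std_dev),
            })
        }
        _ => None,
    }
}
```

// Sub-microsecond precision is lost on the way in, so the round trip is exact only for
// durations that are whole microseconds.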
/// Maps the internal enum onto its public API counterpart. Kept as a separate
/// type so `sqlx::Type` can be derived on the internal side without leaking
/// sqlx into the public request crate.
impl From<TestType> for api::TestType {
    fn from(t: TestType) -> Self {
        match t {
            TestType::Mixnode => api::TestType::Mixnode,
            TestType::Gateway => api::TestType::Gateway,
        }
    }
}

/// Lifts a `testrun` row into the public [`TestRunData`] shape: casts the `i64`
/// ids/counters to the API's `u32`/`usize`, reconstitutes each
/// `LatencyDistribution` from its five microsecond columns, and converts
/// microsecond integers back into `std::time::Duration`.
impl From<TestRun> for TestRunData {
    fn from(run: TestRun) -> Self {
        let inner = run.inner;
        TestRunData {
            id: run.id,
            node_id: inner.node_id as u32,
            test_type: inner.test_type.into(),
            test_timestamp: inner.test_timestamp,
            result: TestRunResult {
                time_taken: us_to_duration(inner.time_taken_us),
                ingress_noise_handshake: inner.ingress_noise_handshake_us.map(us_to_duration),
                egress_noise_handshake: inner.egress_noise_handshake_us.map(us_to_duration),
                sphinx_packet_delay: us_to_duration(inner.sphinx_packet_delay_us),
                packets_sent: inner.packets_sent as usize,
                packets_received: inner.packets_received as usize,
                approximate_latency: inner.approximate_latency_us.map(us_to_duration),
                packets_statistics: latency_distribution(
                    inner.packets_rtt_min_us,
                    inner.packets_rtt_mean_us,
                    inner.packets_rtt_median_us,
                    inner.packets_rtt_max_us,
                    inner.packets_rtt_std_dev_us,
                ),
                sending_statistics: latency_distribution(
                    inner.sending_latency_min_us,
                    inner.sending_latency_mean_us,
                    inner.sending_latency_median_us,
                    inner.sending_latency_max_us,
                    inner.sending_latency_std_dev_us,
                ),
                received_duplicates: inner.received_duplicates,
                error: inner.error,
            },
        }
    }
}

/// Projects a completed `testrun` row onto the nym-api's `StressTestResult` shape used by the
/// stress-test batch submission endpoint.
///
/// Two fields are synthesised here rather than stored directly:
///
/// - `test_performance` is `packets_received / packets_sent`, which is expected to fall within
///   `[0.0, 1.0]`; the ratio is not clamped here. A run that sent no packets, or that received
///   any duplicate packet, collapses to `0.0`; `was_reachable` is the signal that lets the
///   server tell the no-packets case apart from a genuine zero score.
/// - `was_reachable` is `error.is_none()`, i.e. the test completed without an abort error. A run
///   that aborted before the node responded sets `error` to the first failure, so the inverse is
///   an accurate "did we reach the node at all" signal.
impl From<TestRun> for StressTestResult {
    fn from(run: TestRun) -> Self {
        let id = run.id;
        let inner = run.inner;

        // if we have received any duplicate packets, we have to discard the entire result,
        // as an honest node would never replay a packet
        let test_performance = if inner.packets_sent > 0 && !inner.received_duplicates {
            inner.packets_received as f64 / inner.packets_sent as f64
        } else {
            0.0
        };
        StressTestResult {
            testrun_id: id,
            node_id: inner.node_id as u32,
            is_mixnode: matches!(inner.test_type, TestType::Mixnode),
            test_timestamp: inner.test_timestamp,
            test_performance,
            was_reachable: inner.error.is_none(),
        }
    }
}

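// The scoring rule in the conversion above can be exercised in isolation. The following is a
// sketch only: `test_performance` as a free function is a hypothetical extraction of the
// `if`/`else` expression and is not defined anywhere in this crate.

```rust
/// Hypothetical free-function version of the `test_performance` rule:
/// the received/sent ratio, or `0.0` when nothing was sent or a duplicate was seen.
fn test_performance(packets_sent: i64, packets_received: i64, received_duplicates: bool) -> f64 {
    if packets_sent > 0 && !received_duplicates {
        packets_received as f64 / packets_sent as f64
    } else {
        // Covers both "no packets sent" (avoids a division by zero) and
        // "duplicates received" (the whole result is discarded).
        0.0
    }
}
```

// Note that a run with a perfect delivery ratio still scores `0.0` once a single duplicate
// has been observed.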
/// The data required to insert or update a row in `nym_node`. Does not carry `last_testrun`
/// since that is managed separately via [`StorageManager::set_node_last_testrun`].
#[derive(Debug, Clone, sqlx::FromRow)]
pub(crate) struct NewNymNode {
    /// Node ID as assigned by the mixnet contract.
    pub(crate) node_id: i64,

    /// Ed25519 identity key, base58-encoded.
    /// A node_id always maps to exactly one identity_key and is never reassigned.
    pub(crate) identity_key: String,

    /// When this node was last observed as bonded in the contract.
    pub(crate) last_seen_bonded: OffsetDateTime,

    /// Mixnet socket address (host:port) at which the node accepts sphinx packets.
    /// Stored as a string; parse with `str::parse::<SocketAddr>()` when needed.
    pub(crate) mixnet_socket_address: Option<String>,

    /// X25519 public key used for Noise handshakes, base58-encoded.
    /// `None` if retrieval from the node failed.
    pub(crate) noise_key: Option<String>,

    /// Sphinx public key used for packet encryption, base58-encoded.
    /// `None` if retrieval from the node failed.
    /// Always `None`/`Some` together with `key_rotation_id`.
    pub(crate) sphinx_key: Option<String>,

    /// Key rotation epoch ID that `sphinx_key` belongs to.
    /// `None` if retrieval from the node failed.
    /// Always `None`/`Some` together with `sphinx_key`.
    pub(crate) key_rotation_id: Option<i64>,

    /// Classification of the node based on the roles reported via its self-described endpoint.
    /// [`NodeType::Unknown`] if the self-described retrieval failed.
    pub(crate) node_type: NodeType,
}

impl NewNymNode {
    pub(crate) fn from_bond(bond: &NymNodeBond) -> Self {
        NewNymNode {
            node_id: bond.node_id as i64,
            identity_key: bond.identity().to_string(),
            last_seen_bonded: OffsetDateTime::now_utc(),
            mixnet_socket_address: None,
            noise_key: None,
            sphinx_key: None,
            key_rotation_id: None,
            node_type: NodeType::Unknown,
        }
    }
}

/// A row from the `testrun_in_progress` table.
#[derive(Debug, Clone, sqlx::FromRow)]
pub(crate) struct TestRunInProgress {
    pub(crate) node_id: i64,
    pub(crate) started_at: OffsetDateTime,
}

/// Lifts a `testrun_in_progress` row into the public shape, narrowing `node_id`
/// from the sqlx-native `i64` to the API's `u32`.
impl From<TestRunInProgress> for TestRunInProgressData {
    fn from(row: TestRunInProgress) -> Self {
        TestRunInProgressData {
            node_id: row.node_id as u32,
            started_at: row.started_at,
        }
    }
}

/// A row from the `nym_node` table, as returned by a SELECT.
#[derive(Debug, Clone, sqlx::FromRow)]
pub(crate) struct NymNode {
    #[sqlx(flatten)]
    pub(crate) inner: NewNymNode,

    /// ID of the most recent test run against this node. `None` if never tested.
    pub(crate) last_testrun: Option<i64>,
}

/// Decodes a node's stored base58 key strings and parses the socket address
/// into typed counterparts for the public API. Fails (with context) when any
/// stored value is malformed. This should not happen in practice because the
/// orchestrator writes these fields itself, so a failure here indicates
/// corruption or a schema regression and is surfaced as
/// [`crate::http::api::v1::error::ApiError::MalformedStoredData`] by callers.
impl TryFrom<NewNymNode> for NymNodeData {
    type Error = anyhow::Error;

    fn try_from(node: NewNymNode) -> anyhow::Result<Self> {
        let identity_key = ed25519::PublicKey::from_base58_string(&node.identity_key)
            .context("invalid identity_key")?;

        let mixnet_socket_address = node
            .mixnet_socket_address
            .map(|s| s.parse().context("invalid mixnet_socket_address"))
            .transpose()?;

        let noise_key = node
            .noise_key
            .map(|s| x25519::PublicKey::from_base58_string(&s).context("invalid noise_key"))
            .transpose()?;

        let sphinx_key = node
            .sphinx_key
            .map(|s| x25519::PublicKey::from_base58_string(&s).context("invalid sphinx_key"))
            .transpose()?;

        Ok(NymNodeData {
            node_id: node.node_id as u32,
            identity_key,
            last_seen_bonded: node.last_seen_bonded,
            mixnet_socket_address,
            noise_key,
            sphinx_key,
            key_rotation_id: node.key_rotation_id,
        })
    }
}

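// The `Option::map(..).transpose()?` pattern used for the optional fields above can be shown
// with the standard library alone. A minimal sketch, assuming a plain `AddrParseError` in
// place of the `anyhow` context used in the real conversion; `parse_stored_address` is a
// hypothetical helper name.

```rust
use std::net::{AddrParseError, SocketAddr};

/// Parses an optional stored address: `None` stays `None`, a valid string becomes
/// `Some(addr)`, and a malformed string surfaces as an error.
fn parse_stored_address(stored: Option<String>) -> Result<Option<SocketAddr>, AddrParseError> {
    // `map` yields `Option<Result<_, _>>`; `transpose` flips it into
    // `Result<Option<_>, _>` so that `?` can be applied by the caller.
    stored.map(|s| s.parse::<SocketAddr>()).transpose()
}
```

// An absent column is therefore not an error; only a present but malformed value is.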
/// Convenience pass-through that drops `last_testrun` (callers that need the
/// latest run fetch it explicitly via [`TestRun`]) and delegates to the
/// [`NewNymNode`] conversion for the rest of the fields.
impl TryFrom<NymNode> for NymNodeData {
    type Error = anyhow::Error;

    fn try_from(node: NymNode) -> anyhow::Result<Self> {
        node.inner.try_into()
    }
}