forked from GRIN/grim

nostr: Connected flag tracks the fast relay-live probe, not the 30s catch-up fetch

Symptom: after 'Save & reconnect' of the relay list, the home/onboarding UI sat on 'Connecting relays…' for ~30s even though the relays had physically reconnected over the exit in ~2-4s.

Cause: on restart the service resets connected to false, and the UI flag was restored only AFTER publish_identity (serial, untimed per-event sends) AND a catch-up fetch_events_from bounded by FETCH_TIMEOUT=30s. One relay slow to EOSE pinned is_connected() at false for the whole window while the connection was already usable. A separate FAST probe task already detects the first relay reaching Connected with a 250ms poll (~2-4s) and reports relay-live to nymproc, but it did not touch the UI flag.

Fix: in that fast probe, when relays first report Connected (the same point that calls report_relay_live), also set svc.connected=true. The indicator now tracks the real ~2-4s relay-up signal; publish_identity and the catch-up fetch continue in the background. Tradeoff (documented in code): if a relay drops between the probe's store(true) and the 2s status loop taking over, the flag can stay stale for up to ~30s, until the post-catch-up re-check re-syncs it to reality. That is the same order of staleness as the old pessimistic gap, only optimistic; the transport watchdog still tracks real exit health independently.
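
The ordering described in the fix can be sketched with plain threads. This is a minimal std-only illustration, not the service's code: `Service`, `relay_up`, `spawn_probe`, and `simulate` are hypothetical names, and the real probe runs as a tokio task against the relay pool rather than polling a boolean. It shows the flag flipping as soon as the probe sees the relay up, while the slow catch-up work keeps running.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the shared service state.
struct Service {
    /// UI flag read by `is_connected()`.
    connected: AtomicBool,
    /// Stand-in for "a relay has reached Connected".
    relay_up: AtomicBool,
}

impl Service {
    fn is_connected(&self) -> bool {
        self.connected.load(Ordering::Relaxed)
    }
}

/// Fast probe: polls the relay status and flips the UI flag the moment
/// the relay is up. Returns how long the probe waited.
fn spawn_probe(svc: Arc<Service>, poll: Duration) -> thread::JoinHandle<Duration> {
    let started = Instant::now();
    thread::spawn(move || {
        while !svc.relay_up.load(Ordering::Relaxed) {
            thread::sleep(poll);
        }
        svc.connected.store(true, Ordering::Relaxed);
        started.elapsed()
    })
}

/// Simulates one restart: the relay comes up after `relay_delay`, then the
/// catch-up work takes a further `catchup`. Returns the flag as seen right
/// after the probe finished, the probe's wait, and the total elapsed time.
fn simulate(relay_delay: Duration, catchup: Duration, poll: Duration) -> (bool, Duration, Duration) {
    let svc = Arc::new(Service {
        connected: AtomicBool::new(false),
        relay_up: AtomicBool::new(false),
    });
    let started = Instant::now();
    let probe = spawn_probe(Arc::clone(&svc), poll);
    let slow_path = {
        let svc = Arc::clone(&svc);
        thread::spawn(move || {
            thread::sleep(relay_delay);
            svc.relay_up.store(true, Ordering::Relaxed);
            // Stand-in for publish_identity + the catch-up fetch.
            thread::sleep(catchup);
        })
    };
    let probe_elapsed = probe.join().unwrap();
    let flag_after_probe = svc.is_connected();
    slow_path.join().unwrap();
    (flag_after_probe, probe_elapsed, started.elapsed())
}
```

Calling `simulate` with a short relay delay and a long catch-up returns `true` for the flag even though the total run is dominated by the catch-up, which is the behavior the indicator now has.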

Hardening: publish_identity's per-event send_event_to was untimed, so a stalled relay delayed the catch-up fetch and the kind:1059 subscription that follow it, which added real incoming-message latency. Each publish is now wrapped in tokio::time::timeout(SEND_TIMEOUT), mirroring dispatch_dm; on timeout it warns and continues to the next event, never aborting the sequence.
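
The "time-box each send, warn, and continue" shape can be sketched without the async runtime. The commit itself uses tokio::time::timeout; the version below swaps in a worker thread plus `mpsc::Receiver::recv_timeout` so it runs on std alone. `Outcome`, `send_with_timeout`, and `publish_all` are hypothetical names for illustration. One difference worth noting: a tokio timeout drops the stalled future, whereas here the stalled worker thread is simply abandoned.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// The three outcomes handled per event, matching the three match arms
/// in the patched loop.
#[derive(Debug, PartialEq)]
enum Outcome {
    Sent,
    Failed(String),
    TimedOut,
}

/// Runs `send` on a worker thread and waits at most `limit` for its result.
fn send_with_timeout<F>(limit: Duration, send: F) -> Outcome
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that.
        let _ = tx.send(send());
    });
    match rx.recv_timeout(limit) {
        Ok(Ok(())) => Outcome::Sent,
        Ok(Err(e)) => Outcome::Failed(e),
        // Either the limit elapsed or the worker died without replying.
        Err(_) => Outcome::TimedOut,
    }
}

/// Publishes every event in order; a failure or timeout is recorded and
/// the loop moves on, so one stalled send never aborts the sequence.
fn publish_all(
    limit: Duration,
    sends: Vec<Box<dyn FnOnce() -> Result<(), String> + Send>>,
) -> Vec<Outcome> {
    sends
        .into_iter()
        .map(|send| send_with_timeout(limit, send))
        .collect()
}
```

With a 50ms limit, a send that sleeps for 500ms is reported as `TimedOut` and the events after it are still attempted.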

Audit: all readers of is_connected() were reviewed for the relaxed invariant (flag can now be true before the giftwrap subscription is established). gui/goblin/mod.rs and gui/goblin/onboarding.rs use it for display + repaint scheduling and to enable the claim-username button; claiming needs relays connected (which the flag now genuinely means), not the incoming kind:1059 subscription. wallet/e2e.rs uses it as a test precondition; its downstream waits are 900s/2400s and relays replay stored gift wraps on subscribe, so the test still converges. No reader treats is_connected() as 'safe to receive now', so no separate ui_connected flag is needed.
2ro
2026-07-03 15:39:02 -04:00
parent e6e262009e
commit ce23214d98
+25 -2
@@ -938,6 +938,22 @@ async fn run_service(svc: Arc<NostrService>, wallet: Wallet) {
 "nostr: first relay Connected ~{}ms after connect()",
 connect_started.elapsed().as_millis()
 );
+// Flip the UI "Connected" flag on the REAL relay-up signal
+// (~2-4s over the exit) instead of gating it behind
+// publish_identity + the up-to-30s catch-up fetch below: those are
+// receive-side housekeeping and keep running in the background,
+// while the relay is already usable the moment it reaches
+// Connected. Without this, one relay slow to EOSE pinned the
+// indicator on "Connecting relays…" for ~30s even though the
+// connection was live in ~2-4s.
+//
+// Accepted tradeoff: between here and the 2s status loop taking
+// over, a relay DROP wouldn't flip the flag back for up to ~30s
+// (until the post-catch-up re-check re-syncs it to reality) — the
+// same-order staleness as the old pessimistic gap, just optimistic
+// instead. The transport watchdog (nymproc) still tracks real exit
+// health independently of this UI flag.
+svc_probe.connected.store(true, Ordering::Relaxed);
 // FAST relay-live report: closes nymproc's relay-readiness
 // window as soon as the exit is proven to carry relay traffic,
 // independent of the up-to-30s catch-up fetch below (a slow
@@ -1281,8 +1297,15 @@ async fn publish_identity(svc: &Arc<NostrService>, client: &Client) {
 }
 }
 for event in &events {
-if let Err(e) = client.send_event_to(&advertised, event).await {
-warn!("nostr: publish kind {} failed: {e}", event.kind);
+// Time-box each publish (mirrors dispatch_dm's SEND_TIMEOUT): this loop is
+// awaited before the catch-up fetch and the kind:1059 subscription below, so
+// an untimed send to a stalled relay would delay real incoming-message
+// delivery. On timeout, warn and move on to the next event — never abort the
+// identity sequence.
+match tokio::time::timeout(SEND_TIMEOUT, client.send_event_to(&advertised, event)).await {
+Ok(Ok(_)) => {}
+Ok(Err(e)) => warn!("nostr: publish kind {} failed: {e}", event.kind),
+Err(_) => warn!("nostr: publish kind {} timed out", event.kind),
 }
 }