From 888e526fa7e6a7a2717d11db17399484fbac725f Mon Sep 17 00:00:00 2001 From: jay Date: Tue, 30 Dec 2025 09:53:47 -0500 Subject: [PATCH] Add [Claude Research] Websocket Research for Nostr --- ...esearch%5D-Websocket-Research-for-Nostr.md | 336 ++++++++++++++++++ 1 file changed, 336 insertions(+) create mode 100644 %5BClaude-Research%5D-Websocket-Research-for-Nostr.md diff --git a/%5BClaude-Research%5D-Websocket-Research-for-Nostr.md b/%5BClaude-Research%5D-Websocket-Research-for-Nostr.md new file mode 100644 index 0000000..5933bad --- /dev/null +++ b/%5BClaude-Research%5D-Websocket-Research-for-Nostr.md @@ -0,0 +1,336 @@ +# WebSocket Transport Stack Best Practices for Go + +**The critical insight from production systems handling millions of concurrent connections: success depends on ruthless separation of concerns across layers, explicit state management, and strategic resource pooling.** Memory—not CPU—is the primary scaling constraint. A naive implementation consumes 24-32KB per connection; optimized approaches reduce this to 4KB, enabling 10x more connections on the same hardware. This research examines battle-tested patterns from Gorilla WebSocket, nhooyr/websocket, gobwas/ws, and production systems at scale (Slack, WhatsApp, Mail.Ru) to inform the design of your three-layer Nostr WebSocket stack. + +## Design philosophy for layered architecture + +Your three-layer approach aligns perfectly with production WebSocket architectures. **The key is maintaining clean boundaries while sharing resources efficiently.** Layer 1 (roots-ws) should provide zero-allocation primitives for frame handling and message encoding. Layer 2 (honeybee) builds practical connection management with automatic reconnection, heartbeat monitoring, and state machines. Layer 3 (mana-ws) adds sophisticated features like connection pools, compression with PreparedMessage caching, and adaptive rate limiting—all without duplicating lower-layer logic. + +The most successful Go WebSocket libraries demonstrate this separation. Gorilla WebSocket separates HTTP upgrade from frame handling from application logic. gobwas/ws goes further with a three-tier API (raw frames → wsutil helpers → application). nhooyr/websocket achieves simplicity by consolidating common patterns while maintaining composability. Your architecture should borrow from all three: primitives from gobwas, practical patterns from Gorilla, and API design from nhooyr. + +## Connection management and reconnection strategies + +**Production reconnection always uses exponential backoff with jitter to prevent thundering herd problems.** The consensus across Socket.io, Discord, Slack, and websocket libraries converges on specific values: initial delay of 1-2 seconds, maximum delay of 30-60 seconds, backoff factor of 2x, and jitter of ±20% (0-3 seconds random). Socket.io's default configuration is particularly battle-tested: 1000ms initial, 5000ms max, 0.5 randomization factor, with the formula `delay = initialDelay * 2^attempt * (1 + randomFactor * (random - 0.5))`. + +**The complete reconnection manager pattern** integrates exponential backoff, ping/pong heartbeats, and automatic recovery. In your Layer 2 (honeybee), implement this as a ConnectionManager that owns the connection lifecycle. The critical insight: track attempt count, calculate backoff with jitter, handle context cancellation gracefully, and reset attempt counter only after successful connection—not after connection established but before first successful message exchange. 
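As one concrete shape for that lifecycle, here is a hedged sketch of the outer reconnect loop such a manager might run. It leans on the `ConnectionManager` struct and `calculateBackoff` method shown in the next snippet; the `dial` and `runSession` helpers are assumptions for illustration, not part of that snippet.

```go
// Run sketches the reconnect loop: back off with jitter, dial, and reset the
// attempt counter only after the session has actually exchanged traffic.
// dial and runSession are assumed helpers, not shown in the struct below.
func (cm *ConnectionManager) Run(ctx context.Context) error {
	for {
		// Wait out the backoff, but abort promptly on cancellation or Stop().
		select {
		case <-time.After(cm.calculateBackoff()):
		case <-ctx.Done():
			return ctx.Err()
		case <-cm.stopChan:
			return nil
		}

		cm.attempt++
		conn, err := cm.dial(ctx) // assumed: performs the WebSocket handshake
		if err != nil {
			continue // attempt stays incremented, so the next delay grows
		}
		cm.conn = conn

		// runSession blocks until the connection dies and reports whether at
		// least one message round-trip succeeded before the failure.
		if cm.runSession(ctx, conn) {
			cm.attempt = 0 // reset only after a genuinely working session
		}

		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-cm.stopChan:
			return nil
		default:
		}
	}
}
```

The struct and the backoff calculation the loop relies on look like this: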
+ +```go +type ConnectionManager struct { + config ReconnectConfig + conn *websocket.Conn + attempt int + state *atomic.Value // ConnectionState + stopChan chan struct{} + reconnectCh chan struct{} +} + +func (cm *ConnectionManager) calculateBackoff() time.Duration { + if cm.attempt == 0 { + return 0 + } + base := cm.config.InitialDelay.Seconds() + delay := base * math.Pow(2, float64(cm.attempt-1)) + jitter := rand.Float64() * 1.0 // 0-1 second jitter + delay += jitter + return time.Duration(min(delay, cm.config.MaxDelay.Seconds()) * float64(time.Second)) +} +``` + +**Heartbeat timing is critical for production reliability.** Analysis of production systems reveals optimal parameters: ping interval of 20-30 seconds, pong timeout of 20-30 seconds, total dead connection detection within 40-60 seconds. The websockets Python library uses 20s/20s (40s total). Socket.io uses 25s/60s. Phoenix Framework defaults to 30s. Jenkins/Jetty require pings under 30s to prevent idle timeout. The key: infrastructure timeouts (HAProxy 30-60s, Azure 30min with proper config, nginx 60s) demand active heartbeats, not passive TCP keepalive which defaults to 2+ hours. + +**Half-open connection detection requires application-level heartbeat, not TCP keepalive.** When a connection silently breaks (no TCP FIN/RST), it appears connected but is dead. TCP keepalive is insufficient—it only detects TCP-level issues and has multi-hour defaults. Implement a missed-heartbeat counter: track consecutive missed pongs, close connection after 3 missed heartbeats, implement both ping/pong and heartbeat messages depending on whether you control both ends. For Nostr where you don't control all clients, use WebSocket protocol ping/pong frames and expose SetPongHandler for connection health monitoring. + +## Error handling and state management + +**Error handling must distinguish network errors, protocol errors, and application errors with explicit retriability classification.** The key insight from production Go libraries: Go 1.18 deprecated `net.Error.Temporary()`, requiring explicit classification. Network timeouts are retriable (exponential backoff), connection refused suggests service down (longer backoff), connection reset requires immediate reconnection, EOF indicates clean close. Protocol errors vary: CloseProtocolError is fatal, CloseMessageTooBig suggests buffer adjustment, CloseServiceRestart demands retry with backoff. + +```go +type WebSocketError struct { + category ErrorCategory // Network, Protocol, Application, Internal + code int + message string + cause error + retriable bool +} + +func IsRetriable(err error) bool { + var netErr net.Error + if errors.As(err, &netErr) && netErr.Timeout() { + return true + } + var closeErr *websocket.CloseError + if errors.As(err, &closeErr) { + switch closeErr.Code { + case websocket.CloseGoingAway, + websocket.CloseAbnormalClosure, + websocket.CloseServiceRestart, + websocket.CloseTryAgainLater: + return true + } + } + return false +} +``` + +**State machines prevent invalid transitions and race conditions.** Define explicit states: Disconnected, Connecting, Connected, Closing, Closed, Error. Valid transitions form a directed graph—Disconnected → Connecting → Connected → Closing → Closed, with Error as an escape hatch from any state. Use atomic state storage with compare-and-swap for thread-safety. Layer 2 should expose state via read-only accessor; Layer 3 subscribes to state changes for pool management. 
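
A minimal sketch of that state machine, using `atomic.Int32` rather than the `atomic.Value` field shown in the earlier struct (either works); the state names follow the list above, and the recovery edges out of `StateError` are an assumption:

```go
type ConnectionState int32

const (
	StateDisconnected ConnectionState = iota
	StateConnecting
	StateConnected
	StateClosing
	StateClosed
	StateError
)

// stateMachine stores the current state atomically so reads never take a lock.
type stateMachine struct {
	state atomic.Int32
}

// State is the read-only accessor Layer 2 exposes upward.
func (sm *stateMachine) State() ConnectionState {
	return ConnectionState(sm.state.Load())
}

// transition applies from→to only if it is a legal edge in the graph and the
// connection is still in the expected state; otherwise it reports failure.
func (sm *stateMachine) transition(from, to ConnectionState) bool {
	if !legal(from, to) {
		return false
	}
	return sm.state.CompareAndSwap(int32(from), int32(to))
}

// legal encodes the directed graph described above. Any state may escape to
// StateError; the edges leading back out of StateError are illustrative.
func legal(from, to ConnectionState) bool {
	if to == StateError {
		return true
	}
	switch from {
	case StateDisconnected:
		return to == StateConnecting
	case StateConnecting:
		return to == StateConnected || to == StateDisconnected
	case StateConnected:
		return to == StateClosing
	case StateClosing:
		return to == StateClosed
	case StateError:
		return to == StateDisconnected || to == StateConnecting
	default:
		return false
	}
}
```

Layer 3 can poll `State()` cheaply on its hot path, or subscribe to a channel of transitions if Layer 2 chooses to publish them, for pool bookkeeping.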
+ +**Error propagation uses cockroachdb/errors pattern for production-grade context.** Wrap errors with `errors.Wrap(err, "context")` to preserve stack traces and add context without losing the original error type. Add safe details with `errors.WithSafeDetails(err, "key", value)` for PII-free logging. Support `errors.Is()` and `errors.As()` for error checking. Define sentinel errors (`ErrConnectionClosed`, `ErrTimeout`) as package-level variables for reliable comparison. + +**Circuit breakers provide graceful degradation under load.** For Layer 3 (mana-ws), implement adaptive circuit breakers tracking failure rates with time-based state transitions. When failure rate exceeds threshold (typically 60% over 10+ requests), transition to Open state, rejecting requests immediately. After timeout (30-60s), enter Half-Open state allowing limited probes. Successful probes close the circuit; failures reopen it. This prevents cascading failures when downstream services are unhealthy. + +## Performance optimization techniques + +**Message batching dramatically reduces overhead by combining small messages into single frames.** Each WebSocket frame incurs 2-14 bytes overhead plus syscall cost. For high-frequency small messages, batch 5-10 messages together. Implementation: accumulate messages in a buffer for 10-50ms or until batch size reached, then send as single frame. Critical for Nostr relay servers handling hundreds of events per second. Balance batching delay against latency requirements—real-time applications tolerate 10ms batching, analytical workloads accept 100ms. + +**Compression requires sophisticated memory management to avoid worse performance.** Centrifuge's real-world case study demonstrates the pitfall: naive compression for broadcasts caused 2.5x memory spikes and sync.Pool contention as concurrent broadcasts created duplicate compressors. **The solution: PreparedMessage caching.** Compress messages once, cache with 1-second TTL, reuse across all connections. This reduced Centrifuge's bandwidth from 45 to 15 MiB/s (3x reduction, $12K/month savings), eliminated memory spikes, and restored CPU to pre-compression levels. + +```go +type MessageCache struct { + cache sync.Map + ttl time.Duration +} + +func (mc *MessageCache) GetOrCompress(msg []byte) (*websocket.PreparedMessage, error) { + key := hash(msg) + if cached, ok := mc.cache.Load(key); ok { + return cached.(*websocket.PreparedMessage), nil + } + + prepared, err := websocket.NewPreparedMessage(websocket.TextMessage, msg) + if err != nil { + return nil, err + } + + mc.cache.Store(key, prepared) + time.AfterFunc(mc.ttl, func() { mc.cache.Delete(key) }) + return prepared, nil +} +``` + +**Per-message deflate configuration trades memory for compression ratio.** Default settings (windowBits=15, memLevel=8) consume ~300KB per connection—unacceptable at scale. Reducing window size to 10-12 bits (1-4KB buffers) uses less memory with minimal compression loss: 2KB windows achieve ~73% compression for JSON, 32KB windows yield ~77%—diminishing returns. Disable context takeover for further memory savings by resetting compression context between messages, though this slightly reduces compression ratio. + +**Zero-copy patterns eliminate allocations in hot paths.** gobwas/ws demonstrates extreme optimization: zero-copy HTTP upgrade (vs 8KB allocations in net/http), on-demand goroutine spawning (vs 2 goroutines × 8-16KB stack per connection), netpoll-based I/O (epoll/kqueue, eliminating read goroutine until data ready). 
Combined savings: 48-72 GB at 3M connections. For roots-ws (Layer 1), provide zero-copy frame reading/writing APIs; honeybee (Layer 2) uses these primitives with managed goroutines; mana-ws (Layer 3) adds netpoll for extreme scale. + +**Buffer pooling with sync.Pool reduces allocations by 45-100x.** Critical for I/O buffers, JSON encoding/decoding, and especially compression (flate.Writer is 650KB!). Pattern: acquire from pool, use, reset, return to pool. **Critical gotcha: always reset before returning and limit oversized buffers.** Buffer growth during large message handling creates 1MB+ buffers—returning these to pool causes memory bloat. Implement max buffer size check (64KB typical) before returning to pool. + +```go +var bufferPool = sync.Pool{ + New: func() interface{} { return new(bytes.Buffer) }, +} + +func process() { + buf := bufferPool.Get().(*bytes.Buffer) + defer func() { + if buf.Cap() > 64*1024 { + return // Don't return oversized buffers + } + buf.Reset() + bufferPool.Put(buf) + }() + // Use buffer +} +``` + +**Goroutine pooling prevents resource exhaustion under load.** Creating goroutines per message causes self-DDoS during traffic spikes. Instead, use fixed-size worker pools (typically 2x CPU cores for CPU-bound work, higher for I/O-bound). Pattern: `pool.Schedule(func() { processMessage(msg) })` queues work for available workers, providing backpressure when pool saturated. Layer 2 uses goroutines per connection (acceptable); Layer 3 uses worker pools for message processing. + +## Connection pooling and load balancing + +**WebSocket connection pooling differs fundamentally from HTTP pooling because connections are long-lived and stateful.** Unlike HTTP where pools manage reusable connections to servers, WebSocket pools manage active connections for load distribution (client-side) or connection limits/health (server-side). The key insight: WebSocket pools are about managing long-lived resources, not connection reuse. + +**Server-side connection management uses the Hub pattern for broadcast coordination.** The canonical pattern from Gorilla examples: Hub maintains `map[*Client]bool`, channels for register/unregister/broadcast. Hub runs in single goroutine, serializing access to client map. Each client has dedicated read/write goroutines plus buffered send channel. Broadcast sends to all client send channels without blocking. Critical implementation detail: use `select` with `default` when sending to client channels—if send would block, the client is slow/stuck, so disconnect it. + +```go +type Hub struct { + clients map[*Client]bool + broadcast chan []byte + register chan *Client + unregister chan *Client +} + +func (h *Hub) run() { + for { + select { + case client := <-h.register: + h.clients[client] = true + case message := <-h.broadcast: + for client := range h.clients { + select { + case client.send <- message: + default: + close(client.send) + delete(h.clients, client) + } + } + } + } +} +``` + +**Database connection pooling is critical for WebSocket applications because each long-lived WebSocket may need database access.** Use database/sql built-in pooling with careful tuning: `SetMaxOpenConns` limits total connections (rule of thumb: CPUs × 2 + spindles), `SetMaxIdleConns` keeps connections warm (10-25% of max), `SetConnMaxLifetime` prevents stale connections (1-2 hours), `SetConnMaxIdleTime` closes idle connections (5 minutes). Too few connections causes deadlock; too many overloads database. Monitor connection pool metrics in production. 
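
A hedged sketch of that tuning, using the guideline values above as defaults; the driver name and DSN are placeholders, not part of the stack itself:

```go
// openDB wires up the database/sql pool with the guideline values discussed above.
func openDB(dsn string) (*sql.DB, error) {
	db, err := sql.Open("pgx", dsn) // placeholder driver name
	if err != nil {
		return nil, err
	}

	maxOpen := runtime.NumCPU()*2 + 2       // CPUs × 2 + spindles (assume 2)
	db.SetMaxOpenConns(maxOpen)
	db.SetMaxIdleConns(maxOpen / 4)         // keep ~25% of max warm
	db.SetConnMaxLifetime(90 * time.Minute) // recycle before connections go stale
	db.SetConnMaxIdleTime(5 * time.Minute)  // drop idle connections promptly

	return db, nil
}

// poolGauges surfaces the sql.DBStats counters worth exporting as metrics.
func poolGauges(db *sql.DB) (inUse, idle int, waitCount int64) {
	s := db.Stats()
	return s.InUse, s.Idle, s.WaitCount
}
```

If `WaitCount` climbs steadily, the pool is too small for the number of concurrent WebSocket handlers touching the database; if `Idle` stays pinned at the maximum, the pool can shrink.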
+ +**Load balancing WebSocket connections requires sticky sessions because connections are stateful.** Layer 4 (TCP) load balancing uses source IP hashing for deterministic routing—fast but inflexible. Layer 7 (HTTP/application) load balancing inspects WebSocket handshake for intelligent routing but must maintain connection state. HAProxy configuration: `balance source` (sticky sessions), `timeout tunnel 3600s` (keep WebSocket alive). For scaling beyond 65K connections per load balancer, bind multiple source IPs to overcome port exhaustion. + +**Dynamic load distribution for new servers uses out-of-band routing.** Erlang-based systems track connection counts per node, return least-loaded node IP during authentication, client connects directly. Benefits: no load balancer reconfiguration, automatic load distribution as nodes added/removed, scales horizontally without central bottleneck. For Nostr, implement relay hints that guide clients to less-loaded relays. + +## State management in WebSocket applications + +**Reader/writer goroutine separation is the canonical concurrency pattern.** Gorilla WebSocket requires this because concurrent writes aren't synchronized (deliberate design choice for zero-overhead). Pattern: one goroutine calls `ReadMessage()` in loop, one goroutine reads from send channel and calls `WriteMessage()`. Communicate via channels. Critical: never call write methods from multiple goroutines simultaneously without external synchronization. + +nhooyr/websocket takes opposite approach: internal write synchronization, allowing concurrent writes from any goroutine. Trade-off: 2KB overhead per connection for synchronization goroutine (temporary until Go adds `context.AfterFunc` to stdlib). For your architecture, Layer 2 should handle this complexity—expose simple Read/Write API that's safe for concurrent use, implementing synchronization internally. + +**Channel buffer sizing balances memory usage and backpressure.** Unbuffered channels (0) cause tight coupling and potential deadlock. Small buffers (1-10) work for low-traffic connections. Large buffers (256-1024) handle bursty traffic at cost of memory. **Production guideline: 256-512 messages for small (<1KB) messages, 128-256 for medium (1-10KB), 32-64 for large (>10KB).** Monitor channel depth metrics to tune sizing. When channel consistently near full, either client is slow (apply backpressure) or buffer too small (increase size). + +**Context-based cancellation provides clean shutdown.** nhooyr's pattern: accept `context.Context` on all operations, use `context.WithTimeout` for operation timeouts, `context.WithCancel` for lifecycle management. When connection closes, cancel context—all pending operations abort gracefully. Layer 2 API should accept contexts; Layer 1 primitives remain context-free for flexibility. + +**Connection state tracking uses atomic operations for lock-free reads.** Store state in `atomic.Value`, use compare-and-swap for state transitions, expose read-only `GetState()` accessor. Avoid mutex on read path—state checks are frequent, locking adds contention. Reserve mutex for complex transitions requiring multiple operations. + +## Testing strategies for WebSocket libraries + +**httptest.NewServer is the foundation for integration testing.** Pattern: create test server with HTTP handler that upgrades to WebSocket, convert URL from `http://` to `ws://`, connect with `websocket.DefaultDialer.Dial()`, test message exchange, verify behavior. This tests real network stack without external dependencies. 
Critical: use `defer server.Close()` and `defer ws.Close()` to prevent resource leaks in tests. + +```go +func TestEchoHandler(t *testing.T) { + s := httptest.NewServer(http.HandlerFunc(echoHandler)) + defer s.Close() + + u := "ws" + strings.TrimPrefix(s.URL, "http") + ws, _, err := websocket.DefaultDialer.Dial(u, nil) + require.NoError(t, err) + defer ws.Close() + + err = ws.WriteMessage(websocket.TextMessage, []byte("hello")) + require.NoError(t, err) + + _, msg, err := ws.ReadMessage() + require.NoError(t, err) + assert.Equal(t, "hello", string(msg)) +} +``` + +**In-memory testing with wstest eliminates network overhead for faster unit tests.** The `github.com/posener/wstest` library provides in-memory dialer connecting directly to handler without TCP stack. Use for testing handler logic in isolation. Downside: doesn't test actual network behavior, timeouts, or connection issues—complement with httptest-based integration tests. + +**Mock interfaces enable isolated testing of business logic.** Define interfaces for WebSocketConn (Close, WriteJSON, ReadJSON, etc.), use real implementation in production, mock implementation in tests. Pattern: inject interface dependency, test supplies mock with canned responses. For testing reconnection logic, mock can simulate failures. For your architecture, Layer 2 should define interfaces; Layer 3 and application code depend on interfaces, not concrete types. + +**Testing reconnection logic requires time control and failure injection.** Use table-driven tests with expected backoff sequences. Verify: attempts count correctly, backoff increases exponentially, jitter adds randomness without exceeding max, max delay honored, context cancellation aborts reconnection. Mock the dialer to inject failures on first N attempts, then succeed. Verify state transitions: Disconnected → Connecting → Error → Disconnected → Connecting → Connected. + +**Load testing tools: k6 for WebSocket protocol testing, Go benchmarks for library internals.** k6 provides WebSocket support with JavaScript test scripts: define virtual user behavior, ramp up load stages, check response status, measure latency percentiles. Run with `k6 run --vus 1000 --duration 2m script.js`. Go benchmarks use `testing.B` with `RunParallel` for concurrent load: `b.RunParallel(func(pb *testing.PB) { ... })` spawns goroutines, each running benchmark until `pb.Next()` returns false. Profile with pprof: `go test -bench=. -cpuprofile=cpu.prof -memprofile=mem.prof`. + +**Test helpers reduce boilerplate and improve readability.** Define `mustDialWS(t, url)` that connects or fails test, `writeWSMessage(t, conn, msg)` that writes or fails, `within(t, duration, assertFn)` that runs assertion with timeout. Mark helpers with `t.Helper()` so failures report correct line numbers. Pattern: one helper per common operation, helpers always take `testing.TB` as first parameter for compatibility with tests and benchmarks. + +**Autobahn test suite compliance is mandatory for protocol correctness.** Autobahn provides 500+ test cases covering all WebSocket protocol edge cases. All production libraries (Gorilla, nhooyr, gobwas) pass full suite. Run Autobahn in Docker against your server, generate HTML report with results. Critical for Layer 1 (roots-ws) which implements protocol primitives. 
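
A minimal sketch of the test helpers named above, reusing the gorilla/websocket dialer from the earlier echo test; the exact signatures are illustrative:

```go
// mustDialWS connects to url or fails the test immediately.
func mustDialWS(tb testing.TB, url string) *websocket.Conn {
	tb.Helper()
	ws, _, err := websocket.DefaultDialer.Dial(url, nil)
	if err != nil {
		tb.Fatalf("dial %s: %v", url, err)
	}
	return ws
}

// writeWSMessage writes a text frame or fails the test.
func writeWSMessage(tb testing.TB, conn *websocket.Conn, msg []byte) {
	tb.Helper()
	if err := conn.WriteMessage(websocket.TextMessage, msg); err != nil {
		tb.Fatalf("write message: %v", err)
	}
}

// within runs assert in a goroutine and fails if it has not finished in d.
// assert should report failures with tb.Error*, not Fatal, since it runs off
// the test goroutine.
func within(tb testing.TB, d time.Duration, assert func()) {
	tb.Helper()
	done := make(chan struct{})
	go func() {
		defer close(done)
		assert()
	}()
	select {
	case <-done:
	case <-time.After(d):
		tb.Fatalf("assertion did not complete within %v", d)
	}
}
```

With helpers like these in place, the earlier echo test shrinks to a dial, a write, and a read.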
+ +## Rate limiting and back pressure implementation + +**Token bucket is the standard rate limiting algorithm for WebSocket applications because it allows bursts while maintaining average rate.** The `golang.org/x/time/rate` package provides production-ready implementation. Create limiter with `rate.NewLimiter(r, b)` where r is tokens/second and b is burst capacity. Call `Allow()` for immediate decision, `Wait(ctx)` to block until token available. **Critical pattern: per-client rate limiters, not global.** Global limiters allow one aggressive client to starve others. + +```go +type visitor struct { + limiter *rate.Limiter + lastSeen time.Time +} + +var visitors = make(map[string]*visitor) +var mu sync.Mutex + +func getVisitor(ip string) *rate.Limiter { + mu.Lock() + defer mu.Unlock() + + v, exists := visitors[ip] + if !exists { + limiter := rate.NewLimiter(1, 3) // 1/sec, burst 3 + visitors[ip] = &visitor{limiter, time.Now()} + return limiter + } + v.lastSeen = time.Now() + return v.limiter +} + +// Background cleanup prevents memory leak +func cleanupVisitors() { + for { + time.Sleep(time.Minute) + mu.Lock() + for ip, v := range visitors { + if time.Since(v.lastSeen) > 3*time.Minute { + delete(visitors, ip) + } + } + mu.Unlock() + } +} +``` + +**Back pressure detection monitors queue depth and buffer sizes.** JavaScript provides `ws.bufferedAmount` showing bytes buffered but not sent—critical for client-side backpressure detection. Server-side Go monitoring: check send channel length `len(client.send)` against capacity `cap(client.send)`. When utilization exceeds 80%, client is slow—implement strategy: pause sends to that client, drop low-priority messages, or disconnect. **Never let one slow client block broadcasts to other clients.** + +**Back pressure management strategies depend on message priority.** For real-time dashboards, drop old data and keep latest. For chat applications, buffer with limits and use priority queues (high-priority for user messages, low for presence updates). For financial data, never drop—implement strict flow control with acknowledgments. For gaming, drop interpolated position updates but preserve critical game events. For IoT sensors, aggregate/sample data in time windows. + +**Adaptive rate limiting adjusts limits based on system load.** Monitor queue utilization; when utilization exceeds 80%, reduce rate limit by 50%. When utilization drops below 20%, increase rate limit by 20% up to baseline. This provides dynamic backpressure: slow down message acceptance when queues fill, speed up when system has capacity. Prevents resource exhaustion during traffic spikes while maximizing throughput during normal operation. + +```go +type AdaptiveRateLimiter struct { + baseRate rate.Limit + currentRate rate.Limit + limiter *rate.Limiter + queueSize *atomic.Int32 + maxQueueSize int32 +} + +func (arl *AdaptiveRateLimiter) UpdateRate(queueSize int32) { + utilization := float64(queueSize) / float64(arl.maxQueueSize) + + if utilization > 0.8 { + arl.currentRate = arl.currentRate * 0.5 // Reduce 50% + } else if utilization < 0.2 { + arl.currentRate = min(arl.currentRate*1.2, arl.baseRate) // Increase 20% + } + + arl.limiter.SetLimit(arl.currentRate) +} +``` + +**Circuit breakers protect downstream services from cascading failures.** When error rate exceeds threshold (typically 60% over 10+ requests), circuit opens—rejecting requests immediately without attempting operation. After timeout (30-60s), circuit enters half-open state allowing limited probes. 
Successful probes close circuit; failures reopen. Use `github.com/sony/gobreaker` for production-ready implementation with configurable thresholds, timeouts, and callbacks. + +**Message prioritization uses multiple channels or priority queues.** Three-tier pattern: high, medium, low priority channels. Dequeue from high-priority first, fall back to medium, then low. This ensures critical messages (authentication, control) process before bulk data. Implementation uses nested `select` statements: first select checks only high channel, second adds medium, third adds low and context cancellation. + +## Real-world library architectures and lessons + +**Gorilla WebSocket represents the pragmatic, battle-tested approach with explicit trade-offs.** Architecture: `Upgrader` struct handles HTTP-to-WebSocket handshake, `Conn` struct wraps `net.Conn` with WebSocket-specific operations. Critical design: no internal write synchronization—users must serialize writes via channels or mutexes. Rationale: zero overhead for applications that naturally serialize writes (single writer goroutine pattern). Callbacks for ping/pong/close handlers. PreparedMessage for efficient broadcasts. Comprehensive but requires understanding concurrency patterns. + +**nhooyr/websocket exemplifies modern, idiomatic Go design prioritizing simplicity over micro-optimization.** Every operation accepts `context.Context` for cancellation and timeouts. Internal write synchronization allows concurrent writes from any goroutine at cost of 2KB per connection (acceptable for <100K connections). Clean API: `Read(ctx)`, `Write(ctx, type, data)`, `Close(code, reason)`. Transparent compression negotiation. Zero external dependencies, compiles to Wasm. Philosophy: "there should be only one way to do things." Trade-off: slightly less control for cleaner API. + +**gobwas/ws demonstrates extreme performance optimization for high-scale deployments.** Three-tier API: raw frame reading/writing for zero-copy operation, wsutil package for common patterns, application-level abstractions. Zero-copy HTTP upgrade eliminates 8KB allocations. Netpoll integration (epoll/kqueue) eliminates read goroutine until data available—massive savings at 100K+ connections. Manual state management required. Benchmark: 2.8x faster broadcasts, 10.8x less memory, 158x fewer allocations than naive implementation. Trade-off: steep learning curve, easy to make mistakes, but essential for extreme scale. + +**Socket.io demonstrates feature-rich abstraction at cost of protocol compatibility.** Custom protocol layered on WebSocket with automatic fallback to long-polling. Built-in reconnection, heartbeat, room/namespace abstractions. Event-driven API familiar to JavaScript developers. Architecture: Engine.IO transport layer (WebSocket/polling), Socket.io protocol layer (rooms, namespaces, events), application layer. Success factors: rich features, compatibility, excellent documentation. Trade-offs: custom protocol (not pure WebSocket), performance overhead, vendor lock-in. + +**Python websockets library showcases AsyncIO-native design with coroutine-based patterns.** `async with connect(uri) as websocket:` provides automatic lifecycle management. `async for message in websocket:` enables natural iteration over messages. Three implementations (asyncio, threading, Sans-I/O) for different use cases. Context managers ensure cleanup. Architecture: protocol compliance layer, transport abstraction, application API. Philosophy: Pythonic, correct by construction, robustness over performance. 
+ +**Spring WebSocket illustrates enterprise patterns with heavy framework integration.** STOMP protocol over WebSocket for text-oriented messaging. Destination-based routing (`/topic/*`, `/queue/*`, `/app/*`). Message channel architecture (inbound/outbound/broker channels). Annotation-driven controller methods. Spring Security integration for authentication. Trade-offs: complex configuration, framework dependency, opinionated architecture, but enterprise-ready with extensive integration points. + +## Architectural recommendations for your three-layer stack + +**Layer 1 (roots-ws): pure functional primitives without state or allocations.** Provide frame-level operations: `ReadFrame(io.Reader) (*Frame, error)`, `WriteFrame(io.Writer, Frame) error`. Message encoding/decoding for Nostr JSON: `EncodeEvent(Event) []byte`, `DecodeEvent([]byte) (Event, error)`. No state, no event loops, no goroutines. Design principle: composable building blocks. Pattern from gobwas/ws: expose frame header parsing, payload handling, masking operations as separate functions. Return raw errors without wrapping—let Layer 2 add context. + +**Layer 2 (honeybee): practical implementation with connection lifecycle and essential features.** Build on Layer 1 primitives. Implement ConnectionManager with state machine (Disconnected/Connecting/Connected/Closing/Closed), exponential backoff reconnection (1s initial, 30s max, 2x factor, jitter), ping/pong heartbeat (20-30s intervals), Hub pattern for broadcast coordination. Expose context-aware API: `Read(ctx) (Message, error)`, `Write(ctx, Message) error`, `Close(code, reason) error`. Handle write synchronization internally—users shouldn't worry about goroutine safety. Pattern from nhooyr: simple API, complex internals. + +**Layer 3 (mana-ws): advanced features without duplicating lower layers.** Connection pools for client-side load distribution across relays. PreparedMessage caching for efficient broadcasts (compress once, send to N connections). Adaptive rate limiting (per-client token buckets, adjust based on queue depth). Circuit breakers protecting downstream services. Compression with sophisticated memory management (sync.Pool for flate.Writer, TTL-based caching). Goroutine pools for message processing (avoid per-message goroutine creation). Network policy enforcement (max connections per IP, max message size, idle timeouts). + +**Resource sharing across layers through dependency injection.** Layer 3 components depend on Layer 2 interfaces, not concrete types. Connection pool uses `ConnectionFactory` interface provided by Layer 2. Layer 2 uses `FrameReader`/`FrameWriter` interfaces from Layer 1. Benefits: testability (inject mocks), flexibility (swap implementations), clear boundaries. Pattern: define interfaces where consumed, implement where needed. + +**Configuration cascades from Layer 3 down with sane defaults at each layer.** Layer 3 exposes high-level configuration (max connections per pool, compression level). Layer 2 has defaults for reconnection (1s/30s/2x), heartbeat (20s ping, 20s pong timeout), buffer sizes (256 messages). Layer 1 is configuration-free—purely functional. Users configure only what they need; defaults work for 90% of cases. Advanced users override at appropriate layer. + +**Testing strategy: unit test each layer independently, integration test end-to-end.** Layer 1: pure functions, test frame parsing/generation with table-driven tests. Layer 2: use httptest for connection lifecycle tests, mock Layer 1 for unit testing state machines. 
Layer 3: test pools/rate limiters with mocks, use k6 for load testing complete stack. All layers: benchmark critical paths, profile for allocations, run with race detector. Mandatory: Autobahn compliance for Layer 1. + +**Error handling strategy: raw errors in Layer 1, wrapped with context in Layer 2, categorized in Layer 3.** Layer 1 returns `io.EOF`, `net.Error`, protocol errors directly. Layer 2 wraps with `errors.Wrap(err, "context")`, classifies retriability, manages state transitions. Layer 3 implements circuit breakers, retry policies, exposes categorized errors to applications. Define sentinel errors at Layer 2 boundary: `ErrConnectionClosed`, `ErrTimeout`, `ErrInvalidState`. Support `errors.Is()` and `errors.As()` for error checking. + +**For Nostr specifically: Layer 2 handles REQ/EVENT/CLOSE protocol, Layer 3 adds subscription management.** Layer 2 implements connection per relay, automatic reconnection, heartbeat (both WebSocket ping/pong and Nostr-level keepalive if needed). Parse Nostr JSON messages, validate signatures in Layer 2. Layer 3 manages subscriptions across multiple relays, deduplicates events, implements filter matching, provides subscription lifecycle (subscribe/unsubscribe/replace). Application layer (your code) uses Layer 3 API focused on Nostr concepts, not WebSocket details. + +## Key implementation priorities + +Start with Layer 2 honeybee using nhooyr/websocket as foundation for clean context-aware API and automatic write synchronization. Implement Hub pattern from Gorilla examples for connection management and broadcasting. Add exponential backoff reconnection (1s/30s/2x/jitter) and heartbeat monitoring (20s intervals). Expose simple API: `Connect(ctx, url)`, `Read(ctx)`, `Write(ctx, msg)`, `Close(code, reason)`. Test with httptest and table-driven tests for reconnection scenarios. + +Then extract Layer 1 roots-ws primitives from Layer 2 implementation. Create zero-allocation frame reading/writing functions, Nostr message encoding/decoding, signature validation helpers. Make them pure functions accepting io.Reader/io.Writer for composability. Test exhaustively with Autobahn suite. Benchmark to ensure zero allocations in hot paths. + +Finally build Layer 3 mana-ws features: PreparedMessage caching for broadcasts (compress once, 1s TTL), per-client rate limiting (golang.org/x/time/rate, 10 msg/sec default, burst 20), connection pooling for client applications, goroutine pools for message processing (2x CPU cores). Add sophisticated features only when needed—premature optimization wastes effort. Profile production workload to identify bottlenecks before optimizing. + +**The most critical insight: memory is the primary scaling constraint, not CPU.** Naive implementation uses 24-32KB per connection (goroutines, buffers, allocations). Optimized approach uses 4KB (netpoll, buffer reuse, on-demand resources). This 6-8x reduction enables serving 6-8x more connections on same hardware. For Nostr relay servers expecting thousands of connections, this makes the difference between scaling and thrashing. Invest in memory optimization: sync.Pool for buffers, PreparedMessage for broadcasts, goroutine pools for processing, netpoll for extreme scale—but only add complexity when measurements prove it necessary. \ No newline at end of file