Pub/Sub Internals
How the Pub/Sub system works internally
This document explains how Ergo Framework's pub/sub system works under the hood. It's written for developers who want to understand the architecture, network behavior, and performance characteristics when building distributed systems.
For basic usage, see Links and Monitors and Events. This document assumes you're familiar with those concepts and focuses on how the system works internally.
The Unified Architecture
Links, monitors, and events look like separate features when you use them. But underneath, they share the same mechanism. Understanding this unification explains why the system behaves consistently and why certain optimizations work.
The Core Concept
Every interaction in the pub/sub system follows one pattern:
A consumer subscribes to a target and receives notifications about that target.
This applies whether you're linking to a process, monitoring a registered name, or subscribing to an event stream. The differences are in what you subscribe to and what notifications you receive.
Three Components of Every Subscription
1. Consumer - The process creating the subscription. This is the process that will receive notifications when something happens to the target.
2. Target - What the consumer subscribes to. Targets come in several types:
| Target type | Example | Description |
| --- | --- | --- |
| PID | `gen.PID{Node: "node@host", ID: 100}` | A specific process instance |
| ProcessID | `gen.ProcessID{Name: "worker", Node: "node@host"}` | A registered name |
| Alias | `gen.Alias{...}` | A process alias |
| Node | `gen.Atom("node@host")` | A network connection |
| Event | `gen.Event{Name: "prices", Node: "node@host"}` | A registered event |
3. Subscription Type - How the consumer wants to receive notifications:
| Type | Notification | Message → Queue | Consumer default |
| --- | --- | --- | --- |
| Link | Exit signal | `MessageExit*` → Urgent queue | Terminates by default |
| Monitor | Down message | `MessageDown*` → System queue | Continues running |
The combination of target type and subscription type determines which message you receive - for example, a link to a PID delivers a MessageExitPID exit signal, while a monitor on the same PID delivers a MessageDownPID message.
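As a minimal sketch of the monitor side (assuming Ergo v3's act and gen packages; the Consumer type is illustrative), the notification arrives as an ordinary message:

```go
package main

import (
	"ergo.services/ergo/act"
	"ergo.services/ergo/gen"
)

// Consumer is an illustrative actor that monitors another process.
type Consumer struct {
	act.Actor
}

func (c *Consumer) HandleMessage(from gen.PID, message any) error {
	switch m := message.(type) {
	case gen.MessageDownPID:
		// Monitor on a PID target: delivered via the System queue;
		// the consumer keeps running after handling it.
		c.Log().Info("monitored process %s terminated: %s", m.PID, m.Reason)
	}
	return nil
}
```

A link to the same target would instead deliver an exit signal to the Urgent queue, terminating the consumer by default.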
Implicit vs Explicit Events
The targets divide into two categories based on what notifications they generate:
Implicit Events - Processes, names, aliases, and nodes generate termination notifications automatically. The target doesn't do anything special - when it terminates (or disconnects, for nodes), the framework generates notifications for all subscribers.
Explicit Events - Registered events generate both published messages AND termination notifications. A producer process explicitly registers an event and publishes messages to it. When the producer terminates or unregisters the event, subscribers also receive termination notification.
The key difference: implicit events give you one notification (termination). Explicit events give you N published messages plus termination notification.
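For example, a producer might register and publish an explicit event like this (a fragment assuming Ergo v3's RegisterEvent/SendEvent API; the "prices" event and PriceUpdate type are illustrative):

```go
// Registering returns a token that authorizes publishing to the event.
token, err := process.RegisterEvent("prices", gen.EventOptions{})
if err != nil {
	return err
}

// Each publish fans out to all current subscribers.
if err := process.SendEvent("prices", token, PriceUpdate{Symbol: "AAPL", Price: 187.25}); err != nil {
	return err
}

// Unregistering (like terminating) delivers a termination notification
// to every subscriber, with reason gen.ErrUnregistered.
_ = process.UnregisterEvent("prices")
```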
Why Unification Matters
This unified architecture has practical benefits:
Consistent behavior - The same subscription and notification mechanics work for all target types. Once you understand how monitors work for processes, you understand how they work for events.
Shared optimizations - Network optimizations (covered later) apply to all subscription types. Whether you're monitoring 100 remote processes or subscribing 100 consumers to a remote event, the same sharing mechanism kicks in.
Predictable cleanup - Termination cleanup works identically for all subscriptions. When a process terminates, all its subscriptions are cleaned up using the same code path.
How Local Subscriptions Work
When you subscribe to a target on the same node, the operation is simple and fast.
What You Experience
The call returns instantly. There's no network communication, no blocking. The node records your subscription in memory.
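Sketched with Ergo v3's gen.Process API (workerPID is illustrative):

```go
// Local target: recording the subscription is a synchronous,
// in-memory operation - no network involved.
if err := process.MonitorPID(workerPID); err != nil {
	return err // e.g. the target has already terminated
}
```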
What Happens Internally
The Target Manager maintains subscription records. When a process terminates, Target Manager looks up all subscribers and delivers notifications to their mailboxes. For links, notifications go to the Urgent queue. For monitors, they go to the System queue.
Guarantees
Instant subscription - No waiting, no blocking. The subscription is recorded synchronously.
Guaranteed notification - If the target terminates after you subscribe, you will receive notification. The notification mechanism is part of the termination process itself.
Asynchronous delivery - Notifications arrive in your mailbox like any other message. You process them in your HandleMessage callback.
Automatic cleanup - When the target terminates and you receive notification, the subscription is removed automatically. You don't need to unsubscribe.
How Remote Subscriptions Work
Remote subscriptions involve network communication but provide the same guarantees as local subscriptions.
What You Experience
The call blocks while the subscription request travels to the remote node and the response returns. This typically takes milliseconds on a local network.
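The same call against a remote target, as a fragment (node and name are illustrative):

```go
// Remote target: this call blocks until the remote node validates
// the target and confirms the subscription.
target := gen.ProcessID{Name: "worker", Node: "node2@host"}
if err := process.MonitorProcessID(target); err != nil {
	return err // e.g. unknown name, or no connection to node2@host
}
```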
What Happens Internally
The subscription request travels to the remote node's Target Manager. It validates that the target exists (returning an error if not), records the subscription, and sends confirmation. Once established, termination notifications travel back over the network.
Subscription Validation
Both local and remote subscriptions validate that the target exists; subscribing to a nonexistent target returns an error instead of silently creating a dead subscription.
For events, the target node validates that the event is registered.
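For instance, subscribing to an unregistered event fails at subscription time (a fragment; MonitorEvent also returns any buffered messages, covered later):

```go
ev := gen.Event{Name: "prices", Node: "node2@host"}
if _, err := process.MonitorEvent(ev); err != nil {
	// the remote node reports that "prices" is not registered there
	return err
}
```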
Guaranteed Notification Delivery
Remote subscriptions guarantee you receive exactly one termination notification. This guarantee holds even when networks fail. Two paths can deliver your notification:
Path 1: Normal Delivery
The target terminates normally. The remote node sends the notification over the network, and you receive it with the actual termination reason.
Path 2: Connection Failure
The network connection fails before the notification arrives (or before the target even terminates). Your local node detects the disconnection and generates notifications for all subscriptions to targets on the failed node.
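A consumer can distinguish the two paths by the Reason field, though it rarely needs to (a sketch assuming gen.ErrNoConnection is the reason carried by failover notifications):

```go
func (c *Consumer) HandleMessage(from gen.PID, message any) error {
	if m, ok := message.(gen.MessageDownPID); ok {
		if m.Reason == gen.ErrNoConnection {
			// Path 2: generated locally after the connection failed
		} else {
			// Path 1: the actual termination reason, delivered
			// over the network by the remote node
		}
		// Either way the target is unreachable; react identically.
	}
	return nil
}
```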
Why This Works
You're guaranteed notification through one of two mechanisms:
Remote node delivers it (normal case)
Local node generates it when detecting connection failure (failover)
The Reason field tells you which path occurred. Your code typically handles both the same way - the target is no longer accessible regardless of why.
This failover mechanism compensates for network unreliability. You write code assuming notifications always arrive, because they do.
Network Optimization: Shared Subscriptions
This section describes the optimization that makes distributed pub/sub practical at scale. Without it, many common patterns would be impractical.
The Problem
Consider a realistic scenario: 100 worker processes on one node each monitor the same coordinator process running on a remote node.
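In code, the scenario looks like this (a sketch; Worker and coordinatorPID are illustrative):

```go
// Every worker monitors the shared remote coordinator at startup.
// All 100 workers on this node run the same Init.
func (w *Worker) Init(args ...any) error {
	return w.MonitorPID(coordinatorPID)
}
```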
Naive implementation: Each MonitorPID call creates a separate network subscription. Result:
100 network round-trips to create subscriptions
100 subscription records on the remote node
100 network messages when the coordinator terminates
This doesn't scale. With 1000 workers, you'd have 1000 network messages just to deliver one termination notification.
What Actually Happens
The framework automatically detects when multiple local processes subscribe to the same remote target and shares the network subscription.
What you observe:
The first subscription to a remote target requires network communication. Every subsequent subscription from the same node to the same target returns instantly - it shares the existing network subscription.
How Notification Delivery Works
When the remote target terminates:
1. The remote node sends ONE notification message to your node
2. Your node receives it and looks up all local subscribers to that target
3. Your node delivers individual notifications to each subscriber's mailbox
Performance Characteristics
| Operation | Naive (per-subscriber) | Shared |
| --- | --- | --- |
| First subscription | 1 network round-trip | 1 network round-trip |
| N additional subscriptions | N network round-trips | 0 (instant) |
| Notification delivery | N network messages | 1 network message |
| Unsubscribe (not last) | 1 network round-trip | 0 (instant) |
| Unsubscribe (last) | 1 network round-trip | 1 network round-trip |
Network cost comparison for 100 subscribers:
| Operation | Naive | Shared |
| --- | --- | --- |
| Subscribe all | 100 round-trips | 1 round-trip |
| Notification | 100 messages | 1 message |
| Unsubscribe all | 100 round-trips | 1 round-trip |
| Total | 300 network operations | 3 network operations |
Impact on Event Publishing
The same optimization applies to event publishing. When you publish an event with subscribers on multiple nodes, the framework groups subscribers by node and sends ONE message per node. The producer performs a single publish call; each subscriber receives its own copy of the event, fanned out locally by its node.
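Sketched from both sides (fragments under the same assumptions as earlier examples; gen.MessageEvent is assumed to carry the event identity and payload):

```go
// Producer: a single call, whatever the total subscriber count.
func (p *Producer) publish(update PriceUpdate) error {
	return p.SendEvent("prices", p.token, update)
}

// Subscriber (on any node): receives its own copy, delivered by
// its local node after the per-node fanout.
func (s *Subscriber) HandleMessage(from gen.PID, message any) error {
	if m, ok := message.(gen.MessageEvent); ok {
		s.Log().Info("update %v from event %s", m.Message, m.Event.Name)
	}
	return nil
}
```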
Real-World Scale Example
Consider a market data feed with 1 million subscribers distributed across 10 nodes (100,000 per node). When the producer publishes one price update:

| Approach | Network messages per update |
| --- | --- |
| Without optimization | 1,000,000 |
| With optimization | 10 |
The optimization transforms O(N) network cost (where N = total subscribers) into O(M) cost (where M = number of nodes). For distributed systems with many subscribers per node, this is the difference between practical and impossible.
Measured numbers at this scale are available in the distributed-pub-sub-1M benchmark.
Why This Matters for System Design
This optimization enables patterns that would be impractical otherwise:
Worker pools monitoring coordinators - hundreds of local workers can watch one remote coordinator for the cost of a single network subscription.
Distributed caching with invalidation - every node subscribes to one invalidation event, so an invalidation costs one message per node rather than per cache.
Hierarchical supervision across nodes - supervisors can monitor remote processes without multiplying cross-node traffic.
High-frequency event streaming - publish cost scales with the number of nodes, not the number of subscribers.
When Sharing Doesn't Apply
The optimization applies when multiple processes on the SAME node subscribe to the SAME remote target.
These share:
Subscriptions from multiple processes on one node to the same remote target (for example, 100 local workers monitoring one remote PID)
These don't share:
Subscriptions to different targets, even targets on the same remote node
Subscriptions made from different nodes - each node maintains its own network subscription
Different target identities - a PID target and a ProcessID target are distinct, even when they refer to the same process
Buffered Events: Partial Optimization
Buffered events receive partial optimization. The subscription is shared, but each subscriber must retrieve buffer contents individually.
Why Buffers Complicate Sharing
Event buffers store recent messages so that new subscribers can catch up. When a subscriber joins, it receives the current buffer contents along with its subscription.
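A sketch of both sides (assuming gen.EventOptions has a Buffer field and that MonitorEvent returns the buffered messages; producerNode and applyConfig are illustrative):

```go
// Producer side: keep the last 10 messages for late subscribers.
token, err := process.RegisterEvent("config", gen.EventOptions{Buffer: 10})
if err != nil {
	return err
}
_ = token // used with SendEvent when publishing

// Subscriber side (a different process): MonitorEvent returns the
// current buffer contents along with the subscription.
buffered, err := process.MonitorEvent(gen.Event{Name: "config", Node: producerNode})
if err != nil {
	return err
}
for _, m := range buffered {
	applyConfig(m.Message) // replay recent history first
}
```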
The problem: different subscribers joining at different times need different buffer contents.
If subscriptions were fully shared, all subscribers would receive the same buffer - incorrect for late subscribers.
What Actually Happens
First subscriber: Network round-trip to create subscription AND retrieve buffer.
Subsequent subscribers: Network round-trip to retrieve current buffer (subscription already exists).
Published messages: Still optimized - one network message per node, distributed locally.
Performance Comparison
| Operation | Unbuffered | Buffered |
| --- | --- | --- |
| First subscription | 1 network round-trip | 1 network round-trip |
| Additional subscriptions | Instant (shared) | Network round-trip (buffer retrieval) |
| Published messages | 1 message per node | 1 message per node |
| Termination notification | 1 message per node | 1 message per node |
When to Use Buffers
Use buffers when:
New subscribers need recent history (last N configuration updates)
Subscribers might miss messages during brief disconnections
State can be reconstructed from recent messages
Subscriber count is moderate
Avoid buffers when:
Real-time streaming where history isn't useful
High subscriber count across many nodes (each pays network cost)
Messages are only meaningful at publish time
Memory constraints (buffers consume memory on producer node)
Practical guidance: start without a buffer, and add a small one only when subscribers demonstrably need recent history - a buffer costs memory on the producer node and a network round-trip for every new remote subscriber.
Producer Notifications
Producers can receive notifications when subscriber interest changes. This enables demand-driven data production.
Enabling Notifications
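Notifications are opted into at registration time (a fragment, assuming gen.EventOptions has a Notify flag):

```go
// With Notify enabled, the framework tells the producer when the
// subscriber count crosses zero in either direction.
token, err := process.RegisterEvent("prices", gen.EventOptions{Notify: true})
```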
What You Receive
With notifications enabled, the producer receives gen.MessageEventStart when the first subscriber appears and gen.MessageEventStop when the last one leaves.
When Notifications Arrive
| Subscriber count change | Notification |
| --- | --- |
| 0 → 1 (first subscriber) | MessageEventStart |
| 1 → 0 (last subscriber leaves) | MessageEventStop |
| 1 → 2, 2 → 3, etc. | None |
| 3 → 2, 2 → 1, etc. | None |
You only receive notifications when crossing the zero threshold. The notifications answer: "is anyone listening?" - not "how many are listening?"
Practical Use Case: On-Demand Data Production
The producer idles when nobody's listening, avoiding unnecessary API calls and resource usage. When subscribers appear, it starts producing. When all subscribers leave, it stops.
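A sketch of such a producer (startPolling and stopPolling are illustrative helpers):

```go
func (p *Producer) HandleMessage(from gen.PID, message any) error {
	switch message.(type) {
	case gen.MessageEventStart:
		// first subscriber appeared - begin producing
		p.startPolling()
	case gen.MessageEventStop:
		// last subscriber left - go idle, release resources
		p.stopPolling()
	}
	return nil
}
```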
Network Transparency
Notifications work across nodes. Remote subscribers count toward "someone is listening" exactly like local ones.
The producer doesn't know or care whether subscribers are local or remote. The notification mechanism handles it transparently.
Multiple Events
Each event tracks subscribers independently.
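For example (a fragment under the same assumptions):

```go
// Each registration gets its own subscriber count and its own
// Start/Stop notifications.
pricesToken, _ := process.RegisterEvent("prices", gen.EventOptions{Notify: true})
tradesToken, _ := process.RegisterEvent("trades", gen.EventOptions{Notify: true})
// A subscriber joining "prices" produces MessageEventStart for
// "prices" only; "trades" remains idle.
```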
Automatic Cleanup
Subscriptions clean up automatically when any participant terminates. This eliminates resource leaks from forgotten subscriptions.
When Target Terminates
All subscribers receive notification. The subscription ceases to exist - there's nothing to unsubscribe from.
When Subscriber Terminates
Your subscriptions are removed from:
Local subscription records
Remote nodes (for remote subscriptions)
If you were the last local subscriber to a remote target, the network subscription is removed. Otherwise, it stays for remaining local subscribers.
When Event Producer Terminates
Subscribers can't distinguish explicit UnregisterEvent from producer termination - both deliver termination notification with reason gen.ErrUnregistered.
When Network Connection Fails
All subscriptions involving the failed node are cleaned up. If the node reconnects later, you need to re-subscribe - the framework doesn't automatically restore subscriptions.
Explicit Unsubscription
You can explicitly remove subscriptions.
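A sketch, assuming Ergo v3's Demonitor*/Unlink* calls (workerPID and producerNode are illustrative):

```go
// Remove subscriptions without waiting for termination.
_ = process.DemonitorPID(workerPID) // stop monitoring a process
_ = process.UnlinkPID(workerPID)    // remove a link
_ = process.DemonitorEvent(gen.Event{Name: "prices", Node: producerNode})
```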
Explicit unsubscription is useful when:
You want to stop watching before termination
You're switching to a different target
You're implementing connection retry logic
But in most cases, you don't need explicit unsubscription. Let termination handle cleanup.
Cleanup Order Guarantees
When a process terminates, cleanup happens in a specific order:
1. Process state changes to Terminated
2. All outgoing subscriptions (where the process is consumer) are removed
3. All incoming subscriptions (where the process is target) generate notifications
4. Process resources are freed
This ordering ensures:
You don't receive notifications after your process starts terminating
Subscribers to you receive notifications before your resources are freed
No race conditions between notification delivery and cleanup
Summary
| Concept | Key point |
| --- | --- |
| Unified architecture | Links, monitors, and events share the same subscription mechanism |
| Local subscriptions | Instant creation, guaranteed notification, asynchronous delivery |
| Remote subscriptions | Network round-trip, guaranteed notification via normal or failover path |
| Shared subscriptions | Multiple local subscribers share one network subscription to a remote target |
| Event publishing | One network message per subscriber node, local fanout to subscribers |
| Buffered events | Shared delivery, but each subscriber retrieves the buffer individually |
| Producer notifications | MessageEventStart/Stop when crossing the zero-subscriber threshold |
| Automatic cleanup | All subscriptions cleaned up on any termination |
Key Performance Insights
For subscriptions:
First subscription to remote target: network round-trip
Additional subscriptions to same target: instant
Unbuffered events: full sharing
Buffered events: shared delivery, individual buffer retrieval
For notifications:
One network message per subscriber node
Local distribution to all subscribers on that node
Cost scales with number of nodes, not number of subscribers
For cleanup:
Automatic on any termination
No leaked subscriptions - cleanup is built into termination
No manual unsubscription required