Pub/Sub Internals

How the Pub/Sub system works internally

This document explains how Ergo Framework's pub/sub system works under the hood. It's written for developers who want to understand the architecture, network behavior, and performance characteristics when building distributed systems.

For basic usage, see Links and Monitors and Events. This document assumes you're familiar with those concepts and focuses on how the system works internally.

The Unified Architecture

Links, monitors, and events look like separate features when you use them. But underneath, they share the same mechanism. Understanding this unification explains why the system behaves consistently and why certain optimizations work.

The Core Concept

Every interaction in the pub/sub system follows one pattern:

A consumer subscribes to a target and receives notifications about that target.

This applies whether you're linking to a process, monitoring a registered name, or subscribing to an event stream. The differences are in what you subscribe to and what notifications you receive.

Three Components of Every Subscription

1. Consumer - The process creating the subscription. This is the process that will receive notifications when something happens to the target.

2. Target - What the consumer subscribes to. Targets come in several types:

| Target Type | Example | What It Represents |
| --- | --- | --- |
| PID | gen.PID{Node: "node@host", ID: 100} | A specific process instance |
| ProcessID | gen.ProcessID{Name: "worker", Node: "node@host"} | A registered name |
| Alias | gen.Alias{...} | A process alias |
| Node | gen.Atom("node@host") | A network connection |
| Event | gen.Event{Name: "prices", Node: "node@host"} | A registered event |

3. Subscription Type - How the consumer wants to receive notifications:

| Type | Creates | Notification | Effect on Consumer |
| --- | --- | --- | --- |
| Link | Exit signal | MessageExit* → Urgent queue | Terminates by default |
| Monitor | Down message | MessageDown* → System queue | Continues running |

The combination of target type and subscription type determines which message type you receive:

| Target | Link delivers | Monitor delivers |
| --- | --- | --- |
| PID | MessageExitPID | MessageDownPID |
| ProcessID | MessageExitProcessID | MessageDownProcessID |
| Alias | MessageExitAlias | MessageDownAlias |
| Node | MessageExitNode | MessageDownNode |
| Event | MessageExitEvent | MessageDownEvent |

Implicit vs Explicit Events

The targets divide into two categories based on what notifications they generate:

Implicit Events - Processes, names, aliases, and nodes generate termination notifications automatically. The target doesn't do anything special - when it terminates (or disconnects, for nodes), the framework generates notifications for all subscribers.

Explicit Events - Registered events generate both published messages AND termination notifications. A producer process explicitly registers an event and publishes messages to it. When the producer terminates or unregisters the event, subscribers also receive termination notification.

The key difference: implicit events give you one notification (termination). Explicit events give you N published messages plus termination notification.

Why Unification Matters

This unified architecture has practical benefits:

Consistent behavior - The same subscription and notification mechanics work for all target types. Once you understand how monitors work for processes, you understand how they work for events.

Shared optimizations - Network optimizations (covered later) apply to all subscription types. Whether you're monitoring 100 remote processes or subscribing 100 consumers to a remote event, the same sharing mechanism kicks in.

Predictable cleanup - Termination cleanup works identically for all subscriptions. When a process terminates, all its subscriptions are cleaned up using the same code path.

How Local Subscriptions Work

When you subscribe to a target on the same node, the operation is simple and fast.

What You Experience

The call returns instantly. There's no network communication, no blocking. The node records your subscription in memory.

What Happens Internally

The Target Manager maintains subscription records. When a process terminates, Target Manager looks up all subscribers and delivers notifications to their mailboxes. For links, notifications go to the Urgent queue. For monitors, they go to the System queue.

Guarantees

Instant subscription - No waiting, no blocking. The subscription is recorded synchronously.

Guaranteed notification - If the target terminates after you subscribe, you will receive notification. The notification mechanism is part of the termination process itself.

Asynchronous delivery - Notifications arrive in your mailbox like any other message. You process them in your HandleMessage callback.

Automatic cleanup - When the target terminates and you receive notification, the subscription is removed automatically. You don't need to unsubscribe.

How Remote Subscriptions Work

Remote subscriptions involve network communication but provide the same guarantees as local subscriptions.

What You Experience

The call blocks while the subscription request travels to the remote node and the response returns. This typically takes milliseconds on a local network.

What Happens Internally

The subscription request travels to the remote node's Target Manager. It validates that the target exists (returning an error if not), records the subscription, and sends confirmation. Once established, termination notifications travel back over the network.

Subscription Validation

Both local and remote subscriptions validate that the target exists. For events, the target node checks that the event is registered and returns an error if it is not.
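The bookkeeping can be sketched in a few lines of Go. All names here are invented for illustration - this models the Target Manager behavior described above, not Ergo's actual internals:

```go
package main

import (
	"errors"
	"fmt"
)

var errUnknownTarget = errors.New("unknown target")

// targetManager models the in-memory subscription records described above.
type targetManager struct {
	targets     map[string]bool     // registered targets (PIDs, names, events...)
	subscribers map[string][]string // target -> subscriber IDs
	mailboxes   map[string][]string // subscriber -> delivered notifications
}

func newTargetManager() *targetManager {
	return &targetManager{
		targets:     map[string]bool{},
		subscribers: map[string][]string{},
		mailboxes:   map[string][]string{},
	}
}

// Subscribe validates the target first: subscribing to an unregistered
// target returns an error instead of silently creating a dead subscription.
func (tm *targetManager) Subscribe(consumer, target string) error {
	if !tm.targets[target] {
		return errUnknownTarget
	}
	tm.subscribers[target] = append(tm.subscribers[target], consumer)
	return nil
}

// Terminate delivers one notification per subscriber and removes the
// subscription records - the automatic cleanup described above.
func (tm *targetManager) Terminate(target, reason string) {
	for _, c := range tm.subscribers[target] {
		tm.mailboxes[c] = append(tm.mailboxes[c], target+" down: "+reason)
	}
	delete(tm.subscribers, target)
	delete(tm.targets, target)
}

func main() {
	tm := newTargetManager()
	tm.targets["worker"] = true

	fmt.Println(tm.Subscribe("watcher", "worker"))  // <nil>
	fmt.Println(tm.Subscribe("watcher", "missing")) // unknown target
	tm.Terminate("worker", "panic")
	fmt.Println(tm.mailboxes["watcher"]) // [worker down: panic]
}
```

Note how the subscription record disappears together with the target: after Terminate there is nothing left to unsubscribe from.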

Guaranteed Notification Delivery

Remote subscriptions guarantee you receive exactly one termination notification. This guarantee holds even when networks fail. Two paths can deliver your notification:

Path 1: Normal Delivery

The target terminates normally. The remote node sends the notification over the network, and you receive it with the actual termination reason.

Path 2: Connection Failure

The network connection fails before the notification arrives (or before the target even terminates). Your local node detects the disconnection and generates notifications for all subscriptions to targets on the failed node, with a reason that indicates the lost connection.

Why This Works

You're guaranteed notification through one of two mechanisms:

  1. Remote node delivers it (normal case)

  2. Local node generates it when detecting connection failure (failover)

The Reason field tells you which path occurred. Your code typically handles both the same way - the target is no longer accessible regardless of why.

This failover mechanism compensates for network unreliability. You write code assuming notifications always arrive, because they do.
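The failover path can be sketched in plain Go. The types and the "noconnection" reason string are illustrative stand-ins, not Ergo's actual values:

```go
package main

import "fmt"

// remoteSub models the local node's own record of a subscription to a
// target living on a remote node.
type remoteSub struct {
	consumer string
	target   string
	node     string // node the target lives on
}

// onNodeDown synthesizes one termination notification per subscription to
// the failed node, so consumers see a lost-connection reason instead of
// silence - the failover path described above.
func onNodeDown(subs []remoteSub, failed string) []string {
	var notifications []string
	for _, s := range subs {
		if s.node == failed {
			notifications = append(notifications,
				fmt.Sprintf("%s: %s down, reason=noconnection", s.consumer, s.target))
		}
	}
	return notifications
}

func main() {
	subs := []remoteSub{
		{"w1", "coordinator", "node-b"},
		{"w2", "coordinator", "node-b"},
		{"w3", "cache", "node-c"},
	}
	// Only subscriptions pointing at node-b generate notifications.
	for _, n := range onNodeDown(subs, "node-b") {
		fmt.Println(n)
	}
}
```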

Network Optimization: Shared Subscriptions

This section describes the optimization that makes distributed pub/sub practical at scale. Without it, many common patterns would be impractical.

The Problem

Consider a realistic scenario: 100 worker processes on one node each call MonitorPID on the same coordinator process running on a remote node.

Naive implementation: Each MonitorPID call creates a separate network subscription. Result:

  • 100 network round-trips to create subscriptions

  • 100 subscription records on the remote node

  • 100 network messages when the coordinator terminates

This doesn't scale. With 1000 workers, you'd have 1000 network messages just to deliver one termination notification.

What Actually Happens

The framework automatically detects when multiple local processes subscribe to the same remote target and shares the network subscription.

What you observe:

The first subscription to a remote target requires network communication. Every subsequent subscription from the same node to the same target returns instantly - it shares the existing network subscription.

How Notification Delivery Works

When the remote target terminates:

  1. Remote node sends ONE notification message to your node

  2. Your node receives it and looks up all local subscribers to that target

  3. Your node delivers individual notifications to each subscriber's mailbox

Performance Characteristics

| Operation | Without Sharing | With Sharing |
| --- | --- | --- |
| First subscription | 1 network round-trip | 1 network round-trip |
| N additional subscriptions | N network round-trips | 0 (instant) |
| Notification delivery | N network messages | 1 network message |
| Unsubscribe (not last) | 1 network round-trip | 0 (instant) |
| Unsubscribe (last) | 1 network round-trip | 1 network round-trip |

Network cost comparison for 100 subscribers:

| Phase | Without Sharing | With Sharing |
| --- | --- | --- |
| Subscribe all | 100 round-trips | 1 round-trip |
| Notification | 100 messages | 1 message |
| Unsubscribe all | 100 round-trips | 1 round-trip |
| Total | 300 network operations | 3 network operations |

Impact on Event Publishing

The same optimization applies to event publishing. When you publish an event with subscribers on multiple nodes, the framework groups them by node and sends ONE message per node. The producer makes a single publish call no matter how many subscribers exist; each receiving node fans the message out locally, so every subscriber finds an individual copy in its mailbox.

Real-World Scale Example

Consider a market data feed with 1 million subscribers distributed across 10 nodes. When the producer publishes one price update:

| Approach | Network Messages |
| --- | --- |
| Without optimization | 1,000,000 |
| With optimization | 10 |

The optimization transforms O(N) network cost (where N = total subscribers) into O(M) cost (where M = number of nodes). For distributed systems with many subscribers per node, this is the difference between practical and impossible.

The distributed-pub-sub-1M benchmark measures this behavior at the scale of one million subscribers.

Why This Matters for System Design

This optimization enables patterns that would be impractical otherwise:

  • Worker pools monitoring coordinators - hundreds of workers watch one remote coordinator for the network cost of a single subscription.

  • Distributed caching with invalidation - every cache process subscribes to one invalidation event, and each update costs one message per node.

  • Hierarchical supervision across nodes - supervisors monitor remote children without multiplying network traffic.

  • High-frequency event streaming - publish cost grows with the number of nodes, not the number of subscribers.

When Sharing Doesn't Apply

The optimization applies when multiple processes on the SAME node subscribe to the SAME remote target.

These share: subscriptions from multiple processes on the same node to the same remote target.

These don't share: subscriptions to different targets, subscriptions made from different nodes, and local subscriptions (which involve no network communication to begin with).

Buffered Events: Partial Optimization

Buffered events receive partial optimization. The subscription is shared, but each subscriber must retrieve buffer contents individually.

Why Buffers Complicate Sharing

Event buffers store the last N published messages on the producer's node. When a subscriber joins, it receives the current buffer contents along with the subscription.

The problem: different subscribers joining at different times need different buffer contents.

If subscriptions were fully shared, all subscribers would receive the same buffer - incorrect for late subscribers.

What Actually Happens

First subscriber: Network round-trip to create subscription AND retrieve buffer.

Subsequent subscribers: Network round-trip to retrieve current buffer (subscription already exists).

Published messages: Still optimized - one network message per node, distributed locally.
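Why late subscribers need their own round-trip can be shown with a small model - a hypothetical ring buffer, not Ergo's implementation:

```go
package main

import "fmt"

// bufferedEvent models a producer-side buffer of the last N messages.
type bufferedEvent struct {
	buffer     []string // last N published messages
	limit      int
	roundTrips int
}

// Publish appends a message and trims the buffer to its limit.
func (e *bufferedEvent) Publish(msg string) {
	e.buffer = append(e.buffer, msg)
	if len(e.buffer) > e.limit {
		e.buffer = e.buffer[1:]
	}
}

// Join retrieves the buffer snapshot for one subscriber - a network
// round-trip even when the subscription itself is already shared,
// because subscribers joining at different times see different buffers.
func (e *bufferedEvent) Join() []string {
	e.roundTrips++
	snapshot := make([]string, len(e.buffer))
	copy(snapshot, e.buffer)
	return snapshot
}

func main() {
	ev := &bufferedEvent{limit: 3}
	ev.Publish("p1")
	ev.Publish("p2")
	early := ev.Join() // sees [p1 p2]
	ev.Publish("p3")
	ev.Publish("p4")
	late := ev.Join() // sees [p2 p3 p4] - a different snapshot
	fmt.Println(early, late, ev.roundTrips)
}
```

The two snapshots differ, which is exactly why the buffer retrieval cannot be shared the way the subscription itself is.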

Performance Comparison

| Aspect | Unbuffered Event | Buffered Event |
| --- | --- | --- |
| First subscription | 1 network round-trip | 1 network round-trip |
| Additional subscriptions | Instant (shared) | Network round-trip (buffer retrieval) |
| Published messages | 1 message per node | 1 message per node |
| Termination notification | 1 message per node | 1 message per node |

When to Use Buffers

Use buffers when:

  • New subscribers need recent history (last N configuration updates)

  • Subscribers might miss messages during brief disconnections

  • State can be reconstructed from recent messages

  • Subscriber count is moderate

Avoid buffers when:

  • Real-time streaming where history isn't useful

  • High subscriber count across many nodes (each pays network cost)

  • Messages are only meaningful at publish time

  • Memory constraints (buffers consume memory on producer node)

Practical guidance: start without a buffer, and add one only when subscribers demonstrably need the recent history.

Producer Notifications

Producers can receive notifications when subscriber interest changes. This enables demand-driven data production.

Enabling Notifications

Notifications are opt-in: you request them in the options passed when registering the event. Without that option, the producer is not told when subscribers come and go.

What You Receive

MessageEventStart when the first subscriber appears and MessageEventStop when the last one leaves, delivered to your mailbox as ordinary messages.

When Notifications Arrive

| Transition | Notification |
| --- | --- |
| 0 → 1 (first subscriber) | MessageEventStart |
| 1 → 0 (last subscriber leaves) | MessageEventStop |
| 1 → 2, 2 → 3, etc. | None |
| 3 → 2, 2 → 1, etc. | None |

You only receive notifications when crossing the zero threshold. The notifications answer: "is anyone listening?" - not "how many are listening?"

Practical Use Case: On-Demand Data Production

The producer idles when nobody's listening, avoiding unnecessary API calls and resource usage. When subscribers appear, it starts producing. When all subscribers leave, it stops.
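The zero-threshold rule can be modeled in a few lines. The start/stop handling stands in for reacting to MessageEventStart/MessageEventStop; the names and structure are illustrative only:

```go
package main

import "fmt"

// demandProducer models a producer that only works while someone listens.
type demandProducer struct {
	subscribers int
	producing   bool
	log         []string
}

func (p *demandProducer) onSubscribe() {
	p.subscribers++
	if p.subscribers == 1 { // 0 -> 1: MessageEventStart
		p.producing = true
		p.log = append(p.log, "start")
	}
}

func (p *demandProducer) onUnsubscribe() {
	p.subscribers--
	if p.subscribers == 0 { // 1 -> 0: MessageEventStop
		p.producing = false
		p.log = append(p.log, "stop")
	}
}

func main() {
	p := &demandProducer{}
	p.onSubscribe()   // 0 -> 1: start producing
	p.onSubscribe()   // 1 -> 2: no notification
	p.onUnsubscribe() // 2 -> 1: no notification
	p.onUnsubscribe() // 1 -> 0: stop producing
	fmt.Println(p.log) // [start stop]
}
```

Only the two zero-crossings appear in the log, matching the transition table above.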

Network Transparency

Notifications work across nodes. Remote subscribers count toward "someone is listening" exactly like local ones.

The producer doesn't know or care whether subscribers are local or remote. The notification mechanism handles it transparently.

Multiple Events

Each event tracks subscribers independently. A producer that registers several events receives separate MessageEventStart/MessageEventStop notifications for each of them.

Automatic Cleanup

Subscriptions clean up automatically when any participant terminates. This eliminates resource leaks from forgotten subscriptions.

When Target Terminates

All subscribers receive notification. The subscription ceases to exist - there's nothing to unsubscribe from.

When Subscriber Terminates

Your subscriptions are removed from:

  • Local subscription records

  • Remote nodes (for remote subscriptions)

If you were the last local subscriber to a remote target, the network subscription is removed. Otherwise, it stays for remaining local subscribers.

When Event Producer Terminates

All subscribers receive a termination notification with reason gen.ErrUnregistered. They can't distinguish an explicit UnregisterEvent call from producer termination - both deliver the same notification.

When Network Connection Fails

All subscriptions involving the failed node are cleaned up. If the node reconnects later, you need to re-subscribe - the framework doesn't automatically restore subscriptions.

Explicit Unsubscription

You can explicitly remove subscriptions at any time using the counterpart of the call that created them - an unlink call for a link, a demonitor call for a monitor.

Explicit unsubscription is useful when:

  • You want to stop watching before termination

  • You're switching to a different target

  • You're implementing connection retry logic

But in most cases, you don't need explicit unsubscription. Let termination handle cleanup.

Cleanup Order Guarantees

When a process terminates, cleanup happens in a specific order:

  1. Process state changes to Terminated

  2. All outgoing subscriptions (where process is consumer) are removed

  3. All incoming subscriptions (where process is target) generate notifications

  4. Process resources are freed

This ordering ensures:

  • You don't receive notifications after your process starts terminating

  • Subscribers to you receive notifications before your resources are freed

  • No race conditions between notification delivery and cleanup

Summary

| Concept | How It Works |
| --- | --- |
| Unified architecture | Links, monitors, and events share the same subscription mechanism |
| Local subscriptions | Instant creation, guaranteed notification, asynchronous delivery |
| Remote subscriptions | Network round-trip, guaranteed notification via normal or failover path |
| Shared subscriptions | Multiple local subscribers share one network subscription to a remote target |
| Event publishing | One network message per subscriber node, local fanout to subscribers |
| Buffered events | Shared delivery, but each subscriber retrieves the buffer individually |
| Producer notifications | MessageEventStart/Stop when crossing the zero-subscriber threshold |
| Automatic cleanup | All subscriptions cleaned up on any termination |

Key Performance Insights

For subscriptions:

  • First subscription to remote target: network round-trip

  • Additional subscriptions to same target: instant

  • Unbuffered events: full sharing

  • Buffered events: shared delivery, individual buffer retrieval

For notifications:

  • One network message per subscriber node

  • Local distribution to all subscribers on that node

  • Cost scales with number of nodes, not number of subscribers

For cleanup:

  • Automatic on any termination

  • No resource leaks possible

  • No manual unsubscription required
