Building an Enterprise Anycast CDN at the Network Edge

This series is a theory — my theory. It is not presented as a standard, a prescription, or a finished product, but as a deliberate exploration of an idea that emerges from operating large networks over time. Some parts are well‑understood practices; others are hypotheses tested through reasoning, experience, and constraint. Like any good theory, it is meant to be examined, challenged, adapted, and occasionally rejected. What follows is an attempt to think clearly and honestly about what might be possible, not to declare what must be done.

Section 5 — Advertising Service Truth (and Why Withdrawal Matters More Than Selection)

Up to this point, the architecture has focused on where traffic enters and how decisions are made after it arrives. What remains is the question of what information those decisions are based on.

In many systems, this is where complexity creeps in. Engineers build increasingly clever selection logic: choosing the "best" site, the "fastest" backend, or the "least loaded" cache. These approaches can work in tightly controlled environments, but they tend to fail badly when distributed across many sites.

This design takes a different approach. Rather than trying to be clever about selection, it focuses on being precise about truth.

Services Advertise Themselves

In this model, services are responsible for advertising their own availability into the overlay routing domain.

If a service is healthy at a site, that site advertises a specific service identity (typically a /32 host route) into the overlay. If the service becomes unhealthy, the advertisement is withdrawn.

There is no central controller inferring state. There is no external system making guesses.

The closest component to the service declares the truth.
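To make the advertise-when-healthy, withdraw-when-unhealthy behavior concrete, here is a minimal sketch of the decision a site-local agent might make on each health check. Everything in it is an assumption for illustration: the `ServiceAdvertiser` name, the documentation-range prefix, and the command strings, which stand in for whatever the local routing daemon actually accepts.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ServiceAdvertiser:
    """Tracks one service identity and emits a command only on state change."""
    prefix: str                  # hypothetical service identity, e.g. "192.0.2.10/32"
    check: Callable[[], bool]    # local health check supplied by the service
    advertised: bool = False     # whether the route is currently announced

    def step(self) -> Optional[str]:
        """Run one health check; return a routing command, or None if unchanged."""
        try:
            healthy = self.check()
        except Exception:
            # Assumption: a health check that cannot run counts as unhealthy.
            healthy = False

        if healthy and not self.advertised:
            self.advertised = True
            return f"announce route {self.prefix} next-hop self"   # placeholder syntax
        if not healthy and self.advertised:
            self.advertised = False
            return f"withdraw route {self.prefix} next-hop self"   # placeholder syntax
        return None
```

Calling `step()` repeatedly with a healthy service yields one announce and then nothing; the first failed check yields exactly one withdraw. The agent never reports a score, only a change in presence.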

Why Withdrawal Is More Important Than Selection

It is tempting to think that the hard problem is choosing the best destination. In practice, the far more important problem is knowing when a destination should not be used at all.

Explicit withdrawal has several advantages:

It is unambiguous
It converges quickly
It avoids partial or stale state
It aligns naturally with routing behavior

When a service withdraws its advertisement, it is simply no longer a candidate. No special logic is required elsewhere.

By contrast, attempting to rank or score destinations requires shared assumptions, synchronized metrics, and careful tuning — all of which become brittle at scale.

Partial Failures Become First-Class

Because service reachability is signaled explicitly, partial failures are handled cleanly.

For example:

A node may remain reachable via anycast
Other services at that node may remain healthy
Only one specific service is withdrawn

Traffic for that service will naturally flow elsewhere, without disturbing unrelated traffic.

This is difficult to achieve when correctness is inferred indirectly, for example from whether the node itself still answers.
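The per-service granularity described here can be sketched as a simple set difference. The service names and prefixes below are hypothetical, and the node's own anycast address is deliberately absent from the model, since it is not affected by any single service's health.

```python
from typing import Dict, Set


def advertised_prefixes(health: Dict[str, bool], identities: Dict[str, str]) -> Set[str]:
    """Return the service prefixes a node should advertise given per-service health."""
    return {identities[name] for name, ok in health.items() if ok}


def withdrawn(before: Set[str], after: Set[str]) -> Set[str]:
    """Prefixes that were advertised before and are no longer advertised."""
    return before - after


# Hypothetical service identities hosted on a single node.
IDENTITIES = {
    "video": "192.0.2.10/32",
    "images": "192.0.2.11/32",
    "api": "192.0.2.12/32",
}

before = advertised_prefixes({"video": True, "images": True, "api": True}, IDENTITIES)
after = advertised_prefixes({"video": True, "images": False, "api": True}, IDENTITIES)
lost = withdrawn(before, after)   # only the prefix for "images"
```

Only the failed service's prefix leaves the overlay. The other two advertisements are untouched, so their traffic has no reason to move.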

What the Overlay Sees

From the overlay's perspective, there is no concept of "load" or "preference." There are only routes that exist and routes that do not.

If multiple sites advertise the same service identity:

All are considered valid
The overlay may choose among them based on topology or cost

If no site advertises the service:

The service is unavailable
That fact is explicit and visible

The overlay does not guess.
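A minimal sketch of the overlay's view, assuming a table that maps each service prefix to the sites currently advertising it along with a topology cost. The table shape, the site names, and the tie-break on site name are all assumptions made for illustration.

```python
from typing import Dict, Optional

# prefix -> {site: cost}; a site appears only while it advertises the prefix.
RouteTable = Dict[str, Dict[str, int]]


def select_site(routes: RouteTable, prefix: str) -> Optional[str]:
    """Pick the lowest-cost advertising site, or None if nobody advertises."""
    candidates = routes.get(prefix, {})
    if not candidates:
        return None  # the service is explicitly unavailable
    # Assumption: ties on cost are broken by site name to keep the choice stable.
    return min(candidates, key=lambda site: (candidates[site], site))


def withdraw(routes: RouteTable, prefix: str, site: str) -> None:
    """Remove one site's advertisement; removing an absent route is a no-op."""
    routes.get(prefix, {}).pop(site, None)
```

Notice that `select_site` never consults load or health. Those questions were already answered by whether the route is present at all.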

Aligning Routing With Reality

Routing works best when it reflects reality rather than attempting to predict it.

By reducing service state to a simple binary signal — present or absent — the system avoids many subtle failure modes:

Stale health information
Split-brain decisions
Oscillation based on marginal metrics

This simplicity is intentional.
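The oscillation problem is easy to reproduce on paper. The sketch below compares two toy policies over the same four intervals: one chases whichever site reports the lower metric, the other only looks at presence. The sample values are invented for illustration, not measurements.

```python
from typing import List, Tuple


def count_switches(choices: List[str]) -> int:
    """Number of times the chosen site changes between consecutive intervals."""
    return sum(1 for a, b in zip(choices, choices[1:]) if a != b)


def by_metric(samples: List[Tuple[float, float]]) -> List[str]:
    """Pick whichever of sites A and B reports the lower metric in each interval."""
    return ["A" if a <= b else "B" for a, b in samples]


def by_presence(present: List[Tuple[bool, bool]]) -> List[str]:
    """Prefer A while it is advertised, otherwise B, otherwise nothing ("-")."""
    return ["A" if a else "B" if b else "-" for a, b in present]


# Invented latencies (ms) for sites A and B that differ only marginally.
samples = [(20.1, 20.0), (19.9, 20.0), (20.2, 20.0), (19.8, 20.0)]
# Both sites stay healthy, and therefore advertised, the whole time.
present = [(True, True)] * 4

metric_flaps = count_switches(by_metric(samples))       # 3
presence_flaps = count_switches(by_presence(present))   # 0
```

Nothing failed in this scenario, yet the metric-driven policy moves traffic three times in four intervals. The presence-driven policy moves traffic only when an advertisement actually appears or disappears.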

In the next section, we will look at how private transport fits into this picture — and how it can be used to improve performance without becoming a dependency or a source of implicit trust.
