Most teams start a navigation migration the same way: someone says "we'll adopt the Coordinator pattern," and the work begins. I think that's the trap. It conflates the two concerns the whole design exists to keep apart. A Coordinator is a navigation-execution pattern. The actual problem you're usually trying to solve — turning untrusted deep links and scattered dispatch into something safe and consistent — is a resolution problem. Those are different layers, and if you build the executor before you've defined what a resolved route even is, you've built the consumer before the thing it consumes.
This essay is a build-up from that first wrong step to a full production deep-link and navigation architecture for iOS. It's not a pattern catalogue. It's a derivation: each layer justified by a force that's actually present, with the second-order costs named honestly. The spine running through all of it is one sentence:
The engine turns untrusted input into a typed decision; coordinators carry that decision out as side effects. This is effects-as-data applied to navigation — the same shape as a server-driven UI core that emits typed actions and never opens a socket. Pure core, typed decision, side effect at the boundary. Hold that line.
The coordinator-first trap
The instinct to start with a Coordinator is wrong for three reasons.
It's out of scope. "Navigation is execution, not resolution" is the boundary the design is built to respect. Adopting coordinators app-wide is a separate, much larger undertaking; it isn't what the routing problem needs.
It couples you to UIKit on day one. A Coordinator drags UINavigationController into your core immediately, which kills the one property you most want: that resolution be unit-testable headless, with no simulator.
And it's backwards. You can't design "how to execute a route" before you've defined what a resolved route is. The decision is the input to execution; build it first.
The real first step is the resolution contract, pure and headless: RouteResolution, Route, RouteError, and an engine.resolve(url:context:) that does match → extract → validate over an in-memory registry, with unit tests. Then a thin downstream handler executes it. The coordinator is a consumer you add when you wire the first entry point — not the opening move.
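To make that first step concrete, here is a minimal sketch of a headless resolution contract. Only the names RouteResolution, Route, RouteError, and resolve(url:context:) come from the text above; the qcommerce scheme layout (route name as host, parameters as path components), the registry shape, and the two sample routes are assumptions made for illustration.

```swift
import Foundation

// Hypothetical route vocabulary; a real app would have more cases.
enum Route: Equatable {
    case product(id: String)
    case cart
}

enum RouteError: Error, Equatable {
    case malformed, unknownRoute, validationFailed
}

// Carried in the signature; this sketch does not evaluate it yet.
struct RouteContext {
    var isLoggedIn: Bool
}

enum RouteResolution: Equatable {
    case proceed(Route)
}

struct RoutingEngine {
    // In-memory registry: route name -> closure that extracts and validates params.
    let registry: [String: ([String]) throws -> Route] = [
        "product": { params in
            // Validate instead of force-subscripting: exactly one non-empty id.
            guard params.count == 1, let id = params.first, !id.isEmpty else {
                throw RouteError.validationFailed
            }
            return .product(id: id)
        },
        "cart": { params in
            guard params.isEmpty else { throw RouteError.validationFailed }
            return .cart
        },
    ]

    func resolve(url: URL, context: RouteContext) throws -> RouteResolution {
        // match: assumed layout is qcommerce://<route>/<param>
        guard url.scheme == "qcommerce", let name = url.host, !name.isEmpty else {
            throw RouteError.malformed
        }
        guard let build = registry[name] else { throw RouteError.unknownRoute }
        // extract: path components without the leading "/"
        let params = url.pathComponents.filter { $0 != "/" }
        // validate: delegated to the registry entry
        return .proceed(try build(params))
    }
}
```

Nothing here imports UIKit, so every branch (match, extract, validate) can be driven from a plain unit test with a URL and a context value.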
Two orthogonal layers
"Is a Coordinator the best design for these problems?" is the wrong question, because "these problems" are two unrelated problems. If you list out the smells a navigation mess actually has, they fall cleanly into two buckets:
| Class | Smells | Owned by |
|---|---|---|
| Resolution | dispatch scattered across handlers · inconsistent vocabulary (product/p/deeplink) · no validation (force-subscript crash on a bad URL) · inconsistent auth · per-route param extraction by hand · no metrics · "add a route = ship a release" | Routing Engine |
| Execution / flow | view controllers construct and push each other directly · diffuse nav-stack ownership | Coordinator |
A Coordinator solves the execution class only. You can build a textbook coordinator and still have every resolution smell intact: three entry points still hand-roll their own parsing, the malformed URL still crashes, the vocabulary still diverges, you still have zero metrics. The coordinator just relocates the if/else from a scene delegate into a coordinator. It doesn't validate anything.
RouteResolution is to navigation what a typed ButtonAction is to a server-driven UI runtime. The engine emits a decision-as-data and does not call pushViewController, exactly as an SDUI core emits actions and does not open sockets. The same boundary, drawn twice in the same system.
What a coordinator actually buys
Once you've separated the layers, the coordinator earns a real win. Construction becomes trivial because the view controllers stop knowing about each other.
Before, HomeViewController imports, constructs, and pushes Category, Product, Promo, Cart. With a coordinator, a VC emits an intent ("user tapped product p1004") and the coordinator decides what to build. The coordinator becomes the composition root for the flow — the one place that constructs VCs and injects their dependencies. That's also why deep-link routing gets clean: a tap and a deep link both funnel into the same coordinator.handle(...).
But one phrase needs sharpening. People say "the coordinator keeps track of what's being shown." Careful. The UINavigationController already owns the stack — its viewControllers array is the source of truth. The coordinator owns the flow and a tree of child coordinators, never a parallel mirror of the VC stack. The moment you keep your own mutable copy of "what's shown," you've built a desync bug waiting for the one interaction you didn't model.
The second-order cost
This is why coordinators are harder than they look. The win is real, but it comes with a lifecycle you now manage by hand.
- Lifecycle and retain are the real tax. A parent holds its children in var children: [Coordinator]; it must, or they dealloc mid-flow. When a flow finishes, something has to tell the parent "remove me," or the whole child graph leaks. It's the same chicken-and-egg as a timer retain cycle: the thing that would clean you up is the thing holding you alive.
- The system back button is the assassin. Tap < or swipe-to-pop and UIKit pops the VC and never tells your coordinator. Your flow state silently desyncs from what's on screen.
- Construction got trivial, but DI didn't vanish; it relocated into the coordinator. That's good (one place) right up until the coordinator absorbs every flow and becomes a Massive Coordinator: the same single-responsibility failure as a Massive View Controller, one layer up.
Three presentation hierarchies
Here's the part most coordinator tutorials skip, and it's the reason the coordinator tree exists at all. UINavigationController.viewControllers is the source of truth only for pushes. UIKit actually has three independent presentation containers, each with its own teardown signal, and no single UIKit object spans all three.
| Presentation | Owner / truth | Teardown signal |
|---|---|---|
| Push | UINavigationController.viewControllers (strong) | pop / back / swipe |
| Modal | presentingVC.presentedViewController chain | dismiss (not a pop) |
| Tabs | UITabBarController — N independent nav stacks | tab switch (no teardown — both stay alive) |
The coordinator tree is the unifying abstraction precisely because UIKit fragments presentation into three containers with three different "it's gone now" signals.
Take a deep link to OrderDetail: it is not a single push. The root has to select the Orders tab, tell that tab's coordinator to pop to root, then push OrderDetail. That's cross-container orchestration no single pushViewController can express. The tree is what makes it expressible.
Ownership and pop detection
So the lifecycle problem is concrete: who holds what, and how do you find out when something on screen is gone? Fix the premise first. The coordinator should hold child coordinators strongly and view controllers weakly, or not at all. The nav controller already retains the live VC; a second strong owner is a desync waiting to happen.
So the leak isn't "coordinator → VC." It's this: the parent's children array still holds a child coordinator — and its view model, its services — after that child's VC was popped. Nothing auto-removes it.
The fix is to stop tracking the stack yourself and let UIKit tell you. There are three teardown signals, in order of robustness.
A — become the UINavigationControllerDelegate. Its didShow fires on every push and pop, including swipe and back button:
func navigationController(_ nav: UINavigationController, didShow vc: UIViewController, animated: Bool) {
guard let fromVC = nav.transitionCoordinator?.viewController(forKey: .from),
!nav.viewControllers.contains(fromVC) else { return } // still present ⇒ PUSH, ignore
// fromVC no longer in the stack ⇒ it was POPPED → tear down its child coordinator
childDidPop(fromVC)
}
The !nav.viewControllers.contains(fromVC) test is the whole trick: on a push, fromVC is still in the array; on a pop, it's gone. One line distinguishes the two.
B — a base-VC subclass that fires on isMovingFromParent or deinit. It works, but it couples the VC to coordinator-cleanup semantics, misses modal dismiss, and deinit timing is fragile.
C — an explicit finish callback for flows the user completes rather than backs out of: checkout succeeds → onFinish?() → coordinator dismisses and removes the child.
On a push, the contains check sees the VC still in the stack and correctly does nothing. With a naive viewWillDisappear in a base VC, you false-positive and tear down a flow the user didn't actually leave. One edge case to bank: transitionCoordinator?.viewController(forKey: .from) is nil for non-animated transitions, so those you clean up explicitly via C.
You need both directions: A (or B) for backing out, C for completing forward. Either way the removal is always event-driven, never automatic. You don't track the stack; you reconcile children against viewControllers when UIKit tells you something moved.
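Both directions can be sketched without UIKit. This is a minimal illustration, assuming a Screen class standing in for a view controller and a ChildFlow that remembers (weakly) the screen that anchors it; neither type is from the design above, and reconcile(against:) is a hypothetical name for the step the pop signal would call.

```swift
final class Screen {}   // stand-in for a view controller; assumption for the sketch

final class ChildFlow {
    var onFinish: (() -> Void)?
    weak var anchor: Screen?            // held weakly: the nav stack owns the screen
    init(anchor: Screen) { self.anchor = anchor }
}

final class ParentFlow {
    private(set) var children: [ChildFlow] = []

    func start(_ child: ChildFlow) {
        children.append(child)
        // Direction C: the child completes forward and asks to be removed.
        // Both captures are weak so the stored closure retains neither side.
        child.onFinish = { [weak self, weak child] in
            guard let self = self, let child = child else { return }
            self.children.removeAll { $0 === child }
        }
    }

    // Direction A: call from the pop signal with the stack as it is now.
    func reconcile(against stack: [Screen]) {
        children.removeAll { child in
            guard let anchor = child.anchor else { return true }   // screen already gone
            return !stack.contains(where: { $0 === anchor })
        }
    }
}
```

A push leaves the anchor in the stack, so reconcile removes nothing; a pop drops the anchor, so the child goes with it. The parent never keeps its own copy of the stack; it only compares against the one it is handed.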
Should the view model hold the coordinator?
This is the question everyone asks, and "weak or strong?" is the wrong axis. The view model should not reference the coordinator at all. The real question is how navigation intent gets from the VM up to the coordinator. The answer is output-binding, not a back-reference.
final class ProductListViewModel {
var onSelectProduct: ((String) -> Void)? // an OUTPUT. VM has no idea what happens next.
}
// coordinator, at construction:
let vm = ProductListViewModel()
vm.onSelectProduct = { [weak self] id in self?.showProduct(id) } // coordinator decides the verb
Four reasons to take the output, not the reference:
- Dependency direction. Coordinator → VM → events back out. One-way arrow, no back-edge. The VM never names the coordinator type.
- Intent, not navigation — effects-as-data again. "selected product p1004" is intent; "push ProductDetail animated" is execution. The VM emits the action as data; the coordinator owns the side effect. The same boundary as the pure core.
- Testability. vm.onSelectProduct is trivially asserted. A VM holding a coordinator forces you to define a coordinator protocol and a mock just to test selection.
- Reusability. Product detail reached from search vs home vs a deep link wants different next steps. With an output, each coordinator interprets the same intent differently.
The coordinator assigns the ((String) -> Void)? and weakly captures itself inside the closure. The VM stores a closure, not a coordinator. No cycle, nothing to manage.
A direct weak var coordinator on the VM isn't a sin: it's a common small-app shortcut, and if you take it, weak is mandatory (strong is a leak). Its cost is the coupling and test friction above. For maximum purity, keep the VM ignorant of navigation entirely and let the VC forward navigation-relevant events to the coordinator. Either way, the arrow never points VM → coordinator.
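The testability claim is easy to show. A minimal sketch, assuming a hypothetical didTapRow(productID:) input on the view model (the essay only defines the onSelectProduct output):

```swift
final class ProductListViewModel {
    var onSelectProduct: ((String) -> Void)?   // the output from the essay

    // Hypothetical input: the VC would call this from a table-view selection.
    func didTapRow(productID: String) {
        onSelectProduct?(productID)
    }
}

// No coordinator protocol, no mock: the test just records what the output emitted.
func selectionEmitsIntent() -> Bool {
    let vm = ProductListViewModel()
    var received: [String] = []
    vm.onSelectProduct = { received.append($0) }
    vm.didTapRow(productID: "p1004")
    return received == ["p1004"]
}
```

The test asserts the intent and stays silent about what happens next, which is the point: push, modal, or tab switch is the coordinator's decision and is tested there.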
Should the view controller hold the coordinator?
Yes — via a weak, flow-specific delegate. This is the idiomatic, leak-safe answer, with three sharpenings.
It must be weak. The coordinator owns the VC, so a strong VC → coordinator closes a cycle. weak var delegate breaks it — the same reason UITableView doesn't own its delegate.
Type it to a flow-specific protocol, not to Coordinator.
protocol ProductListDelegate: AnyObject { // AnyObject ⇒ weak-able
func productList(_ vc: ProductListViewController, didSelect id: String)
}
final class ProductListViewController: UIViewController {
weak var delegate: ProductListDelegate?
}
extension HomeTabCoordinator: ProductListDelegate {
func productList(_ vc: ProductListViewController, didSelect id: String) { showProduct(id) }
}
If every VC's delegate is one god CoordinatorDelegate with twelve methods, every VC sees every navigation action and you've rebuilt the coupling you came to kill. One narrow protocol per VC, defined from the VC's point of view.
The VC reports intent, never the destination. didSelect id is intent. The coordinator decides push vs modal vs switch-tab. The VC's only job beyond the VM is translating a UIKit event into a delegate call — the legitimate UIKit bridge.
Pick delegate vs closure by cardinality: one or two outputs → closures (lighter, and the closure captures self so there's no stored reference); three or more cohesive outputs (didSelect, didPullToRefresh, didTapCart) → a delegate protocol that groups them into one conformance. The coordinator sets vc.delegate = self at construction, because it is the composition root:
func showProductList() {
let vm = ProductListViewModel(service: catalog) // construct VM, inject deps
let vc = ProductListViewController(viewModel: vm) // inject VM
vc.delegate = self // wire back-channel (weak)
navigationController.pushViewController(vc, animated: true) // execute
}
coordinator → child (strong) · nav → VC (strong) · VC → VM (strong) · VC ⇢ coordinator (weak, the delegate). The only back-edge is weak. Ownership story closed.
Granularity: one coordinator per flow
Per flow — not per VC, not per app. A flow is a cohesive sequence of screens that accomplishes one user goal, shares a lifetime and context, and has a clean entry and a clean finish. Within a flow, the linear pushes (Home → Category → Product → Reviews) are all handled by the one flow coordinator.
| Failure mode | What goes wrong |
|---|---|
| per VC | Ceremony explosion. Every screen gets a coordinator, a delegate, wiring, and a children slot — for screens whose entire job is "push the next one." childDidFinish fires on every pop, so the abstraction is pure noise. |
| per app | Massive Coordinator. Every flow's nav logic in one file — the same SRP failure as Massive VC, one layer up. |
Split a flow into a child coordinator when any one of these holds: it's reusable from more than one parent (Auth from Checkout and Profile), it's independently presentable (a modal you push as a unit), it has a completion result to hand back ("auth finished → token"), or the parent is going Massive and you want to extract a cohesive sub-flow.
childDidFinish answers "did the user complete or abandon this goal?" That is meaningful for checkout, meaningless for "ProductDetail was popped." It's also where a child hands a result up.
authCoordinator.onFinish = { [weak self, weak authCoordinator] result in
    // The child stores this closure, so capture the child weakly or it retains itself.
    guard let self = self, let child = authCoordinator else { return }
    self.childDidFinish(child)
    if case .authenticated = result { self.continueToCart() } // resume the gated intent
}
Auth is the textbook child coordinator. A qcommerce://cart link while logged out → present AuthCoordinator modally → on .authenticated, remove the child and continue to Cart. One example exercises flow granularity, a modal sub-flow, a precondition gate, and a completion result resuming a deferred intent. We'll come back to that resume.
The AppCoordinator: between flows, not within them
Orchestrating between flows is its defining job. The clean division: the AppCoordinator (root) decides which flow runs and handles every transition across flow boundaries; the flow coordinators handle the pushes inside browsing, checkout, auth. Keep that line sharp and the root stays small. The moment AppCoordinator knows about ProductDetailViewController, flow-internal knowledge has leaked upward, and you have a Massive Coordinator.
final class AppCoordinator: Coordinator {
var children: [Coordinator] = []
func start() // pick initial flow, set window root
func handle(_ resolution: RouteResolution) // the three legacy handlers collapse into THIS funnel
private func route(_ resolution: RouteResolution)
private func presentAuth(thenResume request: RouteRequest)
private func didLogin() // swap Auth flow → Main flow
private func logout() // tear down Main → show Auth
}
The crux is two-level dispatch. The root maps DestinationID → which flow — the only routing knowledge at this level — and hands the resolution down. The flow coordinator maps it to concrete VCs.
private func route(_ r: RouteResolution) {
switch r.destination { // root maps DestinationID → FLOW…
case .home, .category, .product: browseFlow.show(r) // …then delegates INTO the flow
case .order: ordersFlow.show(r) // flow maps to its own screens
case .promo: promoFlow.show(r)
}
}
The orchestration that has no other home is exactly what justifies the root: the precondition gate with its deferred intent (stash, present auth, replay); flow swaps (login, logout, session expiry); cross-flow handoff (checkout done → route to order status). None of those belong to any single flow coordinator. They're transitions between flows.
The engine boundary: regular flows do not go through the engine
It's tempting to give AppCoordinator an engine property and call resolve inside it. That blurs the boundary. The engine is a standalone module at the entry-point edge; the coordinator consumes a RouteResolution and does not embed the engine.
And more strongly: regular in-app flows do not touch the engine at all. The engine exists to turn untrusted, stringly-typed external input into a typed, validated decision. An in-app tap is already typed — you're holding a Product with id = "p1004" from a model you fetched. Routing that through URL resolution is ceremony that destroys type safety, invents failure modes that can't exist in a typed call, and couples every tap to the route table.
Both paths converge on the coordinator's typed flow methods. The engine is only on the external arm — an adapter that lifts untrusted input up to the same typed vocabulary the in-app path already speaks. That's exactly what "navigation is execution, owned outside the engine" means in practice.
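The convergence can be sketched in a few lines. This assumes a qcommerce://product/&lt;id&gt; link shape and uses a small resolveProductID(from:) function as a stand-in for the engine; BrowseFlow, userTapped, and opened are hypothetical names for illustration.

```swift
import Foundation

struct Product { let id: String }

final class BrowseFlow {
    private(set) var shownProductIDs: [String] = []
    // Typed flow method: the convergence point for both arms.
    func showProduct(id: String) { shownProductIDs.append(id) }
}

// In-app arm: the value is already typed, so call the flow directly.
func userTapped(_ product: Product, in flow: BrowseFlow) {
    flow.showProduct(id: product.id)
}

// External arm: a stand-in for the engine lifts the URL to the same typed value.
// Returns nil for anything it cannot validate.
func resolveProductID(from url: URL) -> String? {
    let parts = url.pathComponents.filter { $0 != "/" }
    guard url.scheme == "qcommerce", url.host == "product", parts.count == 1 else {
        return nil
    }
    return parts[0]
}

func opened(_ url: URL, in flow: BrowseFlow) {
    guard let id = resolveProductID(from: url) else { return }   // fallback omitted
    flow.showProduct(id: id)
}
```

The tap path has no failure branch because none can exist; only the external path carries the guard, which is where the untrusted input is.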
Policy vs mechanism: the engine evaluates the gate, the coordinator can't enforce it
The engine must do validation. It should also do the auth check — the precondition policy belongs in the engine, not the coordinator. The coordinator re-deciding "does cart need auth, is the user logged in" is the duplication that becomes the inconsistent-auth smell over time.
But there's a refinement: policy versus mechanism. The engine evaluates the precondition (given a RouteContext carrying isLoggedIn); it cannot enforce it, because enforcement is a side effect — presenting login — and the engine is pure and headless. So it emits the failure as data:
.proceed(.cart) // logged in → go
.gated(precondition: .requiresAuth, resume: req) // not logged in → the verdict, as a value
The coordinator then "blindly translates" — and blind means it carries out the verdict without re-deciding it, not that it's logic-free. .gated still means the coordinator presents auth, stashes the intent, and resumes. Policy → engine; mechanism → coordinator.
engine.resolve(url: URL, context:) // external: parse + validate + evaluate
engine.resolve(request: RouteRequest, context:) // in-app: NO string round-trip; just evaluate
Both run the same precondition evaluation against the same context. One policy authority, no duplication.
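A gate evaluated as a value is testable with nothing but structs and enums. A minimal sketch of the in-app door, assuming a hypothetical gatedDestinations policy set and a two-case Destination; the RouteResolution shape follows the essay's .proceed and .gated cases.

```swift
enum Destination: Hashable { case home, cart }
enum Precondition: Equatable { case requiresAuth }
struct RouteRequest: Equatable { let destination: Destination }
struct RouteContext { let isLoggedIn: Bool }

enum RouteResolution: Equatable {
    case proceed(Destination)
    case gated(precondition: Precondition, resume: RouteRequest)
}

struct RoutingEngine {
    // Hypothetical policy table: which destinations require a session.
    let gatedDestinations: Set<Destination> = [.cart]

    // Pure function of (request, context): no UIKit, no stored session.
    func resolve(request: RouteRequest, context: RouteContext) -> RouteResolution {
        if gatedDestinations.contains(request.destination), !context.isLoggedIn {
            return .gated(precondition: .requiresAuth, resume: request)
        }
        return .proceed(request.destination)
    }
}
```

The same request resolves differently under different contexts, and both outcomes are plain values a test can compare with ==.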
Hand resolve a context with isLoggedIn: false and assert it emits .gated: no simulator, no UIKit. Injecting session state doesn't make the engine impure; it makes the security policy a testable value.
Gates are results, not errors
A natural instinct is to have the engine throw when the user isn't logged in, carrying the post-login flow in the error. Adopt half of that and reject half.
Adopt: the gate carries its own continuation — "here's what you were trying to do; resume it after login" — as data. That's strictly better than a separate pendingResolution field on the coordinator, because the resume target has clear provenance (it came from the resolver) and the coordinator holds no mutable routing state.
Reject: the throw. A logged-out user hitting a gated link is not an error. The route resolved perfectly; there's simply a prerequisite. The throw channel should mean "I could not resolve this" — malformed URL, unknown route, validation failed — cases with no destination at all. A gate is the opposite: a fully successful resolution that happens to have a precondition.
enum RouteResolution {
case proceed(Destination) // go now
case gated(precondition: Precondition,
resume: RouteRequest) // valid + recoverable: satisfy, then resume
}
enum RouteError: Error { case malformed, unknownRoute, validationFailed } // genuinely could not resolve
If you put "needs login" in the throws channel, every catch site has to disambiguate "the input is broken" from "do this one thing first" — opposite handling, a dead end versus a continuation. The usual "throw forces the caller to handle it" argument is weak in Swift: a switch over RouteResolution is already exhaustive. You get the forcing function without abusing the error channel.
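Here is roughly what the call site looks like when the two channels are kept apart. The Outcome enum and the handle function are hypothetical, added only so the handling is observable; the two enums mirror the definitions above.

```swift
enum Destination: Equatable { case cart }
enum Precondition: Equatable { case requiresAuth }
struct RouteRequest: Equatable { let destination: Destination }

enum RouteResolution {
    case proceed(Destination)
    case gated(precondition: Precondition, resume: RouteRequest)
}
enum RouteError: Error { case malformed, unknownRoute, validationFailed }

// Hypothetical: what the caller ended up doing.
enum Outcome: Equatable {
    case navigated(Destination)
    case prompted(Precondition)
    case fellBack(RouteError)
}

func handle(_ resolve: () throws -> RouteResolution) -> Outcome {
    do {
        // Exhaustive switch: adding a RouteResolution case breaks the build here.
        switch try resolve() {
        case .proceed(let destination):
            return .navigated(destination)
        case .gated(let precondition, _):
            return .prompted(precondition)      // a continuation, not a dead end
        }
    } catch let error as RouteError {
        return .fellBack(error)                 // dead end: metric + fallback
    } catch {
        return .fellBack(.malformed)            // unknown error type treated as malformed
    }
}
```

The catch block never has to ask "is this a gate?", and the switch never has to ask "is this broken input?". Each channel has one meaning.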
Carry the intent (the RouteRequest) and re-resolve after the gate clears; don't cache a pre-computed destination. The user is logging in for several seconds; the world can change. Re-running resolve(request, freshContext) means a chained precondition surfaces naturally (auth → now needs a payment method → another .gated), and stale state can't leak through (cart emptied, item out of stock).
There's a defensible alternative: a house style where "all navigation outcomes flow through typed throws" is internally consistent. Just know you've chosen ergonomic uniformity over semantic precision, and be ready to defend "is a logged-out user really an error?"
The re-entrant dispatcher
So the gate fires, login happens, and LoginCoordinator reports success. How does the original intent get back to the engine? The wrong answer is to reach back into the engine from inside the coordinator tree. The right answer is to bubble the signal up to whatever owns the engine — a thin RouteDispatcher that owns the engine and the root coordinator and is the loop connecting them. That keeps the engine out of the coordinator.
final class RouteDispatcher {
private let engine: RoutingEngine
private let coordinator: AppCoordinator
private func context() -> RouteContext { /* fresh: reads session NOW */ }
func dispatch(_ request: RouteRequest) { // ← single re-entrant funnel
switch engine.resolve(request: request, context: context()) {
case .proceed(let dest):
coordinator.execute(dest)
case .gated(let precondition, let resume):
coordinator.satisfy(precondition) { [weak self] satisfied in
guard satisfied else { return } // cancelled → abandon, don't resume
self?.dispatch(resume) // ← re-resolve, fresh context
}
// RouteError → metric + fallback
}
}
}
Three things about how the intent travels. The resume intent rides in the completion closure the dispatcher hands down — no stored property, it lives exactly as long as the auth flow, captured in the closure's environment. The LoginCoordinator never sees the intent; its entire contract is "authenticate, report success or cancel," which is what makes it reusable from cart, profile, and checkout. And when the closure fires dispatch(resume), the second pass builds a fresh context — the session was mutated during login, so this time the engine returns .proceed instead of .gated.
(1) Commit the session before reporting success. If onFinish(.success) fires before the token is persisted, context() still reads logged-out → .gated again → present login again. Order is load-bearing. (2) Cancellation must not resume: completion(false) drops the intent; don't smuggle the user somewhere they bailed on. (3) Chained gates are free: the re-dispatch re-enters the same switch, so preconditions compose without special-casing.
A small app can skip the dispatcher and let AppCoordinator hold the engine and re-resolve itself (fewer types), but you've then put the engine inside the coordinator. The dispatcher is the price of keeping that boundary clean.
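The loop and its three rules can be driven headlessly. A minimal sketch, assuming a synchronous FakeCoordinator, a shared Session object, and a second requiresPayment precondition, all invented for illustration; only dispatch, satisfy, execute, and the .proceed/.gated shape come from the dispatcher above.

```swift
enum Destination: Hashable { case cart }
enum Precondition: Hashable { case requiresAuth, requiresPayment }
struct RouteRequest: Equatable { let destination: Destination }
struct RouteContext { var isLoggedIn = false; var hasPayment = false }

enum RouteResolution {
    case proceed(Destination)
    case gated(precondition: Precondition, resume: RouteRequest)
}

final class Session { var context = RouteContext() }   // hypothetical shared state

struct RoutingEngine {
    func resolve(request: RouteRequest, context: RouteContext) -> RouteResolution {
        if !context.isLoggedIn { return .gated(precondition: .requiresAuth, resume: request) }
        if !context.hasPayment { return .gated(precondition: .requiresPayment, resume: request) }
        return .proceed(request.destination)
    }
}

final class FakeCoordinator {
    let session: Session
    var willSatisfy = true                     // false simulates the user cancelling
    private(set) var log: [String] = []
    init(session: Session) { self.session = session }

    func execute(_ destination: Destination) { log.append("execute \(destination)") }

    func satisfy(_ precondition: Precondition, completion: @escaping (Bool) -> Void) {
        log.append("satisfy \(precondition)")
        guard willSatisfy else { completion(false); return }
        // Rule 1: commit state BEFORE reporting success.
        switch precondition {
        case .requiresAuth: session.context.isLoggedIn = true
        case .requiresPayment: session.context.hasPayment = true
        }
        completion(true)
    }
}

final class RouteDispatcher {
    private let engine = RoutingEngine()
    private let coordinator: FakeCoordinator
    private let session: Session
    init(coordinator: FakeCoordinator, session: Session) {
        self.coordinator = coordinator
        self.session = session
    }

    func dispatch(_ request: RouteRequest) {
        // Fresh context on every pass: read the session now.
        switch engine.resolve(request: request, context: session.context) {
        case .proceed(let destination):
            coordinator.execute(destination)
        case .gated(let precondition, let resume):
            coordinator.satisfy(precondition) { [weak self] satisfied in
                guard satisfied else { return }    // Rule 2: cancelled, so drop the intent
                self?.dispatch(resume)             // Rule 3: re-enter, gates chain for free
            }
        }
    }
}
```

Dispatching a cart request from a fresh session records "satisfy requiresAuth", "satisfy requiresPayment", "execute cart" in that order. With willSatisfy set to false, the log stops after the first satisfy and nothing is executed.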
Per-screen routers: the false binary
"Central god orchestrator versus per-screen router" is a false binary. Both production lineages exist: VIPER's Router (per-module) and Uber's RIBs (per-node router with attach/detach). And the strong part of the per-screen instinct is genuinely good: definitive inputs and outputs per screen is excellent contract design — it's "emit typed intent, let the owner decide," one level up.
What per-screen routing genuinely wins: locality (a screen's exits live with the screen), modular ownership and merge isolation (the big one at scale — a central coordinator is a file every team edits, a merge-conflict magnet; per-screen routers make a router's outputs a module's public contract), a testable contract (given inputs, assert the emitted exit), and no god object.
But there's a decisive cost.
Something still has to wire output → next screen. There are only two options: the parent wires it (you've reinvented a hierarchy), or the screen's router builds the next screen (it must import sibling modules, and the modularity win evaporates). You cannot escape this: full distribution either recouples or recentralizes. RIBs answers with the tree: each parent wires only its direct children.
There are also concerns with no home at the leaf: cross-cutting flows (the gate-resume) are owned by no single screen; deep-link to arbitrary depth needs a root that knows the whole map to assemble [Home → Orders → OrderDetail]; and security gating distributed to N routers is the inconsistent-auth smell again, when one authority is what fixed it.
The synthesis is hierarchical routers: a thin root for cross-cutting, leaves owning their local exits.
Who owns the flow? SceneDelegate delivers; it does not orchestrate
A neat idea: the engine emits a route linked list — the whole ancestry, not just the leaf — and you traverse it. The linked list is good, with a guardrail. Emitting an ordered path makes cold-start deep-link restoration deterministic: a link to OrderDetail resolves to [Home, Orders, OrderDetail] and the router materializes exactly that stack. Guardrail: the list must be ordered DestinationIDs plus params, derived from declared route ancestry in the manifest — never from VCs. If the engine knows ancestry by reading route data, it stays pure; if it knows by understanding the UI hierarchy, navigation structure has leaked into resolution.
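The guardrail is easier to hold when the path is computed from route data alone. A minimal sketch, assuming a hypothetical manifest expressed as a child-to-parent dictionary; the DestinationID cases and the path(to:parents:) function are illustrative names.

```swift
enum DestinationID: String, Hashable { case home, orders, orderDetail }

// Hypothetical manifest fragment: each route declares its parent. No VC types appear.
let declaredParent: [DestinationID: DestinationID] = [
    .orders: .home,
    .orderDetail: .orders,
]

// Walks declared ancestry from the leaf up, then returns the path root-first.
// Returns nil if the manifest contains a cycle.
func path(to leaf: DestinationID,
          parents: [DestinationID: DestinationID]) -> [DestinationID]? {
    var path = [leaf]
    var seen: Set<DestinationID> = [leaf]
    var current = leaf
    while let parent = parents[current] {
        guard seen.insert(parent).inserted else { return nil }   // cycle in manifest
        path.append(parent)
        current = parent
    }
    return Array(path.reversed())
}
```

For orderDetail this yields [home, orders, orderDetail], which is the stack the router then materializes. The engine stays pure because the only input is the declared ancestry, never the live UI hierarchy.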
The harder question is who owns the flow. Split the word "own": retain/lifetime ownership (who holds the object so it stays alive) versus logic/responsibility ownership (who implements the flow orchestration). SceneDelegate can do the first. It must not do the second. Flow logic stuffed into SceneDelegate methods is the "before" state — untestable, UIKit-bound.
| Reason it can't own the logic | Why |
|---|---|
| plural entry points | Push arrives via AppDelegate; Spotlight, widgets, Handoff via other delegates. Flow ownership must sit above any single one, or dispatch fragments into the inconsistent-vocabulary smell. |
| untestable | You can't drive flow logic embedded in a scene lifecycle callback headlessly. |
| scene multiplicity | iPad and multiwindow have multiple SceneDelegates. Some state is per-scene (nav stack), some app-global (session, manifest). |
One RootRouter per scene, held by SceneDelegate but not implemented in it. SceneDelegate's job shrinks to: receive event → build RouteContext → hand to the router. The root owns the traversal loop and cross-cutting; each leaf router builds its own node. Delegates forward; the router decides.
The integration fallacy
"I can test individual routes. If the parts work, the sum should too." False — and provably so for this system. "Sum of working parts works" holds only for pure, stateless, independent parts with no shared state and no ordering. A routing system is the opposite on all four axes: stateful (session), ordered (resolve → gate → resume), lifecycle-driven (VC and coordinator dealloc), concurrent (async login).
LoginCoordinator authenticates. ✓ Engine resolves cart → .gated when logged-out, .proceed when logged-in. ✓✓ Dispatcher re-dispatches on success. ✓ Every part is green. Compose them and: if login fires onFinish before committing the session, the re-dispatch reads stale context → .gated again → infinite loop. The bug lives in the ordering between parts, so it is emergent by definition. No unit test catches it.
Unit tests verify the nodes; routing bugs live in the edges:
- Entry-point divergence. Each handler passes its own test while the scheme handler crashes on a product link that the universal-link handler guards ("same intent, different door" is a relationship property).
- Stale context across the async boundary.
- Leak on system-back. childDidFinish passes alone; the leak emerges from VC lifecycle × coordinator tree × swipe.
- Back-stack assembly. Each leaf builds its node correctly, yet the composed stack can still have a duplicate Home, the wrong tab, or a push-3 animation race.
- Scene races. Two iPad windows mutate one shared session.
The logical hole is this: "the part works" means it satisfies its contract given assumptions about its inputs. Integration bugs are contract mismatches at the seam — A emits something B's tests never fed it. Each part passed because each test used its own idea of the interface; the mismatch is invisible until they're wired.
If the flow logic lived in SceneDelegate, a test of that composition would need the UIKit lifecycle. So "I want to test the composition" is itself the argument for pulling flow out of SceneDelegate into the RootRouter.
Over-engineering: name the force, or delete the layer
This whole design has a lot of layers. Are we over-engineering? Over-engineering isn't layer count. It's a layer with no force behind it. The discipline is to name the force for each layer, and if you can't, delete it. The forces present at large scale: many entry points; a security requirement (gating); high route churn; many engineers; server-driven routes (a manifest); market and locale variation.
| Layer | Force it absorbs | Verdict |
|---|---|---|
| Pure engine (resolve → typed) | many entry points converging; validation; "never crash on bad input" | Earns it: baseline |
| Typed RouteResolution | kills stringly dispatch; compile-time route safety | Earns it: cheap |
| Precondition policy in engine | security; one auth authority; headless-testable gates | Earns it at scale |
| Coordinators for flow | decouple VCs; deep-link to an arbitrary stack | Earns it, but read the lifecycle cost |
| Separate RouteDispatcher | keep engine out of coordinator; both independently testable | Arguable: most speculative |
| Chained-gate re-resolution | preconditions compose (auth → payment → …) | Build the seam, don't over-invest |
| Two-door engine (in-app via resolve) | single policy authority across in-app and external | Defensible: looks over-built until the 2nd gate type ships |
The costs the design side tends not to admit: the coordinator lifecycle tax is a trade, not a win (a static visible problem swapped for a dynamic invisible one); indirection costs traceability (a misbehaving deep link now crosses six hops plus a closure-captured continuation instead of one if/else — you traded whole-system debuggability for per-part testability); and an onboarding tax (a new engineer must learn resolve/execute, the dispatcher, root-vs-flow before adding a route, which only pays off because routes churn constantly).
There's a popular framing that "less centralized means fewer leaks but less testable." That packages two independent properties as one. It's actually three knobs that move independently:
| Knob | How it actually behaves |
|---|---|
| Leak profile | Set by where you anchor object lifetime. Anchor leaves to the view tree and you get fewer leaks — conditional on attach/detach discipline, not on centralization. |
| Unit testability | Goes up with distribution: clean per-screen contracts are easy to test in isolation. |
| Integration testability | Goes down with distribution — but only for cross-cutting flow, and only if there's no headless root to drive it. |
So you don't have to trade these off the way the framing implies. A thin headless root recovers integration-testability; leaf lifetime on the view tree keeps the leak win. The only irreducible trade is centralize ↔ distribute — world-assembly in tests versus root accretion — not leaks versus tests.
Headless coordinators: no UIKit in the coordinator
The design goal that makes all of this testable: no coordinator imports UIKit. The coordinator depends on protocols — Navigation, View, Tab — and a factory that builds the real VCs and hands them back as those protocols. UINavigationController, UIViewController, and UITabBarController conform via retroactive extensions. The factory is the single file that imports UIKit.
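A minimal sketch of what that buys in a test. The Navigation, View, and BrowseFactory names come from the design; the method names on them (push, makeProductDetail) and the three test doubles are assumptions for illustration. Note there is no UIKit import anywhere in the block.

```swift
// Seams the coordinator depends on. In the app, UIKit types would conform
// to these in the one file that imports UIKit.
protocol View: AnyObject {}
protocol Navigation: AnyObject {
    func push(_ view: View)
}
protocol BrowseFactory {
    func makeProductDetail(id: String) -> View   // hypothetical factory method
}

final class BrowseCoordinator {
    private let navigation: Navigation
    private let factory: BrowseFactory

    init(navigation: Navigation, factory: BrowseFactory) {
        self.navigation = navigation
        self.factory = factory
    }

    // Typed in-app API: build through the factory, execute through the seam.
    func showProduct(id: String) {
        navigation.push(factory.makeProductDetail(id: id))
    }
}

// Test doubles: plain Swift objects, no simulator required.
final class StubView: View {
    let name: String
    init(name: String) { self.name = name }
}

final class SpyNavigation: Navigation {
    private(set) var pushed: [View] = []
    func push(_ view: View) { pushed.append(view) }
}

struct StubFactory: BrowseFactory {
    func makeProductDetail(id: String) -> View { StubView(name: "product:\(id)") }
}
```

A test constructs BrowseCoordinator with SpyNavigation and StubFactory, calls showProduct(id:), and inspects SpyNavigation.pushed. The coordinator's whole behaviour is observable as values.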
Split the factory into per-flow protocols (BrowseFactory, CheckoutFactory…), declared from the consumer's side. DefaultFactory conforms to all of them; a mock implements just the one it needs. This is the same consumer-owns-the-protocol rule as the flow-specific delegate: segregate by client role, not by method.
Navigation.push(_ view: View) downcasts view as? UIViewController. The cast is on the argument, not Self, so constraining Self can't remove it, and class-constraining the protocol to UIViewController would re-import UIKit and kill mockability. Isolate the cast in the one UIKit file; use assertionFailure plus a no-op, not fatalError, because a nav-type mismatch shouldn't hard-crash a shipping app.
The public surface, and its negative space
The signature tells you as much by what it omits as by what it declares.
// base contract every coordinator satisfies
protocol Coordinator: AnyObject {
    func start()
    var onFinish: (() -> Void)? { get set } // parent wires this → removes self from children[]
}

// root: exactly what the RouteDispatcher calls (no engine, no URL, no UIKit inside it)
protocol RootCoordinating: Coordinator {
    func execute(_ target: Target, source: RouteSource) async // .external → reset · .internal → incremental
    func satisfy(_ precondition: Precondition, completion: @escaping (Bool) -> Void)
}

// a flow coordinator: typed in-app API + deep-link materialize
final class BrowseCoordinator: Coordinator {
    init(navigation: Navigation, factory: BrowseFactory, sink: @escaping (RouteRequest) -> Void)
    func showProduct(id: String)                  // in-app typed nav, the convergence point
    func materialize(_ run: [Destination]) async  // deep-link segment, one shot
}
The negative space is the design: no import UIKit (every dependency is a protocol; Foundation-only, headless-testable); no engine, no resolve (the dispatcher owns resolution; the coordinator takes a typed Target, never a URL or string); no concrete VC types (the factory returns View); no UINavigationControllerDelegate (pop events arrive via a callback on the injected Navigation seam, so the coordinator stays UIKit-free).
source is the regular-vs-deeplink discriminator, not a second API. Don't add parallel regular-flow methods to the root: it re-forks the unified path, risks the inconsistent-auth smell, and bloats the root with per-screen knowledge. Both in-app and deep links land on the same execute(target, source); .internal reconciles incrementally, .external resets to canonical. Typed per-screen methods live on the flow coordinators.
Deep-link traversal: iterate data, don't hand off between VCs
A common mental-model bug: you picture route1 building VC1, which hands off to route2 — and so you reach for a DeepLinkTraversible protocol on each VC. Drop it. The path is data; the coordinator maps the whole list to views and sets the stack in one operation. No VC ever "traverses." They're built and placed — inert nodes.
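A minimal sketch of "the path is data" might look like the following. The Destination fields, the setStack(_:animated:) seam method, the makeView(for:) factory method, and the PathMaterializer type are assumptions for illustration; the essay's materialize is async, while this version is synchronous to keep the example short. In a UIKit adapter, setStack would presumably forward to UINavigationController's setViewControllers(_:animated:).

```swift
import Foundation

// Hypothetical minimal types for the sketch.
struct Destination: Equatable {
    let id: String
    let params: [String: String]
}

protocol View: AnyObject {}

protocol Navigation: AnyObject {
    // One call replaces the whole stack; no view hands off to the next.
    func setStack(_ views: [View], animated: Bool)
}

protocol ViewFactory {
    func makeView(for destination: Destination) -> View
}

final class PathMaterializer {
    private let navigation: Navigation
    private let factory: ViewFactory

    init(navigation: Navigation, factory: ViewFactory) {
        self.navigation = navigation
        self.factory = factory
    }

    // Map the whole list to views, then place them in a single operation.
    func materialize(_ run: [Destination]) {
        let views = run.map { factory.makeView(for: $0) }
        navigation.setStack(views, animated: false)
    }
}

// Test doubles.
final class SpyView: View {
    let id: String
    init(id: String) { self.id = id }
}

final class SpyNavigation: Navigation {
    private(set) var calls: [[String]] = []
    func setStack(_ views: [View], animated: Bool) {
        calls.append(views.map { ($0 as? SpyView)?.id ?? "?" })
    }
}

struct MockFactory: ViewFactory {
    func makeView(for destination: Destination) -> View { SpyView(id: destination.id) }
}

// Materializes a three-screen path and returns every recorded setStack call.
func demoMaterialize() -> [[String]] {
    let navigation = SpyNavigation()
    let materializer = PathMaterializer(navigation: navigation, factory: MockFactory())
    materializer.materialize([
        Destination(id: "home", params: [:]),
        Destination(id: "category", params: ["slug": "shoes"]),
        Destination(id: "product", params: ["id": "42"]),
    ])
    return navigation.calls
}
```

Three destinations produce exactly one recorded call carrying three views, which is the property worth asserting: the views are built and placed, and none of them participates in traversal.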
start() stands up the root, then dispatching; reset to the manifest's canonical ancestry rather than grafting onto an unrelated stack. And don't gold-plate N-deep modal towers: a base push-run plus at most one modal is the 95% case.
The reconciler dance
What if the screens already exist? The ideal is one sentence: the result should be indistinguishable from having navigated there cleanly — reuse what's correct, never duplicate, update in place on param-only changes, leave a canonical back stack. To make this testable, execute splits into a pure plan (headless) and an async apply (the only part that touches the seams).
enum Reconciler {
    // Pure planning step: no UIKit, no side effects, just a list of ops.
    static func plan(from current: AppState, to target: Target, source: RouteSource) -> [NavOp] {
        var ops: [NavOp] = []
        // Tear down modals the target does not share, then land on the right tab.
        if current.hasModal && !current.modalChain.matches(target.modalChain) { ops.append(.dismissModals) }
        if current.selectedTab != target.tab { ops.append(.selectTab(target.tab)) }
        let stack = current.stacks[target.tab] ?? []
        // An empty baseRun has no leaf to reconcile; skip the stack ops rather than force-unwrap.
        if let leaf = target.baseRun.last {
            if let top = stack.last, top.id == leaf.id {
                // Already on the leaf: update in place on a param-only change, else NO-OP.
                if top.params != leaf.params { ops.append(.updateVM(leaf.id, leaf.params)) }
            } else if let i = stack.lastIndex(where: { $0.id == leaf.id }) {
                // Leaf is deeper in the stack: reuse it instead of duplicating.
                ops.append(.popTo(leaf.id))
                if stack[i].params != leaf.params { ops.append(.updateVM(leaf.id, leaf.params)) }
            } else if source == .external {
                ops.append(.setStack(target.baseRun, tab: target.tab)) // reset to canonical
            } else {
                // In-app: keep the shared prefix, replace only the divergent tail.
                let n = stack.commonPrefixCount(with: target.baseRun)
                ops.append(.setStack(stack.prefix(n).asDestinations + target.baseRun.suffix(from: n), tab: target.tab))
            }
        }
        if target.hasModal && !current.modalChain.matches(target.modalChain) { ops.append(.present(target.modalChain)) }
        return ops
    }
}
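The plan above calls a commonPrefixCount helper that the listing does not define. One possible shape, assuming for simplicity that both sides are plain [Destination] arrays (the essay's stack entries appear to be a richer type) and with a hypothetical PrefixMatch switch so the id-only versus id-plus-params choice is made at the call site:

```swift
import Foundation

// Hypothetical minimal Destination for the sketch.
struct Destination: Equatable {
    let id: String
    let params: [String: String]
}

// Makes the matching rule an explicit argument instead of an accident.
enum PrefixMatch {
    case idOnly        // same screen type counts as a match
    case idAndParams   // screen type and parameters must both match
}

extension Array where Element == Destination {
    // Number of leading elements shared with `other` under the chosen rule.
    func commonPrefixCount(with other: [Destination],
                           matching: PrefixMatch = .idOnly) -> Int {
        var count = 0
        for (mine, theirs) in zip(self, other) {
            let same: Bool
            switch matching {
            case .idOnly: same = mine.id == theirs.id
            case .idAndParams: same = mine == theirs
            }
            guard same else { break }
            count += 1
        }
        return count
    }
}

// Compares the two rules on a stack that differs only in the category's params.
func demoPrefix() -> (idOnly: Int, idAndParams: Int) {
    let current = [
        Destination(id: "home", params: [:]),
        Destination(id: "category", params: ["slug": "shoes"]),
        Destination(id: "product", params: ["id": "1"]),
    ]
    let target = [
        Destination(id: "home", params: [:]),
        Destination(id: "category", params: ["slug": "boots"]),
        Destination(id: "reviews", params: ["id": "2"]),
    ]
    return (current.commonPrefixCount(with: target, matching: .idOnly),
            current.commonPrefixCount(with: target, matching: .idAndParams))
}
```

With id-only matching the category screen is kept even though its parameters changed, so the stale screen would also need an update op; with id-plus-params it is rebuilt. Either is defensible, but the two rules produce different back stacks.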
plan is a pure function (AppState, Target, RouteSource) → [NavOp] with zero UIKit. Every branch above is one unit test, and it's the most logic-dense thing in the system, so it's exactly what you want headless. Animation is apply's job (op == plan.last), never the plan's. The one open call: whether commonPrefixCount matches by id only or id-plus-params. Decide it explicitly.
Testing the headless resolver, and the serialization invariant
The factory seam delivers headless flow tests: inject a MockFactory that returns SpyViews tagged with their DestinationID, drive a request, and assert the recorded navigation calls — no simulator. Flow testability comes from orchestration in a plain object plus navigation behind a fakeable seam, not from the coordinator pattern itself.
- Flow logic → headless. Mock factory plus spy navigator; assert the resulting [NavOp] or the pushed destinations. This is the bulk of the suite, and it runs without UIKit.
- View state → snapshot. An orthogonal axis. Construct the screen with injected state and snapshot it; don't navigate to it. The factory swaps real VCs for mocks, so flow tests are structurally blind to rendering, which is exactly why snapshot stays.
- Fake-vs-real wiring → one thin smoke test. A host-app/XCUITest on the critical path. The headless suite trusts that the spy matches the real Navigation; something must validate that seam once.
execute is async and awaits on modal presents. A second deep link dispatched mid-await interleaves: you reconcile against a stale snapshot, or fire present while a transition is in flight (the UIKit warning you've all seen). Serialize dispatch → execute with an actor over a serial queue, or an isExecuting gate that queues the next request. Concurrent navigation is an emergent ordering bug no per-op unit test catches: it lives in the interleaving, so it needs an invariant, not coverage.
Objections worth answering
A few sharp pushbacks come up every time, and they're worth answering plainly.
"Should the view model hold the coordinator, weakly?" No — not even weakly. With output closures there's no VM → coordinator reference to make weak in the first place. The VM emits a typed output; the coordinator binds to it and weakly captures itself. "Weak vs strong" was the wrong axis.
"If the user isn't logged in, shouldn't the engine throw, carrying the post-login flow?" Carry the continuation, yes; throw, no. A logged-out user is not an error — the route resolved fine, it just has a prerequisite. Gates are a success-variant carrying the resume intent; reserve throw for genuine resolution failure (malformed, unknown, validation). Re-resolve after the gate clears rather than caching a destination.
"Why a separate dispatcher instead of letting the coordinator re-resolve?" Mostly to keep the engine out of the coordinator tree, so both stay independently testable. A small app can collapse them — that's a legitimate trade, you've just moved the engine inside the coordinator.
"Why not pure per-screen routers and skip the central layer?" Because output-wiring and cross-cutting flows can't be owned at a leaf without either recoupling sibling modules or reinventing a hierarchy. Push ownership down for module isolation, keep a thin root for deep-link assembly and gating. The central layer goes thin; it doesn't vanish.
"If every route tests green, isn't the system correct?" No. Routing bugs live in the edges — ordering, shared state, lifecycle, contract mismatch at seams — not in the nodes. And testing the sum requires the orchestration to be headless, which is itself the argument against leaving flow logic in SceneDelegate.
The whole thing in one breath
Inbound URLs are untrusted, so a pure engine validates them and resolves a typed decision, including evaluating preconditions from injected context, which is what makes auth-gating headless-testable. Navigation is execution, owned by coordinators outside the engine. Entry points are thin delivery adapters; a per-scene root router consumes resolutions and orchestrates between flows, delegating within them to per-feature routers. Gates are results, not errors: they carry the intent to resume, and the dispatcher re-resolves against fresh context after the gate clears. Each layer is justified by a force that's actually present. The nodes are unit-tested and the stateful seams are integration-tested, because a working part does not imply a working whole.
That's the entire system, and it all falls out of one line held honestly: resolution is data, navigation is execution.