Splitting the pre-main number with DYLD_PRINT_STATISTICS showed the damage was almost all deferrable initializer work. The fix: defer the SDK's eager setup past first-interactive, plus a CI budget gate so a dependency can't quietly regress launch again.
We added a third-party analytics SDK and the app got slower to launch. Not a little: the pre-main time, the part that runs before your main() even gets called, jumped well past its budget. Users don't file bugs about this. They just feel the app is sluggish to open, and they open it less.
The instinct is to blame the SDK and move on. But "the SDK is slow" isn't a diagnosis you can act on. You have to know which slow.
Dylib cost vs initializer cost
Pre-main time is two very different things wearing one number:
- Dylib loading: dyld mapping the dynamic libraries the SDK drags in, rebasing, binding, resolving symbols. This is mostly fixed cost. Short of static linking, you don't get it back.
- Initializer time: work that runs at load, before main(), such as Objective-C +load methods, C++ static constructors, and anything an eager singleton does to set itself up. This is deferrable, and it's usually where the damage is.
The distinction matters because the two columns respond to completely different fixes. You attack dylib loading by removing or merging dynamic libraries — static linking, mergeable libraries, fewer frameworks in the load chain. You attack initializer time by deferring or deleting the code that runs at load. Reaching for one lever when the regression lives in the other column is how teams burn a week.
DYLD_PRINT_STATISTICS splits them for you. Set it as a launch environment variable and the console prints the pre-main breakdown the next time the app cold-starts.
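If you'd rather diff the breakdown in a script than eyeball the console, the summary lines are easy to split into columns. A minimal Swift sketch, assuming the `<column>: <value> <unit> (<percent>)` line shape that older dyld versions print; that format isn't a documented contract, and `parseDyldStatistics` is a hypothetical helper, not an Apple API.

```swift
import Foundation

/// Hypothetical helper: turns the DYLD_PRINT_STATISTICS console summary into
/// a column-name -> milliseconds map. Assumes each line looks like
/// "<column name>: <value> <milliseconds|seconds> (<percent>)"; that shape is
/// what older dyld versions print, not a documented contract.
func parseDyldStatistics(_ output: String) -> [String: Double] {
    var columns: [String: Double] = [:]
    for line in output.split(separator: "\n") {
        // The column name is everything before the first colon.
        guard let colon = line.firstIndex(of: ":") else { continue }
        let name = line[..<colon].trimmingCharacters(in: .whitespaces)
        // After the colon: value, unit, then a percentage we ignore.
        let fields = line[line.index(after: colon)...].split(separator: " ")
        guard fields.count >= 2, let value = Double(fields[0]) else { continue }
        switch fields[1] {
        case "milliseconds": columns[name] = value
        case "seconds": columns[name] = value * 1_000
        default: continue // not a timing line
        }
    }
    return columns
}
```

Any other line with the same `name: value unit` shape is kept under its own name, so filter for the columns you care about before comparing builds.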
DYLD_PRINT_STATISTICS only ever sees the pre-main window. A singleton that fires in didFinishLaunching instead of at +load is invisible to it. For the post-main window — launch through first frame and first interactive — you want os_signpost intervals, the App Launch template in Instruments, or MXAppLaunchMetric from MetricKit in the field. Instrument both windows or you'll fix one and ship the other.
Reading the breakdown
When I ran it against the regressed build next to the previous release, the split was lopsided. Here is the shape of what came back. It's qualitative, because the exact milliseconds came from an old device and aren't the point:
| pre-main column | baseline | regressed | what it is |
|---|---|---|---|
| dylib loading | small | moderately larger | dyld mapping + binding the new frameworks |
| rebase / binding | small | roughly flat | fixups dyld can't skip |
| ObjC setup | small | roughly flat | class registration, selector uniquing |
| initializer time | small | dominant | +load, static ctors, eager singleton init |
| total pre-main | under budget | more than quadrupled | sum of the above |
Two things had moved. Dylib loading grew moderately because the analytics SDK pulled in a chain of transitive dynamic dependencies on top of its own framework — several new dylibs in the load path, and dyld has to map every one. That cost is real but bounded, and short of static linking I couldn't claw it back.
The real damage was in the initializer column, which went from a rounding error to the dominant term. The SDK's framework was opening a websocket and kicking off an event-batch flush inside its eager singleton initializer — and that initializer ran at class load, synchronously, before main(). Network setup, on the launch critical path, charged against a window where the user is staring at a launch screen.
That reframes the problem entirely. You're no longer fighting "the SDK is heavy." You're moving deferrable work to a moment when the user won't feel it.
This is the trap that catches most SDK integrations, and it's invisible in code review. Nobody reads a CocoaPods transitive dependency tree, and +load / static constructors run with no call site in your code to grep for. The only reliable signal is measurement. The bulk of this regression was deferrable initializer time hiding behind a singleton's init() — but you'd never know that from the diff.
Defer past first-interactive
The SDK didn't need to be live before the user could see and touch the screen. The analytics it provided were session analytics — nothing that needed to be running before the first frame. So I wrapped its initialization in a lazy, thread-safe getter and triggered it after first-interactive — once the first screen is up and responsive — instead of at load.
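```swift
import UIKit

enum Analytics {
    // `AnalyticsSDK` stands in for the vendor's entry type. A static let is
    // initialized lazily, once, on first access: not at +load, not before main().
    static let shared: AnalyticsSDK = {
        AnalyticsSDK(configuration: .default)
    }()
}

final class SceneDelegate: UIResponder, UIWindowSceneDelegate {
    // Kicked off once the scene is active and the first screen is up, on a
    // detached utility-priority task so the setup stays off the main thread.
    // Later re-activations are harmless: `shared` only initializes once.
    func sceneDidBecomeActive(_ scene: UIScene) {
        Task.detached(priority: .utility) { _ = Analytics.shared }
    }
}
```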
There's a subtlety worth being honest about. A Swift static let is already lazy and thread-safe, so what did the wrapper actually buy? Only this: it deferred the work my code was triggering, the explicit early initialization call that had been running at load. It does not move the framework's own +load methods or C++ static constructors; dyld runs those before main() no matter what my code does. So deferring the call collapsed the part I controlled (the eager websocket + batch-flush setup), but the dylib-load floor stayed exactly where it was. Lazy init can't reduce dylib loading. It can collapse the initializer time your own code triggers.
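The "already lazy and thread-safe" claim is easy to check for yourself. A small sketch with a hypothetical `FakeAnalyticsClient` standing in for the vendor type: its initializer bumps a counter, nothing runs until first access, and 64 racing first accesses still run it exactly once. What it can't show is anything about a framework's own +load methods, which dyld runs before main() regardless.

```swift
import Dispatch

// Counts how many times the expensive initializer actually runs.
var initializerRuns = 0

// Hypothetical stand-in for the vendor type; imagine websocket and
// batch-flush setup where the counter is bumped.
final class FakeAnalyticsClient {
    init() { initializerRuns += 1 }
}

enum Analytics {
    // A stored type property is initialized lazily, on first access, and
    // Swift guarantees the initializer runs once even under contention.
    static let shared = FakeAnalyticsClient()
}

// Nothing has touched Analytics.shared yet, so the initializer hasn't run.
let runsBeforeAccess = initializerRuns

// Race 64 "first accesses" against each other.
DispatchQueue.concurrentPerform(iterations: 64) { _ in
    _ = Analytics.shared
}
let runsAfterAccess = initializerRuns
```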
Nothing about the SDK's behaviour changed from the user's perspective — events still fire, the session still tracks. It just stopped charging its setup cost to the one window where the user is watching a launch screen. The dylib loading line held steady; initializer time dropped back to a rounding error; pre-main came back under budget.
Before the fix:
- Singleton initializes eagerly at +load
- Websocket + batch flush on the pre-main critical path
- Initializer time dominates pre-main
- Pre-main more than quadrupled, well over budget
After the fix:
- Initialization deferred past first-interactive
- Same dylibs still load; that floor is dyld's, not ours
- Initializer time back to a rounding error
- Pre-main back under budget
The objection here is fair: lazy init just moves the websocket and batch-flush cost to first use — doesn't the user pay it then? Yes, but off the blank-launch critical path, onto a moment after first interactive when the app is already alive and responsive. And it pre-warms in the background during idle after the first frame, so it's usually warm before analytics is actually needed. The target is time-to-first-interactive, not total work done.
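The same idea generalizes beyond one SDK: collect setup that isn't needed for the first screen and release it only after first-interactive, on a low-priority background queue. A minimal sketch under that assumption; `LaunchDeferral` and its methods are illustrative names, not an Apple API.

```swift
import Foundation

/// Hypothetical helper (not a UIKit API): holds non-critical setup until the
/// app reports first-interactive, then runs it on a serial utility-QoS queue.
final class LaunchDeferral {
    private let lock = NSLock()
    private var pending: [() -> Void] = []
    private var isInteractive = false
    private let queue = DispatchQueue(label: "launch.deferred", qos: .utility)
    private let group = DispatchGroup()

    /// Holds the work until first-interactive; after that, schedules it immediately.
    func enqueue(_ work: @escaping () -> Void) {
        lock.lock()
        if isInteractive {
            lock.unlock()
            schedule(work)
        } else {
            pending.append(work)
            lock.unlock()
        }
    }

    /// Call once the first screen is up and responsive. Safe to call more than once.
    func markFirstInteractive() {
        lock.lock()
        if isInteractive { lock.unlock(); return }
        isInteractive = true
        let ready = pending
        pending.removeAll()
        lock.unlock()
        for work in ready { schedule(work) }
    }

    /// Blocks until everything scheduled so far has finished (handy in tests).
    func waitUntilIdle() { group.wait() }

    private func schedule(_ work: @escaping () -> Void) {
        queue.async(group: group, execute: work)
    }
}
```

In an app, the hypothetical wiring would be `enqueue { _ = Analytics.shared }` early in launch and `markFirstInteractive()` once the first screen is on-screen and responsive. Where that moment falls depends on your UI, which is why it's a call you make rather than something the helper guesses.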
The phased reasoning
I didn't land on the deferred wrapper first. The order matters, because each phase ruled out a class of fix.
- Phase 1: Measure, don't guess. Run DYLD_PRINT_STATISTICS on a fixed reference device against both the regressed build and the previous release. Get the delta per column, not a single wall-clock number. This is what told me the regression was overwhelmingly initializer time, not dylib loading, which immediately killed half the candidate fixes.
- Phase 2: Try to remove the dylib cost. The vendor had a static variant as an XCFramework. Static linking would have removed the new dylibs from the load chain entirely, which is the right fix for the dylib column. But that build shipped with a known batching bug on an older iOS version they hadn't patched. Not usable yet. Parked it.
- Phase 3: Defer the initializer cost we own. Since the dominant cost was the SDK's own eager initialization, and none of it was needed before first-interactive, wrap it in a lazy getter and trigger it after the first screen is live. This collapsed the column I could actually control without waiting on the vendor.
- Phase 4: Migrate to static once it's safe. File the vendor bug with full reproduction and the DYLD breakdown, asking them to fix the batching bug and make static the default distribution. When the patched static build landed, migrate to it in a follow-up; removing the dylibs entirely dropped pre-main below the original baseline. Kept the lazy wrapper anyway as a belt-and-braces safety net.
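Phase 1's "delta per column" is a few lines of arithmetic once you have the numbers. A sketch, assuming you've already pulled each build's four pre-main columns (in milliseconds) into a hypothetical `PreMainBreakdown`. Sorting the deltas puts the column that actually regressed at the top, and that is the answer that decides between the static-linking lever and the deferral lever.

```swift
/// Hypothetical container for the four pre-main columns, in milliseconds.
struct PreMainBreakdown {
    var dylibLoading: Double
    var rebaseBinding: Double
    var objcSetup: Double
    var initializers: Double
    var total: Double { dylibLoading + rebaseBinding + objcSetup + initializers }
}

/// Per-column change from baseline to candidate, largest regression first.
func columnDeltas(baseline: PreMainBreakdown,
                  candidate: PreMainBreakdown) -> [(column: String, deltaMs: Double)] {
    let columns: [(String, KeyPath<PreMainBreakdown, Double>)] = [
        ("dylib loading", \.dylibLoading),
        ("rebase/binding", \.rebaseBinding),
        ("ObjC setup", \.objcSetup),
        ("initializers", \.initializers),
    ]
    return columns
        .map { (column: $0.0, deltaMs: candidate[keyPath: $0.1] - baseline[keyPath: $0.1]) }
        .sorted { $0.deltaMs > $1.deltaMs }
}
```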
The fix that matters is the one that holds
Shaving the milliseconds back was the easy part. The real question is why a dependency was allowed to blow the launch budget in the first place — and the answer is that nobody was measuring it at the point a dependency gets added.
So the durable fix was a CI gate: any new dependency has to come with a DYLD_PRINT_STATISTICS delta against the previous release, on a pinned reference device, and there's a hard ceiling on how much pre-main any single SDK is allowed to contribute. A dependency that blows the budget fails the check before it merges. We tightened that ceiling once we'd cleaned house, because the cheapest cold-start regression is the one that never lands.
The gate only works if it's honest, and a launch-time gate is easy to make flaky:
- Pin the device and OS. One fixed reference device, same OS, every run. Wall-clock launch timing varies wildly across hardware; DYLD_PRINT_STATISTICS isolates the pre-main window and is far less noisy, though it is still a timing and still moves a little from run to run.
- Control the cold launch. Kill and relaunch, discard the first run, take the median of several. A warm launch measures nothing useful.
- Gate on the delta, not the absolute. Attribute the change to the SDK's contribution versus the previous release. An absolute threshold drifts with OS and toolchain; a delta catches what this change did.
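The gate rules above fit in a small, testable function. A sketch, assuming the CI job has already collected pre-main times (in milliseconds) from several cold launches of each build on the pinned device, with the first run of each build being the one to throw away. `launchBudgetGate` and `ceilingMs` are illustrative names, and the ceiling itself is whatever per-dependency budget your team sets.

```swift
/// Median of a set of cold-launch samples. Assumes at least one sample.
func median(_ samples: [Double]) -> Double {
    let sorted = samples.sorted()
    let mid = sorted.count / 2
    return sorted.count % 2 == 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

struct GateResult {
    let deltaMs: Double
    let passed: Bool
}

/// Hypothetical CI gate: drop each build's first run, take the median
/// pre-main time, and fail if the candidate's increase over the previous
/// release exceeds the ceiling.
func launchBudgetGate(baselineRuns: [Double],
                      candidateRuns: [Double],
                      ceilingMs: Double) -> GateResult {
    precondition(baselineRuns.count > 1 && candidateRuns.count > 1,
                 "need a discarded first run plus at least one sample")
    let baseline = median(Array(baselineRuns.dropFirst()))
    let candidate = median(Array(candidateRuns.dropFirst()))
    let delta = candidate - baseline
    return GateResult(deltaMs: delta, passed: delta <= ceilingMs)
}
```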
On device choice: pick a low-end reference device that represents the slow tail, not a flagship. Cold start is a low-end-device problem — that's where the budget actually bites — and a fast machine or an M-series simulator hides exactly the regressions you're trying to catch. One well-chosen slow device plus a simulator delta surfaces more than a rack of fast ones.
Pre-main regressions from third-party SDKs don't show up in code review — they show up in DYLD_PRINT_STATISTICS. Split the number before you fix anything: dylib loading you only reduce by removing or merging dynamic libraries; initializer time you defer or delete. The CI gate is what saves you, because without it a quadrupled pre-main ships, and the client blames their own code. Measurement is not optional at SDK scale.
A few things worth keeping:
- Measure on a fixed, slower reference device, not your M-series laptop's simulator. Cold start is a low-end-device problem; a fast machine hides it.
- A dependency's cost is not just its binary size. What it does at +load is often the bigger tax, and it's invisible until you split the pre-main number into dylib loading versus initializer time.
- "Defer past first-interactive" beats "optimize the work" for anything that isn't needed to render the first screen. The fastest setup code is the code that runs after the user is already looking at something.