How one SDK blew our launch budget — and how we got it back

An analytics SDK quietly multiplied our pre-main time. Splitting dylib cost from initializer cost with DYLD_PRINT_STATISTICS, deferring init past first-interactive, and a CI gate so it can't regress again.

TL;DR

A third-party analytics SDK blew our pre-main launch budget. The diagnosis hinged on splitting dylib-load cost from initializer cost with DYLD_PRINT_STATISTICS — the damage was almost all deferrable initializer work. The fix: defer the SDK's eager setup past first-interactive, plus a CI budget gate so a dependency can't quietly regress launch again.

We added a third-party analytics SDK and the app got slower to launch. Not a little — the pre-main time, the part that runs before your main() even gets called, jumped well past its budget. Users don't file bugs about this. They just feel the app is sluggish to open, and they open it less.

The instinct is to blame the SDK and move on. But "the SDK is slow" isn't a diagnosis you can act on. You have to know which slow.

Dylib cost vs initializer cost

Pre-main time is two very different things wearing one number:

Dylib loading — dyld mapping the dynamic libraries the SDK drags in, rebasing, binding, resolving symbols. This is mostly fixed cost. Short of static linking, you don't get it back.
Initializer time — work that runs at load, before main(): Objective-C +load methods, C++ static constructors, and anything an eager singleton does to set itself up. This is deferrable, and it's usually where the damage is.

The distinction matters because the two columns respond to completely different fixes. You attack dylib loading by removing or merging dynamic libraries — static linking, mergeable libraries, fewer frameworks in the load chain. You attack initializer time by deferring or deleting the code that runs at load. Reaching for one lever when the regression lives in the other column is how teams burn a week.

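DYLD_PRINT_STATISTICS splits them for you. Set it to 1 as a launch environment variable (in Xcode, under the scheme's Run arguments) and the console prints the pre-main breakdown the next time the app cold-starts. Newer OS releases that ship the rewritten dyld reportedly no longer print this output; if the console stays silent on your reference device, the App Launch template in Instruments is the usual fallback for the same split.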
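If you want the breakdown as numbers rather than console text, a small parser is enough. This is a minimal sketch, assuming the output lines look like `label: value milliseconds (pct%)` with the total sometimes reported in seconds. The type and function names are hypothetical and the sample figures are invented placeholders, not measurements from this regression.

```swift
/// One timed column from the pre-main breakdown, normalised to milliseconds.
struct PreMainColumn: Equatable {
    let label: String
    let milliseconds: Double
}

/// Strips leading and trailing spaces and tabs.
func trimmed(_ text: Substring) -> String {
    var text = text
    while let first = text.first, first == " " || first == "\t" { text.removeFirst() }
    while let last = text.last, last == " " || last == "\t" { text.removeLast() }
    return String(text)
}

/// Parses lines shaped like "  initializer time: 750.00 milliseconds (62.5%)".
/// Lines that do not match that shape are skipped.
func parsePreMainColumns(_ output: String) -> [PreMainColumn] {
    var columns: [PreMainColumn] = []
    for line in output.split(separator: "\n") {
        // Split "label: rest" at the first colon.
        guard let colon = line.firstIndex(of: ":") else { continue }
        let label = trimmed(line[..<colon])
        // Per-image initializer lines follow the "slowest" header; this sketch stops there.
        if label.hasPrefix("slowest") { break }
        let fields = line[line.index(after: colon)...].split(separator: " ")
        guard fields.count >= 2, let value = Double(String(fields[0])) else { continue }
        let unit = fields[1]
        if unit.hasPrefix("millisecond") {
            columns.append(PreMainColumn(label: label, milliseconds: value))
        } else if unit.hasPrefix("second") {
            columns.append(PreMainColumn(label: label, milliseconds: value * 1000))
        }
    }
    return columns
}

// Invented sample in the assumed format; the numbers are placeholders.
let sampleOutput = """
Total pre-main time: 1.2 seconds (100.0%)
         dylib loading time: 300.00 milliseconds (25.0%)
        rebase/binding time: 100.00 milliseconds (8.3%)
            ObjC setup time:  50.00 milliseconds (4.2%)
           initializer time: 750.00 milliseconds (62.5%)
           slowest intializers :
             libSystem.B.dylib :  10.00 milliseconds (0.8%)
"""

for column in parsePreMainColumns(sampleOutput) {
    print(column.label, column.milliseconds)
}
```

Normalising everything to milliseconds up front keeps later comparisons between builds simple.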

DYLD_PRINT_STATISTICS only ever sees the pre-main window. A singleton that fires in didFinishLaunching instead of at +load is invisible to it. For the post-main window — launch through first frame and first interactive — you want os_signpost intervals, the App Launch template in Instruments, or MXAppLaunchMetric from MetricKit in the field. Instrument both windows or you'll fix one and ship the other.
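For the post-main window, a signpost interval around each launch phase is usually enough to see it in Instruments. The sketch below is one way to do that, not the instrumentation used in this investigation: the subsystem string, the helper name `measureLaunchPhase` and the phase name are all hypothetical, and the `#else` branch exists only so the snippet compiles where the `os` module is unavailable.

```swift
#if canImport(os)
import os

// Hypothetical subsystem name; the points-of-interest category surfaces in Instruments.
let launchLog = OSLog(subsystem: "com.example.app", category: .pointsOfInterest)

/// Wraps `work` in a signpost interval so the phase appears on the Instruments timeline.
func measureLaunchPhase<T>(_ name: StaticString, _ work: () -> T) -> T {
    let id = OSSignpostID(log: launchLog)
    os_signpost(.begin, log: launchLog, name: name, signpostID: id)
    defer { os_signpost(.end, log: launchLog, name: name, signpostID: id) }
    return work()
}
#else
/// Fallback without the os module: runs the work with no instrumentation.
func measureLaunchPhase<T>(_ name: StaticString, _ work: () -> T) -> T {
    return work()
}
#endif

// Usage: bracket a post-main phase such as building the first screen.
let firstScreenItemCount = measureLaunchPhase("BuildFirstScreen") {
    (1...3).map { "row \($0)" }.count   // stand-in for real first-screen setup
}
print(firstScreenItemCount)
```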

Reading the breakdown

When I ran it against the regressed build next to the previous release, the split was lopsided. Here is the shape of what came back — qualitative, because the exact milliseconds are an old device's and not the point:

pre-main column	baseline	regressed	what it is
dylib loading	small	moderately larger	`dyld` mapping + binding the new frameworks
rebase / binding	small	roughly flat	fixups `dyld` can't skip
ObjC setup	small	roughly flat	class registration, selector uniquing
initializer time	small	dominant	`+load`, static ctors, eager singleton init
total pre-main	under budget	more than quadrupled	sum of the above

Two things had moved. Dylib loading grew moderately because the analytics SDK pulled in a chain of transitive dynamic dependencies on top of its own framework — several new dylibs in the load path, and dyld has to map every one. That cost is real but bounded, and short of static linking I couldn't claw it back.

The real damage was in the initializer column, which went from a rounding error to the dominant term. The SDK's framework was opening a websocket and kicking off an event-batch flush inside its eager singleton initializer — and that initializer ran at class load, synchronously, before main(). Network setup, on the launch critical path, charged against a window where the user is staring at a launch screen.

That reframes the problem entirely. You're no longer fighting "the SDK is heavy." You're moving deferrable work to a moment when the user won't feel it.

This is the trap that catches most SDK integrations, and it's invisible in code review. Nobody reads a CocoaPods transitive dependency tree, and +load / static constructors run with no call site in your code to grep for. The only reliable signal is measurement. The bulk of this regression was deferrable initializer time hiding behind a singleton's init() — but you'd never know that from the diff.
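The per-column comparison is simple enough to script, which makes the "which column moved" question mechanical rather than a judgment call. A rough sketch follows, with hypothetical names and invented placeholder numbers chosen only to mirror the shape of the breakdown (initializer time dominant, dylib loading moderately larger).

```swift
/// Pre-main columns in milliseconds, keyed by column label.
typealias Breakdown = [String: Double]

struct ColumnDelta: Equatable {
    let label: String
    let deltaMs: Double
}

/// Returns regressed-minus-baseline per column, largest growth first.
/// A column missing from one side is treated as 0 ms on that side.
func columnDeltas(baseline: Breakdown, regressed: Breakdown) -> [ColumnDelta] {
    let labels = Set(baseline.keys).union(regressed.keys)
    return labels
        .map { ColumnDelta(label: $0, deltaMs: (regressed[$0] ?? 0) - (baseline[$0] ?? 0)) }
        .sorted { lhs, rhs in
            // Sort by delta descending; break ties by label for a stable order.
            lhs.deltaMs != rhs.deltaMs ? lhs.deltaMs > rhs.deltaMs : lhs.label < rhs.label
        }
}

// Placeholder figures, not the measurements from this regression.
let baselineBreakdown: Breakdown = [
    "dylib loading": 120, "rebase / binding": 40, "ObjC setup": 30, "initializer time": 20,
]
let regressedBreakdown: Breakdown = [
    "dylib loading": 200, "rebase / binding": 42, "ObjC setup": 31, "initializer time": 700,
]

let deltas = columnDeltas(baseline: baselineBreakdown, regressed: regressedBreakdown)
for delta in deltas {
    print(delta.label, delta.deltaMs)
}
```

Whichever column sorts first tells you which lever to reach for: a dylib-loading delta points at linking, an initializer delta points at deferral.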

Defer past first-interactive

The SDK didn't need to be live before the user could see and touch the screen. The analytics it provided were session analytics — nothing that needed to be running before the first frame. So I wrapped its initialization in a lazy, thread-safe getter and triggered it after first-interactive — once the first screen is up and responsive — instead of at load.

enum Analytics {
    static let shared: AnalyticsSDK = {
        AnalyticsSDK(configuration: .default)   // runs on first access, not at +load
    }()
}

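// Kicked off after the first screen is interactive, off the launch path.
// Lives in the UISceneDelegate; repeat calls are harmless because the static let initializes once.
func sceneDidBecomeActive(_ scene: UIScene) {
    Task.detached(priority: .utility) { _ = Analytics.shared }
}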

There's a subtlety worth being honest about. A Swift static let is already lazy and thread-safe — so what did the wrapper actually buy? Only this: it deferred the work my code was triggering, the explicit early initialization call. It does not move the framework's own +load methods or C++ static constructors — dyld runs those before main() no matter what my code does. So deferring the call collapsed the part I controlled (the eager websocket + batch-flush setup), but the dylib-load floor stayed exactly where it was. Lazy init can't reduce the first column. It can collapse the second.
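The laziness claim is easy to check in isolation. The snippet below is a toy demonstration with a hypothetical `FakeSDK` standing in for the vendor type. It shows only that a `static let` initializer runs on first access and runs once; it says nothing about what a real framework does at load.

```swift
/// Stand-in for an SDK whose initializer does expensive setup.
final class FakeSDK {
    static var initCount = 0            // how many times init has run
    init() { FakeSDK.initCount += 1 }   // imagine the websocket + batch flush here
}

enum LazyHolder {
    // Swift runs this initializer on first access to `shared`, and only once.
    static let shared = FakeSDK()
}

// Declaring the holder has not run the initializer yet.
let countBeforeAccess = FakeSDK.initCount

// First access triggers init; the second access reuses the same instance.
_ = LazyHolder.shared
_ = LazyHolder.shared
let countAfterAccess = FakeSDK.initCount

print(countBeforeAccess, countAfterAccess)   // prints "0 1"
```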

Nothing about the SDK's behaviour changed from the user's perspective — events still fire, the session still tracks. It just stopped charging its setup cost to the one window where the user is watching a launch screen. The dylib loading line held steady; initializer time dropped back to a rounding error; pre-main came back under budget.

Before

Singleton initializes eagerly at +load

Websocket + batch flush on the pre-main critical path

Initializer time dominates pre-main

Pre-main more than quadrupled, well over budget

After

Initialization deferred past first-interactive

Same dylibs still load — that floor is dyld's, not ours

Initializer time back to a rounding error

Pre-main back under budget

The objection here is fair: lazy init just moves the websocket and batch-flush cost to first use — doesn't the user pay it then? Yes, but off the blank-launch critical path, onto a moment after first interactive when the app is already alive and responsive. And it pre-warms in the background during idle after the first frame, so it's usually warm before analytics is actually needed. The target is time-to-first-interactive, not total work done.

The phased reasoning

I didn't land on the deferred wrapper first. The order matters, because each phase ruled out a class of fix.

Phase 1 — Measure, don't guess
Run DYLD_PRINT_STATISTICS on a fixed reference device against both the regressed build and the previous release. Get the delta per column, not a single wall-clock number. This is what told me the regression was overwhelmingly initializer time, not dylib loading — which immediately killed half the candidate fixes.
Phase 2 — Try to remove the dylib cost
The vendor had a static variant as an XCFramework. Static linking would have removed the new dylibs from the load chain entirely — the right fix for the dylib column. But that build shipped with a known batching bug on an older iOS version they hadn't patched. Not usable yet. Parked it.
Phase 3 — Defer the initializer cost we own
Since the dominant cost was the SDK's own eager initialization, and none of it was needed before first-interactive, wrap it in a lazy getter and trigger it after the first screen is live. This collapsed the column I could actually control without waiting on the vendor.
Phase 4 — Migrate to static once it's safe
I filed the vendor bug with a full reproduction and the DYLD_PRINT_STATISTICS breakdown, asking them to fix the batching bug and make static the default distribution. When the patched static build landed, I migrated to it in a follow-up; removing the dylibs entirely dropped pre-main below the original baseline. I kept the lazy wrapper anyway as a belt-and-braces safety net.

The fix that matters is the one that holds

Shaving the milliseconds back was the easy part. The real question is why a dependency was allowed to blow the launch budget in the first place — and the answer is that nobody was measuring it at the point a dependency gets added.

So the durable fix was a CI gate: any new dependency has to come with a DYLD_PRINT_STATISTICS delta against the previous release, on a pinned reference device, and there's a hard ceiling on how much pre-main any single SDK is allowed to contribute. A dependency that blows the budget fails the check before it merges. We tightened that ceiling once we'd cleaned house, because the cheapest cold-start regression is the one that never lands.

The gate only works if it's honest, and a launch-time gate is easy to make flaky:

Pin the device and OS. One fixed reference device, same OS, every run. Wall-clock launch timing varies wildly across hardware; the DYLD_PRINT_STATISTICS pre-main number covers a narrower window and is far less noisy, though it is still a measurement and varies a little from run to run.
Control the cold launch. Kill and relaunch, discard the first run, take the median of several. A warm launch measures nothing useful.
Gate on the delta, not the absolute. Attribute the change to the SDK's contribution versus the previous release. An absolute threshold drifts with OS and toolchain; a delta catches what this change did.
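Put together, the three rules reduce to a small amount of arithmetic. The following is a sketch of the gate's decision logic only, assuming the CI job has already collected pre-main samples in milliseconds for both builds on the pinned device. The names, the sample values and the 50 ms ceiling are hypothetical.

```swift
/// Median of cold-launch samples after discarding the first run.
func medianDiscardingFirst(_ samplesMs: [Double]) -> Double? {
    let kept = samplesMs.dropFirst().sorted()
    guard !kept.isEmpty else { return nil }
    let mid = kept.count / 2
    return kept.count % 2 == 1 ? kept[mid] : (kept[mid - 1] + kept[mid]) / 2
}

enum GateResult: Equatable {
    case pass(deltaMs: Double)
    case fail(deltaMs: Double)
    case insufficientData
}

/// Gates on the delta between candidate and baseline medians, not on an absolute number.
func launchGate(baselineMs: [Double], candidateMs: [Double], ceilingMs: Double) -> GateResult {
    guard let baseline = medianDiscardingFirst(baselineMs),
          let candidate = medianDiscardingFirst(candidateMs) else {
        return .insufficientData
    }
    let delta = candidate - baseline
    return delta <= ceilingMs ? .pass(deltaMs: delta) : .fail(deltaMs: delta)
}

// Placeholder samples: the first entry of each array is the discarded run.
let previousRelease: [Double] = [400, 210, 205, 215, 208, 212]
let withNewSDK: [Double] = [1500, 980, 960, 975, 990, 970]

let gateResult = launchGate(baselineMs: previousRelease, candidateMs: withNewSDK, ceilingMs: 50)
print(gateResult)
```

Returning a distinct `insufficientData` case, rather than passing by default, keeps a broken measurement job from silently letting a regression through.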

On device choice: pick a low-end reference device that represents the slow tail, not a flagship. Cold start is a low-end-device problem, because that's where the budget actually bites, and a fast machine or an M-series simulator hides exactly the regressions you're trying to catch. One well-chosen slow device surfaces more than a rack of fast ones; a simulator delta is at best a cheap secondary signal alongside it.

The takeaway

Pre-main regressions from third-party SDKs don't show up in code review; they show up in DYLD_PRINT_STATISTICS. Split the number before you fix anything: dylib loading you only reduce by removing or merging dynamic libraries; initializer time you defer or delete. The CI gate is what saves you, because without it a quadrupled pre-main ships and the team ends up suspecting its own code instead of the dependency. Once you ship third-party SDKs, measurement is not optional.

A few things worth keeping:

Measure on a fixed, slower reference device, not your M-series laptop's simulator. Cold start is a low-end-device problem; a fast machine hides it.
A dependency's cost is not just its binary size. What it does at +load is often the bigger tax, and it's invisible until you split the pre-main number into dylib loading versus initializer time.
"Defer past first-interactive" beats "optimize the work" for anything that isn't needed to render the first screen. The fastest setup code is the code that runs after the user is already looking at something.