Avismara Hugoppalu
← All musings

Data Structures & Algorithms

Prefix sum: turning a range query into a lookup

O(n) range queries become O(1) differences. Why map[0] = 1 seeds the empty prefix, the sum−k trick, and the frequency-vs-first-index fork — the tool I kept mis-reaching for.

June 4, 2026 · 8 min read

TL;DR
A prefix sum turns "sum of an arbitrary subarray" into the difference of two precomputed boundary values — an O(n) range query collapses to O(1). The hashmap variant goes one step further: as you sweep, you ask "have I seen a prefix that pairs with where I am now to form a valid subarray?" — counting becomes a lookup. This is the pattern I most reliably mis-solved. I kept reaching for sliding window on problems that needed this, and the lookup mechanism took several passes to actually bottom out. The struggle is the post.

This series is about reaching for the right pattern instead of the heavy one — the spine is here. This entry is the one where I was honestly bad for a while, so I'm going to teach it the way I had to learn it: through the mistake, not around it.

A range query is a difference, not a sum

Start with the motivation, because the whole pattern falls out of it. You have an array and you want the sum of elements between two indices. The obvious way is a loop from the left boundary to the right boundary: O(n) per query, O(n·q) for q queries. Fine for one query. A disaster for many.

The reframe: don't store the array, store the running total up to each index. Call it the prefix sum. Once you have that, the sum of any subarray is

sum(i..j) = prefix[j] - prefix[i-1]

The sum between two points is the difference of the totals at those two points. Everything before i cancels. You precompute the prefix array once in O(n), and every range query after that is a single subtraction — O(1).

That's the load-bearing idea, and it's almost embarrassingly simple once you see it: a subarray is defined by its two boundaries, and a prefix sum lets you describe each boundary by a single number. The range becomes arithmetic on boundaries.

The leap: stop asking "what's the sum here" and start asking "have I seen the other boundary"

The basic version answers range queries when you're handed the boundaries. The interesting version is when nobody hands you boundaries — you have to find them. "How many contiguous subarrays sum to exactly k?" There's no window to slide along; the boundaries could be anywhere.

Here's the inversion that makes it work. Walk through the array maintaining the running prefix sum. At each index you're standing at one boundary — the right one. The subarray ending here sums to k exactly when some earlier prefix had the value currentPrefix - k, because:

currentPrefix - earlierPrefix == k   ⟹   the subarray between them sums to k

So instead of searching for the left boundary, you remember every prefix you've already passed in a hashmap, and at each step you look up currentPrefix - k. The count of earlier prefixes equal to that value is exactly the number of valid subarrays ending here. Subarray-counting becomes a dictionary lookup.

map[sum - k] is the entire trick. Read it as a question: "of all the left boundaries I've already walked past, how many were positioned so that the stretch from there to here sums to k?" The map turns that question into an O(1) answer.

The empty prefix, which I kept dropping on the floor

Before the worked code, the one detail that cost me the most: you have to seed the map with map[0] = 1 before the sweep starts.

This was a genuine sticking point — it looks like a magic incantation and I treated it as one for too long. It is not magic. It says: a prefix sum of 0 has occurred once, before the array begins. The empty prefix — the boundary sitting just to the left of index 0 — is a legitimate left boundary. If the running sum ever equals k outright, the subarray from the very start to here is valid, and the only way the lookup finds it is if the value 0 is already registered.

Forget map[0] = 1 and you don't crash — you silently undercount, missing exactly the subarrays that start at index 0. Silent is the dangerous part: the code runs, returns a plausible-looking number, and is wrong only on the inputs you didn't think to test. Register the empty prefix as a real boundary before the loop, every time.
func subarraysSumToK(_ nums: [Int], _ k: Int) -> Int {
    var count = 0
    var prefix = 0
    var seen: [Int: Int] = [0: 1] // the empty prefix is a legitimate left boundary
    for x in nums {
        prefix += x
        count += seen[prefix - k, default: 0] // left boundaries that close a subarray summing to k here
        seen[prefix, default: 0] += 1
    }
    return count
}

The problem I kept mis-solving

Now the honest part. For several sessions this was a confirmed weakness, and the shape of the failure was always the same: I'd see "subarray summing to a target" and my hand would go straight to sliding window. Two pointers, expand the right, shrink the left when the sum overshoots. It feels right. It is wrong, and it took me more than one pass to feel why it's wrong rather than just being told.

Sliding window has a prerequisite that nobody states out loud, and it's the thing the sliding window post bottoms out in: monotonicity. Shrinking the window from the left has to monotonically move the answer in one direction — adding an element only ever grows the sum, removing one only ever shrinks it. That's the contract that makes "overshot? shrink left" a valid move.

The moment the array has negative numbers, that contract breaks. Removing an element from the left can increase the sum. Adding one on the right can decrease it. The window's sum no longer moves predictably, so "the sum is too big, shrink left" is no longer a safe rule — shrinking might be exactly the wrong direction. Sliding window assumes a monotone landscape; with negatives the landscape isn't monotone, and the whole technique has no ground to stand on.

That is the dividing line I needed and didn't have:

Sliding windowPrefix sum + hashmap
Query shapebest/longest window under a constraintexact target / arbitrary range
Requiresmonotonic shrink (positives only)nothing — works with negatives
Boundaryboth ends move togetherleft boundary recalled from history
Answer viawindow invariantmap[sum - k] lookup
Before
See 'subarray sums to k' → reach for sliding window
Try to shrink the window on overshoot
Negatives quietly break the shrink rule
Spend a session convinced the bug is elsewhere
After
See 'exact target / arbitrary range' → prefix sum
Recall the left boundary from a hashmap, don't shrink
Negatives are fine — no monotonicity assumed
The lookup mechanism does the search for you

The resolution lever, the one sentence I wish I'd had at the start: sliding window is for optimizing a window under a monotone constraint; prefix sum is for exact-target or arbitrary-range queries where no monotone shrink exists. Negatives are the tell. When you can't trust shrinking, you can't slide — you look up.

The fork that's easy to miss: frequency vs first index

The last beat, and the one I had to be shown rather than deriving on my own. The pattern is the same in both cases — sweep, maintain a prefix sum, consult a map keyed by prefix value. What you store as the value is not the same, and it's dictated entirely by the question.

  1. Counting → store a frequency
    "How many subarrays sum to k?" You want the number of valid left boundaries, so the map stores how many times each prefix value has occurred. map[sum - k] returns a count, and you add it. Every matching boundary is worth one.
  2. Maximizing length → store the first index
    "What is the longest subarray summing to k?" Now you don't care how many left boundaries match — you care which one is earliest, because the earliest left boundary gives the longest reach to the current right boundary. So the map stores the first index each prefix value appeared at, and you never overwrite it.

Same pattern, opposite bookkeeping. The principle worth carrying off: the data structure you store is dictated by the question, not the pattern. Counting wants a frequency; maximizing a length wants the earliest position. Reach for the wrong one and the mechanism still runs, just answering a question you weren't asked.

A clean instance of the first-index variant is the balanced-subarray problem — say you want the longest stretch with an equal count of two symbols. Map one symbol to +1 and the other to -1, take the prefix sum, and a balanced stretch is exactly a subarray that sums to zero. Two indices with the same prefix value bound a zero-sum subarray between them; storing the first index each value appeared at, and measuring the distance when you see it again, gives you the longest one.

func longestSubarraySumToK(_ nums: [Int], _ k: Int) -> Int {
    var best = 0
    var prefix = 0
    var firstSeen: [Int: Int] = [0: -1] // empty prefix sits at virtual index -1
    for (i, x) in nums.enumerated() {
        prefix += x
        if let j = firstSeen[prefix - k] { best = max(best, i - j) } // earliest matching boundary = longest reach
        if firstSeen[prefix] == nil { firstSeen[prefix] = i } // keep the FIRST index only — never overwrite
    }
    return best
}

Note the seed flips meaning with the question: counting seeds [0: 1] (the empty prefix has occurred once); maximizing seeds [0: -1] (the empty prefix lives at virtual index -1, so a subarray from the start measures its full length). Same boundary, registered in the currency the question is counting in.

The takeaway
Prefix sum is a difference of boundaries; the hashmap variant is a lookup against boundaries you've already passed. The recognition cue is the one I lacked: when you can't trust shrinking a window — exact targets, arbitrary ranges, negatives in the array — the monotonic-shrink assumption is gone and sliding window has no footing, so you remember boundaries and look them up instead. And once you're in the pattern, let the question pick your bookkeeping: count with a frequency, maximize length with a first index. I lost real time mistaking this for a windowing problem; the cheapest version of that lesson is reading the cue off the input — negatives, "exact," "any range" — before your hand reaches for two pointers.

Problems to practice

Problems to practice