Continuous Median

Continuous Median

Category

Heaps

Difficulty

Hard

Problem Statement

Design a data structure that supports two operations efficiently:

  1. Insert a number from a data stream.
  2. Find the median of all numbers inserted so far.

The median is the middle value in a sorted list. If the count is even, the median is the average of the two middle values.

Intuition

To efficiently find the median, we maintain two heaps that split the data into a lower half and an upper half:

  • A max-heap for the lower half (so the largest of the smaller elements is at the top).
  • A min-heap for the upper half (so the smallest of the larger elements is at the top).

The median is always at or between the tops of the two heaps. By keeping the heaps balanced in size (differing by at most 1), we can find the median in O(1) and insert in O(log n).

Approach

  1. Initialize an empty max-heap (lowerHalf) and an empty min-heap (upperHalf).
  2. Insert(num): a. If lowerHalf is empty or num <= lowerHalf.peek(), insert into lowerHalf. b. Otherwise, insert into upperHalf. c. Rebalance: if one heap has more than 1 extra element compared to the other, move the top element from the larger heap to the smaller.
  3. GetMedian(): a. If the heaps are the same size, the median is (lowerHalf.peek() + upperHalf.peek()) / 2. b. If one heap is larger, the median is the top of the larger heap.

Pseudocode

class ContinuousMedianHandler:
    lowerHalf = MaxHeap()  // lower half of numbers
    upperHalf = MinHeap()  // upper half of numbers

    function insert(num):
        if lowerHalf is empty or num <= lowerHalf.peek():
            lowerHalf.insert(num)
        else:
            upperHalf.insert(num)
        rebalance()

    function rebalance():
        if lowerHalf.size() - upperHalf.size() > 1:
            upperHalf.insert(lowerHalf.extractMax())
        else if upperHalf.size() - lowerHalf.size() > 1:
            lowerHalf.insert(upperHalf.extractMin())

    function getMedian():
        if lowerHalf.size() == upperHalf.size():
            return (lowerHalf.peek() + upperHalf.peek()) / 2.0
        if lowerHalf.size() > upperHalf.size():
            return lowerHalf.peek()
        return upperHalf.peek()

Time & Space Complexity

  • Insert: O(log n) per insertion due to heap operations.
  • GetMedian: O(1) by peeking at the heap tops.
  • Space: O(n) to store all n elements across the two heaps.

Key Insights

  • The two-heap approach is the classic solution for streaming median problems.
  • The max-heap stores the smaller half so its maximum (the top) is the largest small value. The min-heap stores the larger half so its minimum (the top) is the smallest large value.
  • Keeping the heaps balanced (sizes differ by at most 1) ensures the median is always accessible at the tops.
  • This technique generalizes: you can find any percentile by adjusting the balance ratio.
  • For an even count, be sure to compute the average as a floating-point value.

Edge Cases

  • First element inserted: it goes into one heap; the median is that element.
  • Two elements: median is the average of both.
  • All identical elements: median is always that value.
  • Stream of sorted values: the heaps still maintain correct balance via rebalancing.
  • Stream of reverse-sorted values: same correctness guarantee.