Merge Sorted Arrays

Merge Sorted Arrays

Category

Heaps

Difficulty

Very Hard

Problem Statement

Given k sorted arrays, merge them into a single sorted array. The total number of elements across all arrays is n.

Intuition

A naive approach merges arrays pairwise, but this repeatedly scans already-merged elements. A min heap (priority queue) of size k provides a much better strategy: at any moment, the next smallest element across all k arrays must be the minimum of the k current front elements. The heap efficiently identifies this minimum.

Approach

  1. Initialize a min heap.
  2. Insert the first element of each non-empty array into the heap, along with metadata: which array it came from and its index within that array.
  3. While the heap is not empty:
    • Extract the minimum element.
    • Append it to the result.
    • If the extracted element's array has a next element, insert that next element into the heap.
  4. Return the result.

Pseudocode

function mergeSortedArrays(arrays):
    minHeap = new MinHeap()
    result = []

    // Initialize heap with the first element of each array
    for i from 0 to length(arrays)-1:
        if arrays[i] is not empty:
            minHeap.insert((arrays[i][0], i, 0))
            //               value, arrayIndex, elementIndex

    while minHeap is not empty:
        (value, arrayIdx, elemIdx) = minHeap.extractMin()
        result.append(value)

        // If there's a next element in the same array, insert it
        if elemIdx + 1 < length(arrays[arrayIdx]):
            nextVal = arrays[arrayIdx][elemIdx + 1]
            minHeap.insert((nextVal, arrayIdx, elemIdx + 1))

    return result

Time & Space Complexity

MeasureComplexityJustification
TimeO(n log k)Each of the n elements is inserted into and extracted from a heap of size at most k; each operation is O(log k)
SpaceO(k + n)The heap holds at most k elements; the result array holds n elements

Key Insights

  • The heap always has at most k elements (one per array), keeping operations efficient even when n is very large.
  • This is the standard algorithm behind external sort (merging sorted runs from disk) and is used in many database systems.
  • An alternative approach is divide and conquer: recursively merge pairs of arrays. This is also O(n log k) but uses O(n log k) space for intermediate arrays unless done in-place.
  • In languages with built-in priority queues (Python's heapq, Java's PriorityQueue), the implementation is straightforward.

Edge Cases

  • k = 0 (no arrays): return an empty array.
  • Some arrays are empty: skip them during initialization.
  • k = 1: the single array is already sorted; return it directly.
  • All arrays have one element each: reduces to sorting k elements, which the heap does in O(k log k).
  • Arrays of vastly different lengths: the algorithm handles this naturally since each array independently feeds the heap.
  • Duplicate values across arrays: handled correctly; the heap may contain equal values and extracts them in any stable order.