Union Find
Union Find
Category
Famous Algorithms
Difficulty
Medium
Problem Statement
Implement the Union-Find (Disjoint Set Union) data structure that supports two primary operations efficiently:
- Find(x): Determine which set element x belongs to (return the representative/root of the set).
- Union(x, y): Merge the sets containing elements x and y.
Use union by rank and path compression optimizations for near-constant amortized time per operation.
Intuition
Union-Find maintains a forest of trees, where each tree represents a set and the root of the tree is the set's representative. Initially, each element is its own set (a tree of size 1). Union merges two trees, and Find traverses up the tree to locate the root.
Without optimization, trees can become tall and degenerate into linked lists. Union by rank keeps trees balanced by always attaching the shorter tree under the taller one. Path compression flattens the tree during Find operations by making every visited node point directly to the root. Together, these yield nearly O(1) amortized time per operation.
Approach
- Initialize: Each element is its own parent with rank 0.
- Find(x): Follow parent pointers from x to the root. Apply path compression by setting each visited node's parent to the root.
- Union(x, y): a. Find the roots of x and y. b. If they are the same, do nothing (already in the same set). c. Otherwise, attach the tree with lower rank under the tree with higher rank. If ranks are equal, choose one as the new root and increment its rank.
Pseudocode
class UnionFind:
parent = array where parent[i] = i
rank = array where rank[i] = 0
function find(x):
if parent[x] != x:
parent[x] = find(parent[x]) // path compression
return parent[x]
function union(x, y):
rootX = find(x)
rootY = find(y)
if rootX == rootY:
return
// union by rank
if rank[rootX] < rank[rootY]:
parent[rootX] = rootY
else if rank[rootX] > rank[rootY]:
parent[rootY] = rootX
else:
parent[rootY] = rootX
rank[rootX] += 1
Time & Space Complexity
- Time: O(alpha(n)) amortized per Find or Union operation, where alpha is the inverse Ackermann function. For all practical purposes, this is effectively O(1).
- Space: O(n) for the parent and rank arrays.
Key Insights
- Path compression alone gives O(log n) amortized time. Combined with union by rank, it improves to O(alpha(n)).
- The inverse Ackermann function alpha(n) grows so slowly that it is at most 4 for any physically realizable input size (n < 2^65536).
- Union by rank keeps the tree height logarithmic, while path compression flattens it over time.
- An alternative to union by rank is union by size, which also achieves O(alpha(n)) with path compression.
- Union-Find is used extensively in Kruskal's MST algorithm, connected components, cycle detection in undirected graphs, and percolation problems.
System Design Application
Union-Find is the cycle detection mechanism in kruskals-algorithm. Kruskal's MST algorithm processes edges in ascending weight order and uses Union-Find to check whether adding an edge would create a cycle (i.e., both endpoints are already in the same component). With path compression and union by rank, each Union-Find operation is nearly O(1) amortized, making the edge sort the dominant cost.
Edge Cases
- Union of an element with itself: no-op (roots are the same).
- Find on an element that has never been unioned: returns itself.
- Many unions creating a single large set: path compression ensures subsequent finds are fast.
- Sequence of finds without any unions: each find returns the element itself.
- Large number of elements with sparse unions: most sets remain singletons.