Underscorify Substring
Underscorify Substring
Category
Strings
Difficulty
Hard
Problem Statement
Given a main string and a substring, wrap every occurrence of the substring in the main string with underscores. If two or more occurrences overlap or are adjacent, they should be wrapped together under a single pair of underscores rather than individually.
Intuition
The problem breaks into two phases. First, find all occurrences of the substring (including overlapping ones) and record their start and end indices. Second, merge overlapping or adjacent intervals. Finally, build the result string by inserting underscores at the merged interval boundaries. The merging step is key — without it, overlapping occurrences would produce nested or incorrectly placed underscores.
Approach
- Find all occurrences: Scan through the main string and find every index where the substring starts. Record intervals as
[start, start + len(substring)]. - Merge intervals: Collapse overlapping or adjacent intervals into unified intervals.
- Sort intervals by start index (they are naturally sorted from the scan).
- Merge interval
[a, b]with[c, d]ifc <= b(overlapping or adjacent).
- Build result: Traverse the main string character by character. Maintain a pointer into the merged intervals. When the current index matches an interval start, insert
_. When it matches an interval end, insert_.
Pseudocode
function underscorifySubstring(string, substring):
locations = findOccurrences(string, substring)
if locations is empty:
return string
merged = mergeIntervals(locations)
return buildString(string, merged)
function findOccurrences(string, substring):
locations = []
startIdx = 0
while startIdx < length(string):
idx = string.indexOf(substring, startIdx)
if idx == -1:
break
locations.append([idx, idx + length(substring)])
startIdx = idx + 1
return locations
function mergeIntervals(intervals):
merged = [intervals[0]]
for i from 1 to length(intervals) - 1:
current = intervals[i]
previous = last element of merged
if current[0] <= previous[1]:
previous[1] = max(previous[1], current[1])
else:
merged.append(current)
return merged
function buildString(string, merged):
result = []
intervalIdx = 0
inInterval = false
i = 0
while i < length(string):
if intervalIdx < length(merged):
if i == merged[intervalIdx][0]:
result.append("_")
inInterval = true
if i == merged[intervalIdx][1]:
result.append("_")
inInterval = false
intervalIdx++
result.append(string[i])
i++
if inInterval:
result.append("_")
return join(result)
Time & Space Complexity
- Time: O(n * m + n) where n is the length of the main string and m is the length of the substring. Finding all occurrences takes O(n * m) using naive search (or O(n + m) with KMP). Merging and building are O(n).
- Space: O(n) for storing the occurrences and the result string.
Key Insights
- Overlapping occurrences must be merged before inserting underscores. Failing to merge leads to incorrect nesting.
- The merge condition uses
<=(not<) to also merge adjacent (touching) intervals. - When building the result, we track whether we are currently "inside" an underscored region to correctly close underscores at the end of the string.
Edge Cases
- Substring not found: return the original string unchanged.
- Entire string is the substring: return
_string_. - Overlapping occurrences (e.g., "aaa" with substring "aa"): intervals
[0,2]and[1,3]merge to[0,3]. - Substring at the very end of the string: underscore must close at the string's end.
- Single-character substring: every occurrence is independent unless adjacent.