Underscorify Substring

Underscorify Substring

Category

Strings

Difficulty

Hard

Problem Statement

Given a main string and a substring, wrap every occurrence of the substring in the main string with underscores. If two or more occurrences overlap or are adjacent, they should be wrapped together under a single pair of underscores rather than individually.

Intuition

The problem breaks into two phases. First, find all occurrences of the substring (including overlapping ones) and record their start and end indices. Second, merge overlapping or adjacent intervals. Finally, build the result string by inserting underscores at the merged interval boundaries. The merging step is key — without it, overlapping occurrences would produce nested or incorrectly placed underscores.

Approach

  1. Find all occurrences: Scan through the main string and find every index where the substring starts. Record intervals as [start, start + len(substring)].
  2. Merge intervals: Collapse overlapping or adjacent intervals into unified intervals.
    • Sort intervals by start index (they are naturally sorted from the scan).
    • Merge interval [a, b] with [c, d] if c <= b (overlapping or adjacent).
  3. Build result: Traverse the main string character by character. Maintain a pointer into the merged intervals. When the current index matches an interval start, insert _. When it matches an interval end, insert _.

Pseudocode

function underscorifySubstring(string, substring):
    locations = findOccurrences(string, substring)
    if locations is empty:
        return string

    merged = mergeIntervals(locations)
    return buildString(string, merged)

function findOccurrences(string, substring):
    locations = []
    startIdx = 0
    while startIdx < length(string):
        idx = string.indexOf(substring, startIdx)
        if idx == -1:
            break
        locations.append([idx, idx + length(substring)])
        startIdx = idx + 1
    return locations

function mergeIntervals(intervals):
    merged = [intervals[0]]
    for i from 1 to length(intervals) - 1:
        current = intervals[i]
        previous = last element of merged
        if current[0] <= previous[1]:
            previous[1] = max(previous[1], current[1])
        else:
            merged.append(current)
    return merged

function buildString(string, merged):
    result = []
    intervalIdx = 0
    inInterval = false
    i = 0

    while i < length(string):
        if intervalIdx < length(merged):
            if i == merged[intervalIdx][0]:
                result.append("_")
                inInterval = true
            if i == merged[intervalIdx][1]:
                result.append("_")
                inInterval = false
                intervalIdx++

        result.append(string[i])
        i++

    if inInterval:
        result.append("_")

    return join(result)

Time & Space Complexity

  • Time: O(n * m + n) where n is the length of the main string and m is the length of the substring. Finding all occurrences takes O(n * m) using naive search (or O(n + m) with KMP). Merging and building are O(n).
  • Space: O(n) for storing the occurrences and the result string.

Key Insights

  • Overlapping occurrences must be merged before inserting underscores. Failing to merge leads to incorrect nesting.
  • The merge condition uses <= (not <) to also merge adjacent (touching) intervals.
  • When building the result, we track whether we are currently "inside" an underscored region to correctly close underscores at the end of the string.

Edge Cases

  • Substring not found: return the original string unchanged.
  • Entire string is the substring: return _string_.
  • Overlapping occurrences (e.g., "aaa" with substring "aa"): intervals [0,2] and [1,3] merge to [0,3].
  • Substring at the very end of the string: underscore must close at the string's end.
  • Single-character substring: every occurrence is independent unless adjacent.