Pattern Matcher

Pattern Matcher

Category

Strings

Difficulty

Hard

Problem Statement

Given a pattern string consisting of only x and y characters, and a main string, determine if the main string matches the pattern. Each x in the pattern represents one fixed substring and each y represents another fixed substring (different from x). Every x must map to the same substring, and every y must map to the same (but different) substring. Return the two substrings [x, y] if a match exists, or an empty array otherwise.

Intuition

The pattern constrains the structure of the main string. If the pattern has countX x's and countY y's, and x maps to a string of length lenX and y maps to a string of length lenY, then: countX * lenX + countY * lenY = len(mainString). We can iterate over all possible values of lenX, compute lenY from the equation, and verify if the candidate strings produce the main string when substituted into the pattern.

Approach

  1. Ensure the pattern starts with x. If it starts with y, swap all xs and ys in the pattern (and remember to swap back in the result).
  2. Count occurrences of x (countX) and y (countY).
  3. For each possible lenX from 1 to len(mainString) / countX: a. Compute remaining characters: remaining = len(mainString) - countX * lenX. b. If countY == 0, check if remaining == 0 and the pattern matches. c. Otherwise, compute lenY = remaining / countY. If not an integer, skip. d. Extract the candidate x substring from the first occurrence of x in the pattern. e. Extract the candidate y substring from the first occurrence of y in the pattern. f. Build the string from the pattern using these candidates and compare with the main string.
  4. Return the matching pair or an empty array.

Pseudocode

function patternMatcher(pattern, string):
    if length(pattern) > length(string):
        return []

    swapped = false
    if pattern[0] == 'y':
        swap all x and y in pattern
        swapped = true

    countX = count of 'x' in pattern
    countY = count of 'y' in pattern
    firstYIndex = index of first 'y' in pattern (or -1)
    n = length(string)

    for lenX from 1 to n / countX:
        remaining = n - countX * lenX
        if countY == 0:
            if remaining == 0:
                xStr = string[0 : lenX]
                if buildFromPattern(pattern, xStr, "") == string:
                    return [xStr, ""] if not swapped else ["", xStr]
            continue

        if remaining % countY != 0:
            continue
        lenY = remaining / countY

        yStartIdx = firstYIndex * lenX  // approximate, need actual position
        // Calculate actual start of first y occurrence
        yStart = 0
        for c in pattern[0..firstYIndex-1]:
            yStart += lenX if c == 'x' else lenY

        xStr = string[0 : lenX]
        yStr = string[yStart : yStart + lenY]

        if buildFromPattern(pattern, xStr, yStr) == string:
            if swapped:
                return [yStr, xStr]
            return [xStr, yStr]

    return []

Time & Space Complexity

  • Time: O(n^2 + n*m) where n is the length of the main string and m is the length of the pattern. We iterate O(n) possible lengths for x, and each verification step takes O(n).
  • Space: O(n) for building the candidate string during verification.

Key Insights

  • Normalizing the pattern to always start with x reduces the number of cases to handle.
  • The equation countX * lenX + countY * lenY = n constrains the search space, making it feasible to enumerate all possibilities.
  • If countY == 0, the entire string must be composed solely of repeated x substrings.
  • Extracting candidates from their first occurrence positions in the main string avoids generating all possible substrings.

Edge Cases

  • Pattern with only xs: the main string must be composed of identical repeated substrings.
  • Pattern with only ys: similar to above after the swap normalization.
  • Single character pattern: x maps to the entire string.
  • x and y map to the same string: technically invalid per problem constraints (they must be different).
  • Empty main string with non-empty pattern: no valid match.
  • Pattern longer than main string: no valid match possible.