Run-Length Encoding

Run-Length Encoding

Category

Strings

Difficulty

Easy

Problem Statement

Given a non-empty string, return its run-length encoding. Run-length encoding compresses a string by replacing consecutive identical characters with the count followed by the character. Long runs (more than 9 consecutive identical characters) should be split into multiple encoded segments, each with a maximum count of 9.

Intuition

Run-length encoding exploits the fact that many strings contain consecutive repeated characters. By replacing a run of identical characters with a count and the character itself, we achieve compression for repetitive data. Capping runs at 9 ensures each count is a single digit, making the encoding unambiguous and easy to decode.

Approach

  1. Initialize an empty result list and a runLength counter set to 1.
  2. Iterate from the second character to the end of the string:
    • If the current character equals the previous one and runLength < 9, increment runLength.
    • Otherwise, append the current runLength and the previous character to the result, then reset runLength to 1.
  3. After the loop, append the final runLength and character.
  4. Join and return the result.

Pseudocode

function runLengthEncode(string):
    result = []
    runLength = 1

    for i from 1 to length(string) - 1:
        if string[i] == string[i - 1] and runLength < 9:
            runLength = runLength + 1
        else:
            result.append(toString(runLength))
            result.append(string[i - 1])
            runLength = 1

    result.append(toString(runLength))
    result.append(string[length(string) - 1])

    return join(result, "")

Time & Space Complexity

  • Time: O(n) where n is the length of the input string. Each character is visited exactly once.
  • Space: O(n) for the output. In the worst case (no consecutive duplicates), the encoded string is twice the length of the input.

Key Insights

  • Capping run lengths at 9 keeps each count as a single digit, which simplifies both encoding and decoding.
  • The encoding is not always shorter than the original. For strings with no repeated characters, the encoded form is actually longer.
  • Do not forget to handle the last run after the loop ends.

Edge Cases

  • Single character string: returns "1" followed by that character.
  • String with all identical characters: produces one or more "9X" segments followed by a final segment for the remainder.
  • String with no consecutive duplicates (e.g., "abcdef"): each character becomes "1a1b1c1d1e1f".
  • Runs of exactly 9 characters: should produce a single "9X" segment, not split further.
  • Runs of 10+ characters: split correctly (e.g., 12 a's become "9a3a").