Run-Length Encoding
Run-Length Encoding
Category
Strings
Difficulty
Easy
Problem Statement
Given a non-empty string, return its run-length encoding. Run-length encoding compresses a string by replacing consecutive identical characters with the count followed by the character. Long runs (more than 9 consecutive identical characters) should be split into multiple encoded segments, each with a maximum count of 9.
Intuition
Run-length encoding exploits the fact that many strings contain consecutive repeated characters. By replacing a run of identical characters with a count and the character itself, we achieve compression for repetitive data. Capping runs at 9 ensures each count is a single digit, making the encoding unambiguous and easy to decode.
Approach
- Initialize an empty result list and a
runLengthcounter set to 1. - Iterate from the second character to the end of the string:
- If the current character equals the previous one and
runLength < 9, incrementrunLength. - Otherwise, append the current
runLengthand the previous character to the result, then resetrunLengthto 1.
- If the current character equals the previous one and
- After the loop, append the final
runLengthand character. - Join and return the result.
Pseudocode
function runLengthEncode(string):
result = []
runLength = 1
for i from 1 to length(string) - 1:
if string[i] == string[i - 1] and runLength < 9:
runLength = runLength + 1
else:
result.append(toString(runLength))
result.append(string[i - 1])
runLength = 1
result.append(toString(runLength))
result.append(string[length(string) - 1])
return join(result, "")
Time & Space Complexity
- Time: O(n) where n is the length of the input string. Each character is visited exactly once.
- Space: O(n) for the output. In the worst case (no consecutive duplicates), the encoded string is twice the length of the input.
Key Insights
- Capping run lengths at 9 keeps each count as a single digit, which simplifies both encoding and decoding.
- The encoding is not always shorter than the original. For strings with no repeated characters, the encoded form is actually longer.
- Do not forget to handle the last run after the loop ends.
Edge Cases
- Single character string: returns
"1"followed by that character. - String with all identical characters: produces one or more
"9X"segments followed by a final segment for the remainder. - String with no consecutive duplicates (e.g.,
"abcdef"): each character becomes"1a1b1c1d1e1f". - Runs of exactly 9 characters: should produce a single
"9X"segment, not split further. - Runs of 10+ characters: split correctly (e.g., 12 a's become
"9a3a").