Compression is essential for performance, storage, networking, and modern computing efficiency.
Choosing the right algorithm depends on your trade-off: speed vs ratio vs quality.
LZ77 Compression Algorithm
What Is LZ77?
LZ77, invented by Abraham Lempel and Jacob Ziv in 1977, is one of the most important and foundational lossless compression algorithms.
It is the ancestor of many modern compressors:
gzip (DEFLATE)
ZIP files
PNG images
LZMA (7zip)
LZW (GIF)
LZ77 works by replacing repeated text with references to earlier occurrences of the same data.
It uses the concept of a sliding window over the data stream.
Core Idea: Sliding Window + Lookahead Buffer
LZ77 views the input through two windows:
search buffer which is the previously processed data
lookahead buffer which is the data we are about to encode
The encoder attempts to match the longest substring in the lookahead buffer that also appears in the search buffer.
LZ77 represents matches using a triple:
(offset, length, nextSymbol)
offset: how far back the match begins
length: how many characters match
nextSymbol: what character follows the matched block
A Minimal Example
Given input:
A B C A B C A
1. First characters (A B C) have no prior match → output literal triples:
(0, 0, A)
(0, 0, B)
(0, 0, C)
2. Remaining input: A B C A
The longest match is A B C appearing at offset 3.
(3, 3, A)
This means:
Go back 3 characters,
Copy 3 characters,
Then append A.
Advantages of LZ77
Simple algorithm is easy to implement.
Effective for text and repetitive data.
Fast decoding (copying previously seen blocks).
Forms the basis of many industrial-strength compressors.
Limitations of LZ77
Compression ratio decreases on random data and non-repetitive files
Large window sizes require more memory.
The simple triple format is sometimes inefficient → improved by formats like LZSS, LZMA, and DEFLATE.
A Micro Implementation Example (Pseudo-code)
for each position i in input:
best_length = 0
best_offset = 0
for each j in search_buffer:
length = number of characters matching input[j...] and input[i...]
if length > best_length:
best_length = length
best_offset = i - j
if best_length > 0:
emit (best_offset, best_length, input[i + best_length])
advance i by best_length + 1
else:
emit (0, 0, input[i])
advance i by 1
This matches the original LZ77 description.
LZ78 Compression Algorithm
What Is LZ78?
LZ78 is a dictionary-based compression algorithm invented by Abraham Lempel and Jacob Ziv in 1978.
It works by building a dictionary of previously seen substrings and replacing repeated substrings with:
an index (a reference to a dictionary entry), and
a next character.
Each compressed token has the form:
(index, nextSymbol)
index is 0 if no previous substring is matched.
nextSymbol is the character following the matched substring.
Unlike LZ77, which looks backward into a sliding window, LZ78 grows an explicit dictionary of phrases.
How the Algorithm Works (Intuition)
Start with an empty dictionary.
Read the input character-by-character, building the longest substring that already exists in the dictionary.
When the next character causes the substring to not exist in the dictionary:
emit a token (index, nextSymbol),
add the new substring to the dictionary.
Repeat until the input is consumed.
A Step-by-Step Example
Input:
A B C A B B C C A
Dictionary starts empty and will grow like:
1: "A"
2: "B"
3: "C"
4: "AB"
5: "BC"
6: "CA"
Compression process:
Step
Matched Substring
Next Symbol
Output
New Dictionary Entry
1
"" (none)
A
(0, A)
1: "A"
2
""
B
(0, B)
2: "B"
3
""
C
(0, C)
3: "C"
4
"A"
B
(1, B)
4: "AB"
5
"B"
C
(2, C)
5: "BC"
6
"C"
A
(3, A)
6: "CA"
Final encoded output:
(0, A)
(0, B)
(0, C)
(1, B)
(2, C)
(3, A)
And the final dictionary consists of 6 entries.
Decoding LZ78
Decoding is straightforward:
Start with an empty dictionary.
For each (index, symbol):
If index = 0, output symbol.
Else:
Fetch the dictionary entry at index,
output that string + symbol.
Add the output string as a new dictionary entry.
Token (1, B)
→ dictionary[1] = "A"
→ output "AB"
→ add "AB" as new dictionary entry
Key Differences: LZ78 vs LZ77
LZ77:
Uses a sliding window of past characters.
Matches substrings by offset and length.
Dictionary is implicit (part of the data window).
LZ78:
Builds an explicit dictionary of substrings.
Dictionary persists and grows as the input is read.
Each token is an index + next character.
LZ77 tends to handle long repeated patterns better without huge dictionaries.
LZ78 provides simpler decoding and is foundational for LZW (used in GIF).
Performance Notes
Compresses well when the input has many repeated substrings.
Dictionary growth can be memory-intensive.
Many real-world compressors (like LZW) refined LZ78 by: