Introduction to Compression Algorithms

  1. What Is Compression?



  2. Lossless Compression

  3. Example formats:
    ZIP, GZIP, PNG, FLAC, Zstandard, Brotli
    


  4. Lossy Compression

  5. Example formats:
    JPEG, MP3, AAC, WebM, H.264 / H.265, AV1
    


  6. Dictionary-Based Algorithms

  7. Example:
    "abcabcabc" →
    (abc)(3 repeats) →
    much smaller representation
    


  8. Entropy Coding

  9. Example:
    Frequent 'A' → 0
    Less frequent 'Z' → 11011
    


  10. Burrows–Wheeler Transform (BWT)



  11. Modern High-Performance Algorithms

  12. Speed vs Ratio:
    LZ4 → fastest
    Zstandard → balanced
    Brotli → best ratio for web
    


  13. Applications of Compression



  14. Summary of Compression Types

  15. Category Description Examples
    Lossless Preserves exact data gzip, PNG, ZIP
    Lossy Removes perceptually unnecessary data JPEG, MP3, H.264
    Dictionary-based Reuses repeated fragments LZ77, LZW
    Entropy coding Shorter codes for frequent symbols Huffman, Arithmetic
    Modern fast Optimized for speed + ratio Zstd, LZ4, Brotli




LZ77 Compression Algorithm

  1. What Is LZ77?


  2. Core Idea: Sliding Window + Lookahead Buffer
  3. (offset, length, nextSymbol)
    


  4. A Minimal Example

  5. A B C A B C A
    
    (0, 0, A)
    (0, 0, B)
    (0, 0, C)
    
    (3, 3, A)
    


  6. Advantages of LZ77



  7. Limitations of LZ77



  8. A Micro Implementation Example (Pseudo-code)
  9. for each position i in input:
        best_length = 0
        best_offset = 0
    
        for each j in search_buffer:
            length = number of characters matching input[j...] and input[i...]
    
            if length > best_length:
                best_length = length
                best_offset = i - j
    
        if best_length > 0:
            emit (best_offset, best_length, input[i + best_length])
            advance i by best_length + 1
        else:
            emit (0, 0, input[i])
            advance i by 1
    



LZ78 Compression Algorithm

  1. What Is LZ78?


  2. How the Algorithm Works (Intuition)



  3. A Step-by-Step Example

  4. A B C A B B C C A
    
    1: "A"
    2: "B"
    3: "C"
    4: "AB"
    5: "BC"
    6: "CA"
    


    StepMatched SubstringNext SymbolOutputNew Dictionary Entry
    1 "" (none) A (0, A) 1: "A"
    2 "" B (0, B) 2: "B"
    3 "" C (0, C) 3: "C"
    4 "A" B (1, B) 4: "AB"
    5 "B" C (2, C) 5: "BC"
    6 "C" A (3, A) 6: "CA"

    (0, A)
    (0, B)
    (0, C)
    (1, B)
    (2, C)
    (3, A)
    


  5. Decoding LZ78

  6. Token (1, B)
    → dictionary[1] = "A"
    → output "AB"
    → add "AB" as new dictionary entry
    


  7. Key Differences: LZ78 vs LZ77



  8. Performance Notes