Base64

Base64 encodes arbitrary bytes into 64 ASCII symbols so that binary data can travel through text-only systems (emails, JSON, URLs, HTML, etc.).
It is an encoding (a way of representing data), NOT encryption.


What is Base64 and why do we need it?

  1. Problem: Some channels are text-only (7-bit safe). Raw bytes may be mangled. Base64 maps every 3 bytes to 4 printable characters.


  2. Alphabet: A–Z a–z 0–9 + / (URL-safe variant uses - and _ instead of + and /).


  3. Padding: Output is padded with = so that its length is a multiple of 4 (some URL-safe variants omit the padding).

Standard alphabet (index → char)
0–25  : A–Z
26–51 : a–z
52–61 : 0–9
62    : +
63    : /


How encoding works (bit view)

  1. Group input bytes into 24-bit blocks (3×8 bits). If fewer than 3 bytes remain in the final block, pad it with zero bits.


  2. Split 24 bits into 4 groups of 6 bits. Each 6-bit value indexes the alphabet (0–63).


  3. Pad output with = if input length mod 3 is 1 (add ==) or 2 (add =).

Example: "Man" (ASCII: 0x4D 0x61 0x6E)

0x4D     0x61     0x6E
01001101 01100001 01101110  → 24 bits

Split 6-bit groups:
010011 010110 000101 101110 → 19, 22, 5, 46

Map to alphabet:
19=T, 22=W, 5=F, 46=u  → "TWFu"
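
// Minimal JS example. escape/unescape are deprecated but still widely
// supported; they are used here only to round-trip UTF-8 through btoa/atob.
const encode = (s) => btoa(unescape(encodeURIComponent(s))); // UTF-8 safe
const decode = (b) => decodeURIComponent(escape(atob(b)));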

console.log(encode("Man"));     // "TWFu"
console.log(decode("TWFu"));    // "Man"
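
The bit-level steps and the "Man" example above can also be reproduced by hand. The following is a minimal Python sketch, not a replacement for the standard library; the names ALPHABET and encode_manual are made up for this illustration.

```python
import base64

# Standard alphabet: index 0-63 -> character.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode_manual(data: bytes) -> str:
    """Encode bytes by grouping into 24-bit blocks and splitting into 6-bit indices."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 bytes short of a full block
        # Build the 24-bit block, padding the missing bytes with zero bits.
        block = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Four 6-bit indices, most significant first.
        indices = [(block >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        chars = [ALPHABET[j] for j in indices]
        if missing:
            # Groups that contain only padding bits become "=".
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)

print(encode_manual(b"Man"))  # TWFu
print(encode_manual(b"Ma"))   # TWE=
print(encode_manual(b"M"))    # TQ==
```

In real code, prefer base64.b64encode; the sketch only makes the 3-bytes-to-4-characters mapping visible.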


Variants you’ll see in the wild

  1. URL-safe Base64: replace + with - and / with _; the = padding is often omitted.

  2. MIME Base64 (RFC 2045): inserts line breaks (typically every 76 chars) for email transport.

  3. Base64url (RFC 4648 §5): the standardized form of the URL-safe variant, used by JWT, JWS, etc.; usually no padding.

Variant    Char 62   Char 63   Padding            Typical uses
Standard   +         /         = required         MIME, general
URL-safe   -         _         = often omitted    JWT, URLs, web APIs
MIME       +         /         = required         Email, attachments

// Base64url helpers (JS)
const toUrl = (b64) => b64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/,"");
const fromUrl = (u) => (u + "===".slice((u.length + 3) % 4)).replace(/-/g,"+").replace(/_/g,"/");
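
For comparison, the same Base64url round trip can be sketched in Python with the standard base64 module. The helper names b64url_encode and b64url_decode are illustrative, not part of any library.

```python
import base64

def b64url_encode(data: bytes) -> str:
    """Base64url-encode and drop the trailing '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(text: str) -> bytes:
    """Restore padding to a multiple of 4, then decode Base64url."""
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded)

raw = b"\xfb\xff"                                # encodes to "+/8=" in the standard alphabet
print(base64.b64encode(raw).decode("ascii"))     # +/8=
print(b64url_encode(raw))                        # -_8
print(b64url_decode("-_8") == raw)               # True
```

The bytes 0xFB 0xFF are chosen because they hit indices 62 and 63, the only two positions where the alphabets differ.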


Code snippets in common languages

  1. Python (base64 standard lib)

    import base64
    
    s = "你好, world"
    b = s.encode("utf-8")
    e = base64.b64encode(b).decode("ascii")
    print(e)  # 5L2g5aW9LCB3b3JsZA==
    
    # decode
    raw = base64.b64decode(e)
    print(raw.decode("utf-8"))
    
  2. Node.js / Browser

    // Node.js
    const buf = Buffer.from("hello", "utf8");
    console.log(buf.toString("base64"));          // aGVsbG8=
    console.log(Buffer.from("aGVsbG8=", "base64").toString("utf8")); // hello
    
    // Browser (UTF-8 safe)
    const enc = (s) => btoa(unescape(encodeURIComponent(s)));
    const dec = (b) => decodeURIComponent(escape(atob(b)));
    


Common pitfalls & tips

  1. UTF-8 vs bytes: Base64 encodes bytes. If you start with text, encode to UTF-8 bytes first.

  2. Padding: If decoding fails, you may be missing = padding (esp. base64url). Re-pad to multiple of 4.

  3. Line wraps: MIME inserts newlines. Some decoders reject them, so strip them before decoding or use the library's no-newline option (e.g. NO_NL flags).

  4. Security: Base64 is not encryption. Do not treat it as confidentiality. It’s easily reversible.

  5. Size overhead: Encoded size ≈ 4/3 of the input (about 33% larger). For large payloads, consider compressing before encoding.
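
Two of the pitfalls above, line wraps and size overhead, are easy to observe directly. This is a small Python sketch with made-up sample data; how much compression helps depends entirely on the payload.

```python
import base64
import zlib

payload = b"hello world " * 1000   # 12,000 bytes of highly repetitive sample data

# Size overhead: 4 output characters per 3 input bytes.
plain = base64.b64encode(payload)
packed = base64.b64encode(zlib.compress(payload))
print(len(payload), len(plain), len(packed))

# Line wraps: encodebytes emits MIME-style output with a newline after every
# 76 characters. Stripping whitespace first keeps strict decoders happy.
wrapped = base64.encodebytes(payload)
unwrapped = b"".join(wrapped.split())
restored = base64.b64decode(unwrapped, validate=True)

# Reverse order when decoding compressed data: Base64-decode, then decompress.
roundtrip = zlib.decompress(base64.b64decode(packed))
```

Compression only pays off when the data is compressible; already-compressed formats (JPEG, ZIP) will not shrink before encoding.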

# Quick CLI (Linux/macOS)
printf "hello" | base64           # aGVsbG8=
echo "aGVsbG8=" | base64 --decode # hello

# OpenSSL (no newlines)
printf "hello" | openssl base64 -A

Cheat sheet

  1. Length math: for n input bytes, the padded output is 4 × ceil(n / 3) characters.

  2. Data URI:
    data:<mime>;base64,<payload>
    
    <img src="...">
    

  3. Validate quickly (regex):
    ^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$
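
The length arithmetic and the validation regex from this list can be checked with a short Python sketch. The function names encoded_len and looks_like_base64 are illustrative; the regex is the one shown above and covers only the standard, padded alphabet.

```python
import base64
import re

def encoded_len(n: int, padded: bool = True) -> int:
    """Number of Base64 characters produced for n input bytes."""
    if padded:
        return (n + 2) // 3 * 4      # 4 * ceil(n / 3)
    return (4 * n + 2) // 3          # ceil(4n / 3), padding omitted

STANDARD_B64 = re.compile(
    r"^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$"
)

def looks_like_base64(text: str) -> bool:
    """Syntactic check only: standard alphabet, correct padding, no whitespace."""
    return STANDARD_B64.fullmatch(text) is not None

print(encoded_len(5))                    # 8
print(encoded_len(5, padded=False))      # 7
print(looks_like_base64("aGVsbG8="))     # True
print(looks_like_base64("aGVsbG8"))      # False
```

A regex match says nothing about whether the decoded bytes are meaningful; it only rejects strings that cannot be padded standard Base64.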



Why do we need = padding characters?

  1. Decoder needs a hint: Zero bits added during encoding are indistinguishable from real data after transmission. The number of = tells the decoder how many bytes to keep from the final 4-character group: no = means 3 bytes, one = means 2 bytes, two = means 1 byte.


  2. Concrete example: 1 input byte ("M" = 0x4D)
Input bytes (1): 4D
Pad with 16 zero bits to reach 24 bits:
01001101 00000000 00000000

Split to 6-bit groups:
010011 010000 000000 000000
    19     16      0      0   → naive alphabet: T Q A A

Naive Base64 would be "TQAA" (ambiguous: it also decodes as the three bytes 4D 00 00).
Correct Base64 replaces the last two chars with "=" to mark missing bytes:
"TQ=="

  3. Concrete example: 2 input bytes ("Ma" = 0x4D 0x61)
Input bytes (2): 4D 61
Pad with 8 zero bits:
01001101 01100001 00000000

Split to 6-bit groups:
010011 010110 000100 000000
    19     22      4      0   → naive alphabet: T W E A

Naive Base64 would be "TWEA".
Correct Base64 uses one "=" to mark one missing byte:
"TWE="


  4. Why not rely on the zero bits alone?

    After encoding, the decoder cannot know whether those zeros were padding or real data.

    The = characters are explicit metadata that makes decoding unambiguous.



  5. URL-safe and omitted padding: In Base64url (JWT, etc.) the = is often omitted for compactness. To decode, restore padding so length % 4 == 0 (append = as needed).

// Re-pad a Base64url string u back to standard Base64 for decoding
const repad = (u) => (u + "===".slice((u.length + 3) % 4))
                      .replace(/-/g, "+")
                      .replace(/_/g, "/");

// Examples:
repad("TQ")     // "TQ=="  (1 input byte)
repad("TWE")    // "TWE="  (2 input bytes)
repad("TWFu")   // "TWFu"  (3 input bytes, no padding)
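
The ambiguity described in this section can be observed with Python's standard decoder. This is a small sketch using the same strings as the worked examples; the variable names are illustrative.

```python
import base64
import binascii

one_byte = base64.b64decode("TQ==")     # two "=" -> keep 1 byte
two_bytes = base64.b64decode("TWE=")    # one "=" -> keep 2 bytes
three_bytes = base64.b64decode("TQAA")  # no "="  -> keep all 3 bytes

print(one_byte, two_bytes, three_bytes)  # b'M' b'Ma' b'M\x00\x00'

# Without padding, Python's decoder refuses the string instead of guessing.
unpadded_rejected = False
try:
    base64.b64decode("TQ")
except binascii.Error:
    unpadded_rejected = True
print(unpadded_rejected)  # True
```

"TQ==" and "TQAA" share the same first two characters but decode to different byte counts, which is exactly the information the = characters carry.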