Home
DSA Course
Greedy Algorithms
Huffman Coding
Huffman Coding
Huffman coding is a popular algorithm used for lossless data compression. It assigns variable-length codes to input characters, with lengths based on the frequencies of corresponding characters. The most frequent character gets the smallest code and the least frequent character gets the largest code.
Huffman Coding
Ready
See the Logic in Motion
Stop memorizing code. Unlock the full interactive visualizer to master the logic step-by-step.Intuition
In standard ASCII, every character uses 8 bits. But in many texts, some characters (like 'e', 'a') appear much more often than others (like 'z', 'q').
Idea: What if we use fewer bits for frequent characters and more bits for rare ones? This would reduce the total size. To ensure we can decode it uniquely, we need prefix codes (no code is a prefix of another). Huffman Coding builds an optimal prefix code tree to achieve this.
Concept
Greedy Strategy: Build the tree from the bottom up. Start with a forest of leaf nodes (one for each character).
Repeatedly extract the two nodes with the lowest frequency, merge them into a new internal node (frequency = sum of children), and insert it back. This ensures rare characters end up deep in the tree (long codes) and frequent ones near the root (short codes).
How it Works
- Count Frequencies: Calculate the frequency of each character in the string.
- Create Leaf Nodes: Create a leaf node for each character and add them to a priority queue (min-heap).
- Build Tree: While the queue has more than one node:
- Extract two nodes with the minimum frequency.
- Create a new internal node with these two as children and frequency equal to the sum of their frequencies.
- Add the new node to the queue.
- Generate Codes: The remaining node is the root. Traverse the tree (Left=0, Right=1) to assign codes.
Step-by-Step Breakdown
String: "AABBBCCCC"
- Frequencies: A:2, B:3, C:4
- Priority Queue: [A(2), B(3), C(4)]
- Extract Min: A(2) and B(3).
- Merge: New Node(5) with children A and B.
- Queue: [C(4), Node(5)]
- Extract Min: C(4) and Node(5).
- Merge: New Root(9) with children C and Node(5).
- Codes: Path to C is 0 (or 1), Path to A/B is longer.
When to Use
- File compression (ZIP, GZIP).
- Multimedia codecs (JPEG, MP3).
- Transmission where bandwidth is limited.
When NOT to Use
- If all symbols have equal probability (fixed-length is optimal).
- If you need random access to the compressed data (variable length makes this hard).
How to Identify
Keywords: "Lossless compression", "Optimal prefix code", "Variable length coding".
Stop Guessing, Start Mastering.
Build the FAANG intuition. Master this pattern with optimized implementations, visual dry runs, and our curated collection of high-yield problems.
