CISC220 F2021 Lab9
Lab #9
1. File compression with Huffman coding
As discussed in class on Nov. 9, tries and Huffman coding can be used to compress files/messages by analyzing character frequencies and choosing codes accordingly.
Most of a Huffman class is provided in starter code here. It implements a TrieNode class that stores parent and child links as well as the character being encoded and its freqency. The "forest of tries" discussed in lecture on 11/9 is implemented with a priority queue of TrieNode pointers (each representing the root of a subtrie), ordered on the combined frequency of each subtrie.
Several major functions are unfinished. In order to separate them from the finished ones, they are located in main.cpp -- this is where you will be writing code.
The Huffman memberfunctions to finish in main.cpp:
- [1 point] merge_two_least_frequent_subtries()
- [1 point] compute_all_codes_from_trie(TrieNode *T)
- [1 point] calculate_huffman_file_size()
2. Testing
- [1 point] short_doi.txt requires x bytes in raw form. How many bytes does it take when compressed using your completed Huffman class?
3. Submission
- Make a PDF file <Your Name>_Lab9_README.pdf with your answers to the testing questions above
- Rename your code directory <Your Last Name>_Lab9 and create a single tar/zip/rar file out of it named <Your Last Name>_Lab9.tar (or .zip or .rar, etc.).
- Submit it in Canvas by midnight at the end of Tuesday, November 16