CISC220 F2021 Lab9
Revision as of 15:39, 10 November 2021
Lab #9
1. File compression with Huffman coding
As discussed in class on Nov. 9, tries and Huffman coding can be used to compress files/messages by analyzing character frequencies and choosing codes accordingly.
Most of a Huffman class is provided in starter code here. It implements a TrieNode class that stores parent and child links as well as the character being encoded and its frequency in the file being compressed.
The main work of Huffman happens in the following functions:
- compress(in_filename, out_filename, do_binary)
  - Reads in_filename character by character and counts occurrences (aka computes frequency) for each, storing the result by ASCII index in the char_counter vector
  - Applies the Huffman trie-building algorithm presented in class 11/9 in build_optimal_trie()
  - Writes code table (the decompression_map) to out_filename
  - Re-reads in_filename character by character and writes encoded versions to out_filename (in ASCII or binary depending on do_binary)
- decompress(in_filename, out_filename, do_binary)
  - Reads the code table from in_filename
  - Reads in_filename character by character and writes decoded versions to out_filename
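The counting step at the start of compress() can be pictured with a small sketch. This is not the starter code: count_chars and count_chars_in_string are hypothetical helper names, and the 256-entry vector of int is an assumption about how char_counter is laid out.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the starter code): tally how often each
// byte value occurs in a stream, indexed by its character code.
std::vector<int> count_chars(std::istream& in) {
    std::vector<int> char_counter(256, 0);  // assumed layout: one slot per byte value
    char c;
    while (in.get(c)) {
        // Cast to unsigned char so values above 127 do not become negative indices.
        ++char_counter[static_cast<unsigned char>(c)];
    }
    return char_counter;
}

// Convenience wrapper for trying the helper on an in-memory string.
std::string::size_type unused_size_hint = 0;  // placeholder-free build: no side effects
std::vector<int> count_chars_in_string(const std::string& text) {
    std::istringstream in(text);
    return count_chars(in);
}
```

For a real file, the same helper could be handed a std::ifstream opened on in_filename.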
The "forest of tries" discussed in lecture on 11/9 is implemented with a priority queue of TrieNode pointers (each representing the root of a subtrie), ordered on the combined frequency of each subtrie.
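As an illustration of that forest, here is a minimal sketch of a min-ordered priority queue of node pointers and one merge step. The TrieNode struct below is a simplified stand-in, and HigherFrequency, Forest, make_leaf, merge_two_least_frequent and destroy are hypothetical names; the starter code's actual class and member names may differ.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Simplified stand-in for the starter code's TrieNode (fields are assumptions).
struct TrieNode {
    char character;
    int frequency;
    TrieNode* parent;
    TrieNode* left;
    TrieNode* right;
};

// std::priority_queue keeps the "largest" element on top, so the comparator
// is reversed to make the least frequent subtrie come out first.
struct HigherFrequency {
    bool operator()(const TrieNode* a, const TrieNode* b) const {
        return a->frequency > b->frequency;
    }
};

using Forest = std::priority_queue<TrieNode*, std::vector<TrieNode*>, HigherFrequency>;

TrieNode* make_leaf(char c, int freq) {
    return new TrieNode{c, freq, nullptr, nullptr, nullptr};
}

// One step of the trie-building loop: remove the two least frequent roots
// and push back a new root whose frequency is their sum.
void merge_two_least_frequent(Forest& forest) {
    assert(forest.size() >= 2);
    TrieNode* first = forest.top();
    forest.pop();
    TrieNode* second = forest.top();
    forest.pop();
    TrieNode* merged = new TrieNode{'\0', first->frequency + second->frequency,
                                    nullptr, first, second};
    first->parent = merged;
    second->parent = merged;
    forest.push(merged);
}

// Free a whole subtrie once it is no longer needed.
void destroy(TrieNode* node) {
    if (node == nullptr) return;
    destroy(node->left);
    destroy(node->right);
    delete node;
}
```

Repeating the merge until a single root remains yields the complete trie; which child ends up on the left is a convention choice in this sketch.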
Several major functions are unfinished. In order to separate them from the finished ones, they are located in main.cpp -- this is where you will be writing code.
The Huffman member functions to finish in main.cpp:
- [1 point] merge_two_least_frequent_subtries()
- [1 point] compute_all_codes_from_trie(TrieNode *T)
- [1 point] calculate_huffman_file_size()
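The handout does not spell out what the last two functions compute, so the following is only a sketch under assumptions: that codes are read off the trie by appending 0 for a left branch and 1 for a right branch, and that the compressed size is the sum of frequency times code length, rounded up to whole bytes and ignoring the stored code table. Node, collect_codes, encoded_bits and encoded_bytes are hypothetical names, not the starter code's.

```cpp
#include <map>
#include <string>

// Simplified stand-in node (assumed fields, not the starter code's TrieNode).
struct Node {
    char character;
    int frequency;
    Node* left;
    Node* right;
};

// Walk the trie, extending the bit string by '0' going left and '1' going right;
// each leaf's accumulated string becomes that character's code.
void collect_codes(const Node* node, const std::string& prefix,
                   std::map<char, std::string>& codes) {
    if (node == nullptr) return;
    if (node->left == nullptr && node->right == nullptr) {
        // A trie with a single leaf still needs a one-bit code.
        codes[node->character] = prefix.empty() ? "0" : prefix;
        return;
    }
    collect_codes(node->left, prefix + "0", codes);
    collect_codes(node->right, prefix + "1", codes);
}

// Total encoded length in bits: each leaf contributes frequency * code length.
long encoded_bits(const Node* node, int depth) {
    if (node == nullptr) return 0;
    if (node->left == nullptr && node->right == nullptr) {
        return static_cast<long>(node->frequency) * (depth == 0 ? 1 : depth);
    }
    return encoded_bits(node->left, depth + 1) + encoded_bits(node->right, depth + 1);
}

// Round the bit count up to whole bytes (code table overhead not included).
long encoded_bytes(const Node* root) {
    return (encoded_bits(root, 0) + 7) / 8;
}
```

With frequencies a=5, b=2, c=1 and the trie shape used in the test below, the codes have lengths 1, 2 and 2, for 11 bits in total.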
2. Testing
- [1 point] short_doi.txt requires x bytes in raw form. How many bytes does it take when compressed using your completed Huffman class?
3. Submission
- Make a PDF file <Your Name>_Lab9_README.pdf with your answers to the testing questions above
- Rename your code directory <Your Last Name>_Lab9 and create a single tar/zip/rar file out of it named <Your Last Name>_Lab9.tar (or .zip or .rar, etc.).
- Submit it in Canvas by midnight at the end of Tuesday, November 16