CISC220 F2023 Lab8

From class_wiki
Revision as of 22:21, 2 November 2023 by Cer (talk | contribs) (2. Programming tasks)
Jump to: navigation, search

Lab #8

1. File compression with Huffman coding

As discussed in class on Nov. 2, tries and Huffman coding can be used to compress files/messages by analyzing character frequencies and choosing bitcodes accordingly.

A nearly-complete Huffman class is defined for you in huffman.hh (code here). It uses a TrieNode class that stores parent and child links as well as the character being encoded and its frequency in the file being compressed.

The main work of Huffman happens in the following functions:

  • compress()
    • Reads input file character by character and counts occurrences (aka computes frequency) for each, storing the results by ASCII index in the char_frequency vector
    • Applies the Huffman trie-building algorithm presented in class 11/2 in build_optimal_trie(). This function is TO BE WRITTEN BY YOU
    • Creates code look-up tables (string to char decompression_map and char to string compression_map) from the optimal trie in compute_all_codes_from_trie(). This function is TO BE WRITTEN BY YOU
    • Writes decompression_map as a header to the output file
    • Re-reads input file character by character and, using compression_map, writes encoded versions to the output file body (in ASCII or binary depending on do_binary flag)
  • decompress()
    • Reads the code look-up table from header of input file to decompression_map
    • Reads rest of input file bitstream, gets character for each variable-length code read using decompression_map, and writes decoded versions to output file. The ASCII part of this function, ascii_decompress_body() is TO BE WRITTEN BY YOU

The "forest of tries" discussed in lecture is implemented in the Huffman class with an STL priority queue of TrieNode pointers (each representing the root of a subtrie), ordered on the combined frequency of each subtrie. This priority queue is called trie_pq.

2. Programming tasks

Several major functions are unfinished. In order to separate them from the finished ones, they are located in main.cpp -- this is where you will be writing code.

The Huffman member functions to finish are in main.cpp -- don't make any changes in huffman.hh:

  • [1 point] ascii_decompress_body() : This function does not depend
  • [2 points] build_optimal_trie() : This function should create a forest of one-node tries, then repeatedly combine the two smallest tries until only one is left according to the following instructions:
    • (1) remove trie in PQ with SMALLEST frequency, assign to first
    • (2) remove trie in PQ with NEXT SMALLEST frequency, assign to second
    • (3) make new trie node (new_root) and plug first and second tries in as children, set new_root's frequency to first->frequency + second->frequency, and insert in new_root in PQ
  • [2 points] compute_all_codes_from_trie() : The function should recursively traverse the binary tree (trie) rooted at T to determine the huffcode string at every leaf. If T is a leaf, make sure to execute the compression_map and decompression_map assignments

3. Submission

Once again, you may work with a (human) partner on this lab.

Submit 2 files to Gradescope: (1) your README and (2) your modified main.cpp. The README should contain your name and your partner's name, notes on any limitations or issues with your code, and your interpretation of any ambiguities in the assignment instructions. Your main.cpp should also contain your name(s)