CISC220 F2023 Lab5

From class_wiki
Revision as of 10:17, 28 September 2023 by Cer (talk | contribs) (1. Binary Search Tree (BST) implementation [4 points])
Jump to: navigation, search

1. Binary Search Tree (BST) implementation [4 points]

Your task for this lab is to fill in the key member functions of a templatized binary search tree class BSTree_Fast (starter code and some test files are linked here). The class is defined in bst_fast_and_slow.hh -- this is where you should make all of your additions. Specifically, you need to complete the following functions using a traditional pointer-based linked structure as detailed in Chap. 6.2 of the textbook:

  • insert(key). Put key into the BST. When a key is inserted more than once, there should not be multiple BSTNodes created, but rather the number member variable of the original BSTNode should be incremented
  • remove(key): Delete key from the BST if it exists
  • int frequency(key): How many of a given key are in the BST? This is the number variable referred to above
  • key findMin(): What is the smallest key in the BST?
  • key findMax(): What is the largest key in the BST?
  • int computeHeight(): What is the height of the BST tree?

For comparison and debugging purposes, a naive implementation called BSTree_Slow which uses an unordered STL vector of pointers to BSTNodes (the left, right, and parent pointer variables of each node are NULL and unused) is provided. This version does not attempt to keep things organized, so it is rather inefficient. Note that this NOT the same as representing a binary tree's structure with an array as in Lab #4.

main.cpp (linked above) contains driver code to demonstrate (test?) the functionality of the BSTree_X class -- you can choose whether the Fast or Slow version is running by commenting/uncommenting the appropriate class declaration. The driver works as follows: two input filenames are expected as arguments to the program. The first input file contains words which are automatically added to the tree with insert. The second file contains one or more of the following commands:

  • SIZE: Prints the result of the tree size() member function
  • HEIGHT: Prints the result of the tree size function
  • MIN: Prints the result of the tree size function
  • MAX: Prints the result of the tree size function
  • FREQUENCY arg: Prints the result of calling frequency(arg)
  • REMOVE arg: Calls remove(arg), with the special interpretation of arg == "MAX" as removing the result of findMax(), and similarly for arg == "MIN". Otherwise arg is just interpreted as a literal string

You should not have to change how any of the driver code works in main.cpp, except to choose which version of the BST you are running and possibly insert print statements for debugging. Just be aware that in Gradescope main.cpp will set the BST version to Fast.

2. BST timing comparison [1 point]

Write a driver function which creates and fills a separate BSTree_Fast and BSTree_Slow of C++ strings for each of the following files by reading from them word-by-word and calling insert() repeatedly. Note that all of these files have been "cleaned" (punctuation removed, all lower-case).

  • DOI (about 1K total words)
  • BTS (~14K words)
  • GE (~185K words)

In your README PDF, answer the following questions for each file using the information from print() or your own custom functions. Your BSTree_Fast and BSTree_Slow should give the same answers for the first three questions [1 point for all questions answered]

  • How many unique words are there?
  • How many different words total are there?
  • What are the alphabetically first and last words?

How long does it takes for BSTree_Fast vs. BSTree_Slow to carry out the following FOUR operations for each of the three input files [1 point total]? (To get precise timing under Unix, use the gettimeofday() function. There is a code snippet under the Unix, Linux and Mac section here that should help.)

  • Create the BST
  • Search for each of these words with contains():
    • love
    • death
    • tyranny

3. Submission

  • Include a PDF file <Your Name>_README.pdf which contains:
    • Answers to the questions above. Do NOT include the full listing from your print() functions. However, a call to print() for both versions of the BST should be in your driver code.
    • The timing measurements requested, in a nicely-formatted table, as well as comments on why you think the timings vary for the different (1) BST implementations, (2) operations being performed, and (3) files from which the BSTs were created.
  • Rename your code directory <Your Last Name>_Lab5 and create a single tar/zip/rar file out of it named <Your Last Name>_Project1.tar (or .zip or .rar, etc.).
  • Submit it in Canvas by midnight at the end of Tuesday, October 5