CISC849 S2018 HW2


Revision as of 21:38, 13 March 2018

Due Friday, March 23, midnight

Description

NOTE: YOU MAY WORK ALONE OR IN TEAMS OF TWO

Continuing the theme from HW #1, this assignment is a classification challenge. You will use the UW RGB-D Object Dataset, which was introduced in this ICRA 2011 paper. Example images are shown below:

[Image: Rgbd dataset gallery.png]

Images are generally small -- in the range of ~50 x 50 to ~100 x 100.

There are 51 object categories with 300 total instances (e.g., 5 examples of apple, 6 examples of pliers, and so on). A hierarchy of object types, shown below, indicates that there are 4 main groups or subtrees: fruits, vegetables, devices, and containers.

[Image: Rgbd dataset tree.png]

Each object instance was photographed on a turntable with an RGB-D camera; hence there are multiple views of each instance both as color and depth images. The naming scheme for the files generated is given here, but you will focus on the color and/or depth images. For example, toothpaste_2_1_190_crop.png is the 190th RGB frame of the 1st video sequence of the 2nd instance of a toothpaste object, and toothpaste_2_1_190_depthcrop.png is the corresponding depth image of the same toothpaste instance from the same angle.
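As a rough illustration of the naming scheme, the sketch below splits a file name into its parts. The pattern is inferred only from the two example names above (it is not taken from the dataset's official naming documentation), and `FrameInfo` and `parse_frame_name` are hypothetical helper names introduced here.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the two example names in the text (an assumption):
#   <category>_<instance>_<video>_<frame>_crop.png       -> RGB image
#   <category>_<instance>_<video>_<frame>_depthcrop.png  -> depth image
# The lazy category group allows category names that contain underscores.
_NAME_RE = re.compile(
    r"^(?P<category>[a-z_]+?)_(?P<instance>\d+)_(?P<video>\d+)"
    r"_(?P<frame>\d+)_(?P<kind>crop|depthcrop)\.png$"
)


class FrameInfo(NamedTuple):
    category: str   # e.g. "toothpaste"
    instance: int   # which physical object of that category
    video: int      # which video sequence of that instance
    frame: int      # frame number within the sequence
    is_depth: bool  # True for *_depthcrop.png, False for *_crop.png


def parse_frame_name(filename: str) -> Optional[FrameInfo]:
    """Return the parsed fields, or None if the name does not fit the pattern."""
    match = _NAME_RE.match(filename)
    if match is None:
        return None
    return FrameInfo(
        category=match.group("category"),
        instance=int(match.group("instance")),
        video=int(match.group("video")),
        frame=int(match.group("frame")),
        is_depth=match.group("kind") == "depthcrop",
    )


if __name__ == "__main__":
    print(parse_frame_name("toothpaste_2_1_190_crop.png"))
    print(parse_frame_name("toothpaste_2_1_190_depthcrop.png"))
```

Files that do not match either suffix are reported as `None`, so other generated files in the same directory can simply be skipped.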

Tasks

Your challenge will be to train a convolutional neural network in TensorFlow to categorize a given RGB and/or depth image into 1 of the 4 subtrees above -- NOT one of the 51 categories. You may use any TensorFlow-based convnet classification architecture that you wish, starting from random or pre-trained weights -- your choice. There are two main guidelines:

  • Instance 1 of every object category (e.g., toothpaste_1, apple_1, etc.) may NOT be used for training (weight learning) or validation (hyperparameter tuning). Rather, it must be used only for testing (aka evaluation) of your final classifier(s).
  • If you only have time for one, learn how to classify the RGB images (color). But if you have the time, please try to learn how to classify the depth images (pure shape) alone and present a comparison of how the two approaches worked.
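One possible way to honor the instance-1 rule is to split on the instance number in the file name before any training code sees the data. The sketch below is only an illustration: `split_by_instance` is a hypothetical helper, the file-name pattern is inferred from the examples in the description, and the category-to-subtree mapping must be filled in by hand from the hierarchy figure (only a placeholder entry is shown).

```python
import re
from typing import Dict, Iterable, List, Tuple

# The four subtrees named in the assignment; a label is an index into this tuple.
SUBTREES = ("fruits", "vegetables", "devices", "containers")

# File-name pattern assumed from the examples in the description.
_NAME_RE = re.compile(
    r"^(?P<category>[a-z_]+?)_(?P<instance>\d+)_\d+_\d+_(?P<kind>crop|depthcrop)\.png$"
)

Labeled = List[Tuple[str, int]]


def split_by_instance(
    filenames: Iterable[str],
    category_to_subtree: Dict[str, str],
    kind: str = "crop",
) -> Tuple[Labeled, Labeled]:
    """Return (train_val, test) lists of (filename, subtree label) pairs.

    Instance 1 of every category goes to the test list only; every other
    instance goes to the training/validation pool. `kind` selects RGB
    ("crop") or depth ("depthcrop") images.
    """
    train_val: Labeled = []
    test: Labeled = []
    for name in filenames:
        match = _NAME_RE.match(name)
        if match is None or match.group("kind") != kind:
            continue  # not an image of the requested type
        category = match.group("category")
        # A KeyError here means the hand-made mapping is missing a category.
        label = SUBTREES.index(category_to_subtree[category])
        bucket = test if int(match.group("instance")) == 1 else train_val
        bucket.append((name, label))
    return train_val, test


if __name__ == "__main__":
    # Placeholder mapping: complete it from the hierarchy figure.
    mapping = {"apple": "fruits"}
    files = ["apple_1_1_1_crop.png", "apple_2_1_5_crop.png", "apple_2_1_5_depthcrop.png"]
    print(split_by_instance(files, mapping))
```

Splitting by instance rather than by frame keeps every view of a held-out object out of training, which is the point of the guideline. The validation set would then be carved out of the `train_val` pool, ideally also by whole instances.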



You might want to try voxelization before any other processing to reduce the size of the data if your code is running slowly.

Please submit a 2-page WRITE-UP of your approach and results. Describe what network you used, what modifications (if any) you made to it, how you conducted training, how you augmented or altered the training dataset, what accuracies you were able to get on each task, and any issues you encountered or interesting observations you made.
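For the accuracy figures in the write-up, a confusion matrix over the 4 subtrees can be more informative than a single number. The following is a minimal sketch in plain Python, assuming integer labels 0 to 3 for the true and predicted subtree of each test image; `confusion_and_accuracy` is a hypothetical helper name, not part of TensorFlow.

```python
from typing import List, Sequence, Tuple


def confusion_and_accuracy(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    num_classes: int = 4,
) -> Tuple[List[List[int]], float]:
    """Return (matrix, accuracy), where matrix[t][p] counts test images
    whose true class is t and whose predicted class is p."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    # Correct predictions lie on the diagonal.
    correct = sum(matrix[i][i] for i in range(num_classes))
    accuracy = correct / len(y_true) if len(y_true) else 0.0
    return matrix, accuracy


if __name__ == "__main__":
    matrix, accuracy = confusion_and_accuracy([0, 0, 1, 2, 3], [0, 1, 1, 2, 0])
    for row in matrix:
        print(row)
    print(accuracy)
```

Running the same function once on the RGB predictions and once on the depth predictions gives directly comparable numbers for the two approaches.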