CISC849 S2018 HW2

From class_wiki

Revision as of 22:25, 13 March 2018

Due Friday, March 23, midnight

Description

NOTE: YOU MAY WORK ALONE OR IN TEAMS OF TWO

Continuing the theme from HW #1, this assignment is a classification challenge. You will use the UW RGB-D Object Dataset, which was introduced in an ICRA 2011 paper. Example images are shown below:

[Figure: example images from the UW RGB-D Object Dataset]

Images are generally small, roughly 50 x 50 to 100 x 100 pixels.

There are 51 object categories with 300 total instances (e.g., 5 examples of apple, 6 examples of pliers, and so on). A hierarchy of object types, shown below, indicates that there are 4 main groups: fruits, vegetables, devices, and containers.

[Figure: hierarchy of object categories, grouped into fruits, vegetables, devices, and containers]

Each object instance was photographed on a turntable with an RGB-D camera, so there are multiple views of each instance as both color and depth images. The naming scheme for the generated files is given in the dataset README (http://rgbd-dataset.cs.washington.edu/dataset/rgbd-dataset/README.txt), but you will focus on the color and/or depth images. For example, toothpaste_2_1_190_crop.png is the 190th RGB frame of the 1st video sequence of the 2nd instance of a toothpaste object, and toothpaste_2_1_190_depthcrop.png is the corresponding depth image.
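Because every label you need is encoded in the filename, it can help to pull those fields apart programmatically when building training and test splits. The following is a minimal C++17 sketch; `FrameInfo` and `parseFrameName` are hypothetical helpers, not part of the dataset tools. It handles the `_crop.png` and `_depthcrop.png` patterns from the examples above, rejects any other filename, and reads the numeric fields from the right in case a category name itself contains an underscore (check the README for the full list of file types).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Fields encoded in a cropped-image filename such as "toothpaste_2_1_190_crop.png".
struct FrameInfo {
    std::string category;  // e.g. "toothpaste"
    int instance = 0;      // which instance of the category (2 above)
    int sequence = 0;      // which video sequence (1 above)
    int frame = 0;         // frame number within the sequence (190 above)
    bool isDepth = false;  // true for *_depthcrop.png, false for *_crop.png
};

// Hypothetical helper: splits on '_' and reads the numeric fields from the
// right, so a category name containing '_' stays intact.
// Returns std::nullopt for any filename that does not match the pattern.
std::optional<FrameInfo> parseFrameName(const std::string& name) {
    std::vector<std::string> parts;
    std::stringstream ss(name);
    std::string tok;
    while (std::getline(ss, tok, '_')) parts.push_back(tok);
    const std::size_t n = parts.size();
    if (n < 5) return std::nullopt;

    FrameInfo info;
    if (parts[n - 1] == "crop.png") info.isDepth = false;
    else if (parts[n - 1] == "depthcrop.png") info.isDepth = true;
    else return std::nullopt;

    try {
        info.frame = std::stoi(parts[n - 2]);
        info.sequence = std::stoi(parts[n - 3]);
        info.instance = std::stoi(parts[n - 4]);
    } catch (const std::exception&) {  // non-numeric or out-of-range field
        return std::nullopt;
    }
    // Everything before the three numeric fields is the category name.
    info.category = parts[0];
    for (std::size_t i = 1; i + 4 < n; ++i) info.category += "_" + parts[i];
    return info;
}
```

Pairing a color frame with its depth frame then amounts to matching on (category, instance, sequence, frame) and differing only in `isDepth`.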

Your challenge will be to train a convolutional neural network in TensorFlow to categorize a given

These are taken from a larger object dataset (https://github.com/PointCloudLibrary/data/tree/master/segmentation/mOSD) that you are welcome to test your code on. For grading, I will try your submissions on some of the other "learn" point clouds (but not the "test" data).

Tasks

  1. First, use pcl_viewer to inspect the point clouds and get a sense of how they look.
  2. For each point cloud, count the number of objects. One way to do this is to fit the tabletop plane using PCL's RANSAC functionality and then apply Euclidean clustering to the remaining points, but other methods are possible and may be more robust. [5 points]
  3. Assuming you know that you are looking at only boxes or only cylinders standing "upright" (not on their side and not leaning diagonally), output the estimated parameters of each object: height x width x length for boxes, and height x radius for cylinders. [5 points]
  4. Assuming you know that only one type of object is on the table, but not which one, can you develop a "test" to tell which it is? [5 points]
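For task 2, the plane-fit-then-cluster idea can be illustrated without any libraries. The code below is a minimal, standard-library-only sketch; `Point3`, `ransacPlane`, `countClusters`, and `countObjects` are hypothetical names, and the distance tolerance, cluster radius, minimum cluster size, and iteration count are made-up values that would need tuning on real data. In an actual submission, PCL's `pcl::SACSegmentation` and `pcl::EuclideanClusterExtraction` classes would normally do this work, far more efficiently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <queue>
#include <random>
#include <vector>

struct Point3 { double x, y, z; };

// Plane a*x + b*y + c*z + d = 0 with unit normal (a, b, c).
struct Plane { double a, b, c, d; };

double planeDistance(const Plane& t, const Point3& s) {
    return std::fabs(t.a * s.x + t.b * s.y + t.c * s.z + t.d);
}

// Basic RANSAC: repeatedly fit a plane through 3 random points and keep
// the one with the most inliers (points closer than `tol`).
Plane ransacPlane(const std::vector<Point3>& pts, double tol, int iters, unsigned seed = 42) {
    Plane best{0, 0, 1, 0};
    if (pts.size() < 3) return best;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, pts.size() - 1);
    std::size_t bestCount = 0;
    for (int it = 0; it < iters; ++it) {
        const Point3& p = pts[pick(rng)];
        const Point3& q = pts[pick(rng)];
        const Point3& r = pts[pick(rng)];
        // Normal = (q - p) x (r - p)
        double ux = q.x - p.x, uy = q.y - p.y, uz = q.z - p.z;
        double vx = r.x - p.x, vy = r.y - p.y, vz = r.z - p.z;
        double a = uy * vz - uz * vy, b = uz * vx - ux * vz, c = ux * vy - uy * vx;
        double len = std::sqrt(a * a + b * b + c * c);
        if (len < 1e-12) continue;  // skip degenerate (repeated or collinear) samples
        Plane cand{a / len, b / len, c / len, 0};
        cand.d = -(cand.a * p.x + cand.b * p.y + cand.c * p.z);
        std::size_t count = 0;
        for (const auto& s : pts)
            if (planeDistance(cand, s) < tol) ++count;
        if (count > bestCount) { bestCount = count; best = cand; }
    }
    return best;
}

// Euclidean clustering: points joined by a chain of neighbors closer than
// `radius` form one cluster. Brute-force O(n^2) search; PCL uses a k-d tree.
int countClusters(const std::vector<Point3>& pts, double radius, std::size_t minSize) {
    std::vector<bool> seen(pts.size(), false);
    int clusters = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        if (seen[i]) continue;
        std::queue<std::size_t> frontier;
        frontier.push(i);
        seen[i] = true;
        std::size_t size = 0;
        while (!frontier.empty()) {
            std::size_t k = frontier.front();
            frontier.pop();
            ++size;
            for (std::size_t j = 0; j < pts.size(); ++j) {
                if (seen[j]) continue;
                double dx = pts[j].x - pts[k].x, dy = pts[j].y - pts[k].y, dz = pts[j].z - pts[k].z;
                if (dx * dx + dy * dy + dz * dz < radius * radius) { seen[j] = true; frontier.push(j); }
            }
        }
        if (size >= minSize) ++clusters;  // ignore tiny noise clusters
    }
    return clusters;
}

// Drops points near the dominant (table) plane, then clusters the rest.
// The 200 RANSAC iterations are an arbitrary choice for this sketch.
int countObjects(const std::vector<Point3>& cloud, double planeTol, double clusterRadius, std::size_t minSize) {
    Plane table = ransacPlane(cloud, planeTol, 200);
    std::vector<Point3> above;
    for (const auto& s : cloud)
        if (planeDistance(table, s) >= planeTol) above.push_back(s);
    return countClusters(above, clusterRadius, minSize);
}
```

Removing the table inliers before clustering is what makes the count meaningful: otherwise every object would be linked to the others through the table points and the whole scene would form a single cluster.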
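For tasks 3 and 4, one possible approach works on each object's top-down "footprint" (its x-y coordinates). The sketch below assumes, beyond what the assignment states, that each object's points are already segmented and that the cloud has been transformed so the table is the plane z = 0 with +z up. Box length and width come from the footprint's extent along its principal (PCA) axes; the cylinder radius and the box-vs-cylinder test both use an algebraic (Kåsa) least-squares circle fit. All names and the 5% tolerance are hypothetical. A single camera view sees only part of each object, so extents will tend to be underestimated, and a box seen nearly face-on can be fit by a very large circle with a small residual; comparing the fitted radius with the object's extent would be a sensible extra check.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// One segmented object; assumes the table plane is z = 0 with +z up.
struct Pt { double x, y, z; };
struct BoxDims { double height, length, width; };
struct CylDims { double height, radius; };
struct Circle { double cx, cy, r, rms; };

// Box: height from the highest point; length/width from the footprint's
// extent along its principal (PCA) axes, so rotation about z is handled.
BoxDims estimateBox(const std::vector<Pt>& pts) {
    assert(!pts.empty());
    double mx = 0, my = 0, maxZ = 0;
    for (const auto& p : pts) { mx += p.x; my += p.y; maxZ = std::max(maxZ, p.z); }
    mx /= pts.size(); my /= pts.size();
    double cxx = 0, cxy = 0, cyy = 0;
    for (const auto& p : pts) {
        double dx = p.x - mx, dy = p.y - my;
        cxx += dx * dx; cxy += dx * dy; cyy += dy * dy;
    }
    double theta = 0.5 * std::atan2(2 * cxy, cxx - cyy);  // major-axis angle
    double ux = std::cos(theta), uy = std::sin(theta);
    const double inf = std::numeric_limits<double>::infinity();
    double minU = inf, maxU = -inf, minV = inf, maxV = -inf;
    for (const auto& p : pts) {
        double u = p.x * ux + p.y * uy, v = -p.x * uy + p.y * ux;
        minU = std::min(minU, u); maxU = std::max(maxU, u);
        minV = std::min(minV, v); maxV = std::max(maxV, v);
    }
    double e1 = maxU - minU, e2 = maxV - minV;
    return {maxZ, std::max(e1, e2), std::min(e1, e2)};
}

double det3(double a, double b, double c, double d, double e, double f,
            double g, double h, double i) {
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g);
}

// Algebraic (Kasa) circle fit: choose D, E, F minimizing
// sum (x^2 + y^2 + D*x + E*y + F)^2 by solving the 3x3 normal equations.
Circle fitCircleXY(const std::vector<Pt>& pts) {
    double sxx = 0, sxy = 0, syy = 0, sx = 0, sy = 0, n = 0, sxw = 0, syw = 0, sw = 0;
    for (const auto& p : pts) {
        double w = p.x * p.x + p.y * p.y;
        sxx += p.x * p.x; sxy += p.x * p.y; syy += p.y * p.y;
        sx += p.x; sy += p.y; n += 1;
        sxw += p.x * w; syw += p.y * w; sw += w;
    }
    const double inf = std::numeric_limits<double>::infinity();
    double det = det3(sxx, sxy, sx, sxy, syy, sy, sx, sy, n);
    if (std::fabs(det) < 1e-12) return {0, 0, inf, inf};  // e.g. collinear points
    double D = det3(-sxw, sxy, sx, -syw, syy, sy, -sw, sy, n) / det;
    double E = det3(sxx, -sxw, sx, sxy, -syw, sy, sx, -sw, n) / det;
    double F = det3(sxx, sxy, -sxw, sxy, syy, -syw, sx, sy, -sw) / det;
    double cx = -D / 2, cy = -E / 2;
    double r = std::sqrt(std::max(0.0, cx * cx + cy * cy - F));
    double ss = 0;  // geometric RMS residual of the fitted circle
    for (const auto& p : pts) {
        double dev = std::hypot(p.x - cx, p.y - cy) - r;
        ss += dev * dev;
    }
    return {cx, cy, r, std::sqrt(ss / n)};
}

// Cylinder: height from the highest point, radius from the circle fit.
CylDims estimateCylinder(const std::vector<Pt>& pts) {
    double maxZ = 0;
    for (const auto& p : pts) maxZ = std::max(maxZ, p.z);
    return {maxZ, fitCircleXY(pts).r};
}

// One possible box-vs-cylinder test: a cylinder's footprint lies close to a
// circle, so the RMS residual is small relative to the radius.
bool looksCylindrical(const std::vector<Pt>& pts, double relTol = 0.05) {
    Circle c = fitCircleXY(pts);
    return std::isfinite(c.r) && c.rms < relTol * c.r;
}
```

A circle fit is used for the radius rather than half the footprint width because the camera typically sees only the front half of a cylinder, and a fitted arc still recovers the full radius.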

You might want to try voxelization before any other processing to reduce the size of the data if your code is running slowly.
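Voxelization here means overlaying a 3-D grid of cubes (voxels) on the cloud and replacing all points inside each occupied cube with a single representative point. Below is a minimal standard-library sketch that uses each voxel's centroid as the representative, which is the same idea as PCL's `pcl::VoxelGrid` filter; the `voxelDownsample` name and the leaf sizes mentioned are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <tuple>
#include <vector>

struct Point3 { double x, y, z; };

// Voxel-grid downsampling: each point is assigned to the cube of side
// `leaf` that contains it, and each occupied cube is replaced by the
// centroid of its points.
std::vector<Point3> voxelDownsample(const std::vector<Point3>& cloud, double leaf) {
    assert(leaf > 0);
    struct Acc { double x = 0, y = 0, z = 0; int n = 0; };  // running sums per voxel
    std::map<std::tuple<long, long, long>, Acc> voxels;
    for (const auto& p : cloud) {
        auto key = std::make_tuple(static_cast<long>(std::floor(p.x / leaf)),
                                   static_cast<long>(std::floor(p.y / leaf)),
                                   static_cast<long>(std::floor(p.z / leaf)));
        Acc& a = voxels[key];
        a.x += p.x; a.y += p.y; a.z += p.z; ++a.n;
    }
    std::vector<Point3> out;
    out.reserve(voxels.size());
    for (const auto& entry : voxels) {
        const Acc& a = entry.second;
        out.push_back({a.x / a.n, a.y / a.n, a.z / a.n});
    }
    return out;
}
```

The leaf size trades speed for detail: with a 1 cm leaf, for example, surface features smaller than about a centimeter are averaged away, which is usually harmless for counting objects but can bias small dimension estimates.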

Please submit ONE main.cpp file containing all of your code to Canvas. Ideally, your program will take an input .pcd file on the command line as well as a flag indicating which mode (1, 2, or 3, corresponding to tasks 2, 3, and 4 above) is being run, and print the required information to standard output. Please also include a README with your name (and that of your teammate) that briefly explains how you approached each task and any issues you encountered or interesting observations you made.
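One way to structure the command-line handling in main.cpp is sketched below. The argument order (`<file.pcd> <mode>`), the `.pcd` extension check, and the `Args`/`parseArgs` names are assumptions for illustration only; the requirement is just an input file plus a mode flag.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical interface:  ./main <cloud.pcd> <mode>   with mode 1, 2, or 3.
struct Args {
    std::string pcdFile;
    int mode;
};

// Returns std::nullopt on malformed input so that main() can print a
// usage message and exit instead of processing garbage.
std::optional<Args> parseArgs(int argc, const char* const argv[]) {
    if (argc != 3) return std::nullopt;
    std::string file = argv[1];
    const std::string ext = ".pcd";  // require a .pcd extension
    if (file.size() <= ext.size() ||
        file.compare(file.size() - ext.size(), ext.size(), ext) != 0)
        return std::nullopt;
    std::string m = argv[2];
    if (m != "1" && m != "2" && m != "3") return std::nullopt;
    return Args{file, m[0] - '0'};
}
```

`main(int argc, char** argv)` could then call `parseArgs(argc, argv)`, print a usage line when it returns `std::nullopt`, and otherwise load the cloud (for example with `pcl::io::loadPCDFile`) and switch on `mode`.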