Starting from:
$35

$29

Assignment 5 Solution

Instructions




Read all instructions in this section thoroughly.




Collaboration: Make certain that you understand the course collaboration policy, described on the course website. You must complete this assignment individually; you are not allowed to collaborate with anyone else. You may discuss the homework to understand the problems and the mathematics behind the various learning algorithms, but you are not allowed to share problem solutions or your code with any other students. You must also not consult code on the internet that is directly related to the programming exercise. We will be using automatic checking software to detect academic dishonesty, so please don't do it.




You are also prohibited from posting any part of your solution to the internet, even after the course is complete. Similarly, please don't post this PDF le or the homework skeleton code to the internet.




Formatting: This assignment consists of two parts: a problem set and program exercises.




For the problem set, you must write up your solutions electronically and submit it as a single PDF document. We will not accept handwritten or paper copies of the homework. Your problem set solutions must use proper mathematical formatting. For this reason, we strongly encourage you to write up your responses using LATEX. (Alternative word processing programs, such as MS Word, produce very poorly formatted mathematics.)




Your solutions to the programming exercises must be implemented in python, following the precise instruc-tions included in Part 2. Portions of the programing exercise may be graded automatically, so it is imperative that your code follows the speci ed API. Several parts of the programming exercise ask you to create plots or describe results; these should be included in the same PDF document that you create for the problem set.




Homework Template and Files to Get You Started: The homework zip le contains the README le. The les for the RL problem can be downloaded from the RL problem's website. Please read through the documentation provided in ALL les before starting the assignment.




Citing Your Sources: Any sources of help that you consult while completing this assignment (other students, textbooks, websites, etc.) *MUST* be noted in the your README le. This includes anyone you brie y discussed the homework with. If you received help from the following sources, you do not need to cite it: course instructor, course teaching assistants, course lecture notes, course textbooks or other readings.




Submitting Your Solution: You will be submitting only the following les, which you created or modi ed as part of homework 5:




README




hw5-GTACC.pdf (a PDF of your homework writeup)




imageSegmentation.py (Extra Credit Only) valueIterationAgents.py




qlearningAgents.py analysis.py




1



Please follow the naming conventions exactly, and do not submit additional les including the test scripts or data sets. Your PDF writeup of Homework 5 should be named hw5-GTACC.pdf, where \GTACC" is your own Georgia Tech account name (for example, my le would be named \hw5-bboots3.pdf").




Submission Instructions: Please place all of your les into a single folder le named hw5-GTACC and compress the folder as a zip le with the same name. Upload the zip le as an attachment on T-Square. Again, please follow the naming conventions exactly, and do not submit additional les including the test scripts or data sets. For example my le would be named \hw5-bboots3.zip"




You may resubmit multiple times, only your last submission will be counted.




Acknowledgements: The RL problem has been adapted from materials used at UCBerkeley.

PART I: PROBLEM SET




Your solutions to the problems will be submitted as a single PDF document. Be certain that your problems are well-numbered and that it is clear what your answers are. Additionally, you will be required to duplicate your answers to particular problems in the README le that you will submit.




 
K-Means (12 pts)




Show 2 iterations of the k-means algorithm (k = 2) on the following one-dimensional data set:




Data: [ 4, 1, 9, 12, 6, 10, 2, 3, 9 ]




First iteration: cluster centers (randomly chosen): 1, 6




Data assignment:




Cluster 1: f 1, 2, 3 g




Cluster 2: f 4, 9, 12, 6, 10, 9 g




 
Show the cluster centers, then the data assignments, that would be obtained for each of two more iterations.




 
After your iterations, has the algorithm converged to a solution at this point, or not? How can you tell?




 
K-Means and Variance (6 pts)




 
When using the K-Means clustering algorithm, we seek to minimize the variance of the solution. In general, what happens to the variance of a partition as you increase the value of K (the number of clusters) and why? State your answer in one sentence.




 
For a dataset with n instances, what value of k can you always get a variance of 0? Why? State your answer in one sentence.




 
Reinforcement Learning I (12 pts)




(Adapted from Sutton and Barto, Ex. 3.5) Imagine that you are designing a robot to run a maze. You decide to give it a reward of +1 for escaping the maze and a reward of zero at all other times. The task seems to break down naturally into episodes { the successive runs through the maze { so you decide to treat it as an episodic task, where the goal is to maximize expected total reward Rt = rt+1 + rt+2 + rt+3 + : : : + rT , where T is the nal time step of an episode. After running the learning agent for a while, you nd that it is showing no improvement in escaping from the maze. Something is going wrong.




Does the reward function e ectively communicate the goal to the agent? If not, can you suggest another reward function that will work? If the reward function is ne, what else is going wrong?




 
Reinforcement Learning II (Extra Credit { 15 pts)




(Adapted from Sutton and Barto, Ex. 3.10.) Consider reinforcement learning in a gridworld where rewards are positive for goals, negative for running into the edge of the world, and zero otherwise.




 
Are the signs of these rewards important, or only the intervals between them?




 
Provide a formal proof that adding a constant C to all the rewards simply adds a constant, K, to the values of all states, and thus does not a ect the relative value of any policies. In your proof, you will




nd it useful to use the following equations we studied in class for the expected discounted return and value of a state:

1
V (s) = E hRt j st = si
Rt = k=0 krt+k+1
X


(c) What is K in terms of C and gamma?




PART II: PROGRAMMING EXERCISES




 
Image Segmentation using K-Means (Extra Credit { 30 pts)




This entire problem is extra credit. If you do decide to do this project, you will apply K-Means to image segmentation. Write a program named imageSegmentation.py that reads in an image, segments that image using K-Means clustering as described below, and outputs the new segmented image. Your program must support the following command line arguments:




python imageSegmentation.py K inputImageFilename outputImageFilename




The rst argument K is an integer greater than 2 that speci es the number of clusters, inputImageFilename is the lename of the input image, and outputImageFilename is the lename to write the output image. For example, we might call your program via:




python imageSegmentation.py 24 newyorkcity.jpg nyc-segmented.jpg




Choose several nice natural images, such as a farmhouse against a blue sky, or a city scene. First write code to load the image using the Python Image Library (you might nd it useful to consult http://en. wikibooks.org/wiki/Python_Imaging_Library). The Python Image Library supports a wide variety of image le formats, and will automatically determine the letype based on the le extension. (We will test your program with .jpg and .png les, so make certain to test your program with those types.)




We can think of an image as being represented as a 3-D matrix of size imageWidth imageHeight 3. For each location in the image (i; j), the matrix contains three values for the red, green, and blue components of the pixels. We will use these pixel values for clustering. In addition to the color values (rp; gp; bp) for pixel p, we will also use the x,y coordinates (ip; jp) as features. In particular, we can represent each pixel p as a




 

ve-dimensional data vector xp = rp gp bp ip jp .




Complete the program via the following steps:




Convert the input image into a data set with ve features, as described above. To improve results, you should also standardize the values of each feature in the data set.




Implement your own version of K-Means and use it to cluster the data (i.e. the features for each pixel) into K clusters. If a cluster is ever empty during the procedure, assign a random data point to it. Use random initializations for the cluster centers, and iterate until the centroids converge.




Use the cluster centers to generate the segmented image by replacing each data point's color values

with the closest center. For example, xp becomes
C(p) C(p)


C(p)


where C(p) is the cluster to which xp




x^p =
rC(p) gC(p)
bC(p)
ip
jp
;


belongs and (r
; g


; b


) are the corresponding RGB values



of that cluster's centroid. Note speci cally that we're only replacing the color values of each instance with its centroid's colors, we're not changing the (i,j) coordinates of that instance.




Create an output image the same size as the input image. Then ll in the color values of each pixel of the image based on the x^p's. For example, x^p informs us that the pixel at (ip; jp) should have color (rC(p); gC(p); bC(p)). Note that you also have to undo the feature standardization at this point (just invert the standardization equation by solving for the original value given the standardized value).




Output the resulting image to the le outputImageFilename.




In your PDF writeup, include three di erent examples of an original image alongside the resulting segmented image.




The result of this process is called an over-segmented image. It is the rst step to building such systems as this: http://make3d.cs.cornell.edu/. Later steps would piece these segments together into objects.


 
Reinforcement Learning (70 pts)




The reinforcement learning part of this assignment is rather detailed, and has its own webpage: http:

//www.cc.gatech.edu/~bboots3/CS4641-Fall2016/RL/reinforcement.html.




























































































































































5

More products