Programming Assignment 3: Incremental Bayesian Learning

Introduction




Incremental learning is a machine learning method in which input data is continuously used to extend the existing model's knowledge. It can be applied when training data becomes available gradually over time or when the dataset is too large to fit in system memory. In this assignment, you will implement a Naive Bayes classifier that can learn incrementally (i.e., without seeing all the instances at once).




1. Homework Description




Please use Python 3 to implement your homework assignment.




1.1 Blackbox31




In this assignment, you are given Blackbox31.pyc instead of ready-made data, and you need to generate your own data from that blackbox to simulate a data stream.




Example of getting data from Blackbox31.pyc in your NaiveBayes.py:

[Image: example code for getting a sample from the blackbox]

Each time you ask the blackbox, it randomly returns a tuple (X, y). As you can see in the above image, X is a list and y is an integer; together, X and y form one single sample.
Each observation X has 3 attributes, all continuous values in the range [0.0, 1.0), and the target y has 3 possible values: 0, 1, and 2.
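
A minimal sketch of pulling one sample from a blackbox follows. The real Blackbox31.pyc cannot be reproduced here, so `StandInBlackbox` is a hypothetical stand-in that only mimics the interface described above: an `ask()` method returning `(X, y)` with three features in [0.0, 1.0) and a label in {0, 1, 2}. Its labelling rule is made up purely for illustration and says nothing about how the real blackbox behaves.

```python
import random


class StandInBlackbox:
    """Hypothetical stand-in for the real blackbox (not the course's code)."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def ask(self):
        # Three continuous features in [0.0, 1.0), as the handout describes.
        X = [self._rng.random() for _ in range(3)]
        # Made-up labelling rule: the class is the index of the largest feature.
        y = X.index(max(X))
        return X, y


bb = StandInBlackbox()
X, y = bb.ask()  # one call yields exactly one sample
print(X, y)
```

With the real blackbox, only the construction of `bb` would change; the `X, y = bb.ask()` call is the part that matches the handout.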




Your Python script submitted to Vocareum must import both blackbox31 and blackbox32, even though blackbox32 is not provided to you.










Remember that your code will be run with 2 different blackboxes. One way to parse the input argument is

[Image: example argument-parsing code]

and then use bb as the data source. When you develop your algorithm, you only need to care about blackbox31, but you also need to import blackbox32 and add parsing logic for it, since blackbox32 will be used to test your code on Vocareum.
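
As a rough sketch of the selection logic, the function below maps the last command-line argument to a data source. The name `pick_source` and the `sources` dictionary are illustrative assumptions, not the handout's own code; in NaiveBayes.py the dictionary values would be the two imported blackboxes and `argv` would be `sys.argv`. Placeholder strings are used here so that the sketch runs without the .pyc files.

```python
def pick_source(argv, sources):
    """Return the data source named by the last command-line argument.

    `sources` maps a name such as "blackbox31" to the object to query.
    """
    name = argv[-1]
    if name not in sources:
        raise SystemExit(
            f"unknown blackbox {name!r}; expected one of {sorted(sources)}"
        )
    return sources[name]


# Placeholder values stand in for the imported blackbox objects.
sources = {
    "blackbox31": "<blackbox31 object>",
    "blackbox32": "<blackbox32 object>",
}

# Equivalent to running: python3 NaiveBayes.py blackbox32
bb = pick_source(["NaiveBayes.py", "blackbox32"], sources)
print(bb)
```

Failing loudly on an unknown name is a design choice of this sketch; it makes a mistyped argument visible instead of silently falling back to blackbox31.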




1.2 Noisy data




This time the data is noisy: even when the features X of a sample indicate that it belongs to class C_i, the blackbox might return a wrong class C_j as its label y. However, a Bayesian model is not very sensitive to noise, so this should not be a big problem (hopefully). You are also encouraged to try other models, such as decision trees and neural networks, to compare the differences.







1.3 Task Description




Your task is to classify the data into 3 classes incrementally and keep track of the testing accuracy statistics.




You need to hold out 200 samples as testing data, and then generate training data one sample at a time, 1000 times in total, during the learning process. Note that the training examples are not all known at once; they arrive one at a time.
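
One way to learn from a single sample at a time is to keep, for each class, a sample count plus a running mean and variance per feature, updated with Welford's online algorithm. The sketch below does this under the assumption that each continuous attribute is modelled with a Gaussian likelihood; the handout does not prescribe a likelihood model, and the class name, method names, smoothing, and variance floor are all illustrative choices.

```python
import math


class IncrementalGaussianNB:
    """Gaussian Naive Bayes whose statistics are updated one sample at a time."""

    def __init__(self, n_features=3, n_classes=3):
        self.n_features = n_features
        self.n_classes = n_classes
        self.counts = [0] * n_classes
        self.means = [[0.0] * n_features for _ in range(n_classes)]
        # m2[c][j] is the running sum of squared deviations from the mean.
        self.m2 = [[0.0] * n_features for _ in range(n_classes)]

    def update(self, X, y):
        """Fold one (X, y) sample into the statistics of class y (Welford)."""
        self.counts[y] += 1
        n = self.counts[y]
        for j, x in enumerate(X):
            delta = x - self.means[y][j]
            self.means[y][j] += delta / n
            self.m2[y][j] += delta * (x - self.means[y][j])

    def _log_posterior(self, X, c):
        n = self.counts[c]
        total = sum(self.counts)
        # Laplace-smoothed class prior so unseen classes keep nonzero mass.
        log_p = math.log((n + 1) / (total + self.n_classes))
        for j, x in enumerate(X):
            # Use a broad variance until the class has two samples;
            # the small floor avoids division by zero.
            var = self.m2[c][j] / n if n > 1 else 1.0
            var = max(var, 1e-6)
            mean = self.means[c][j] if n > 0 else 0.5
            log_p += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return log_p

    def predict(self, X):
        """Return the class with the highest (unnormalised) log posterior."""
        return max(range(self.n_classes), key=lambda c: self._log_posterior(X, c))


model = IncrementalGaussianNB()
for X, y in [([0.1, 0.2, 0.1], 0), ([0.2, 0.1, 0.2], 0),
             ([0.8, 0.9, 0.8], 1), ([0.9, 0.8, 0.9], 1)]:
    model.update(X, y)
print(model.predict([0.15, 0.15, 0.15]))
```

Because only counts, means, and squared-deviation sums are stored, memory use stays constant no matter how many samples have been seen.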




The logic of your program should be something like this:




    # hold out 200 examples to estimate accuracy
    Repeat 200 times:
        Ask the blackbox for one data point

    # do incremental training
    Repeat 1000 times:
        X, y = blackbox31.ask()
        adapt model to this new X, y
        accumulate test accuracy stats per 10 samples

    # output accuracy stats
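
The outline above could be turned into Python roughly as follows. This is a sketch under assumptions: `run_experiment` accepts any callable `ask` and any model exposing `update(X, y)` and `predict(X)`, which are illustrative names. `MajorityClassModel` and `fake_ask` are hypothetical placeholders included only so the loop runs on its own; they are not a Naive Bayes classifier or the real blackbox.

```python
import random


def run_experiment(ask, model, n_test=200, n_train=1000, report_every=10):
    """Hold out a test set, train one sample at a time, and log accuracy."""
    # Hold out a fixed test set before any training happens.
    test_set = [ask() for _ in range(n_test)]
    stats = []
    for seen in range(1, n_train + 1):
        X, y = ask()
        model.update(X, y)
        # Record test accuracy after every `report_every` training samples.
        if seen % report_every == 0:
            correct = sum(model.predict(Xt) == yt for Xt, yt in test_set)
            stats.append((seen, round(correct / len(test_set), 3)))
    return stats


class MajorityClassModel:
    """Trivial placeholder learner: always predicts the most frequent label."""

    def __init__(self):
        self.counts = {}

    def update(self, X, y):
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, X):
        return max(self.counts, key=self.counts.get)


rng = random.Random(0)


def fake_ask():
    # Hypothetical stand-in for bb.ask(): random features, random label.
    return [rng.random() for _ in range(3)], rng.randrange(3)


stats = run_experiment(fake_ask, MajorityClassModel())
print(len(stats), stats[0][0], stats[-1][0])
```

Keeping the loop independent of the model makes it easy to swap in a real incremental classifier, or another model for comparison, without touching the bookkeeping.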







Your program will be run in the following way:




python3 NaiveBayes.py blackbox31 => results_blackbox31.txt




When we grade your model with the hidden blackbox32, it will be run as:

python3 NaiveBayes.py blackbox32 => results_blackbox32.txt




The results file contains the accuracy stats and should have the following format:




10, 0.545
20, 0.545
30, 0.595
40, 0.66
50, 0.66
...
970, 0.81
980, 0.82
990, 0.82
1000, 0.815




The first column indicates the number of training samples seen so far, and the second column is the corresponding test accuracy (rounded to 3 decimals).
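
A small sketch of producing lines in this format from a list of (samples seen, accuracy) pairs follows. The helper name `format_results` is an assumption made for illustration. Python's `round` drops trailing zeros when the value is printed, which matches entries such as `40, 0.66` in the sample above.

```python
def format_results(stats):
    """Render (samples_seen, accuracy) pairs as 'N, accuracy' lines."""
    return "\n".join(f"{seen}, {round(acc, 3)}" for seen, acc in stats)


print(format_results([(10, 0.545), (20, 0.545), (40, 0.66), (1000, 0.815)]))
```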




Since the data is randomly generated, your classifier may sometimes give less-than-ideal results; you will not lose points for that.




In your implementation, please do not use any existing machine learning library calls. You must implement the algorithm yourself. Please develop your code yourself and do not copy from other students or from the Internet.




When we grade your algorithm, we will use a different blackbox. Your code will be autograded for technical correctness. Please name your file correctly, or you will wreak havoc on the autograder. The maximum running time is 3 minutes.




2. Submission




2.1 Submit your code to Vocareum

Submit NaiveBayes.py to Vocareum.



After your submission, Vocareum will run two scripts to test your code: a submission script and a grading script. The submission script tests your code with blackbox31 only, while the grading script tests your code with another blackbox, blackbox32.



The program will be terminated and fail if it exceeds the 3-minute time limit.



After the submission script finishes, you can view your submission report immediately to see whether your code works; the grading report will be released after the homework deadline.



You don’t have to keep the page open while the scripts are running.









2.2 Submit your report to Blackboard

Create a single .zip file (Firstname_Lastname_HW3.zip) which contains:



NaiveBayes.py



Report.pdf, a brief report containing your testing accuracy graph for blackbox31. Example graph:

[Image: example testing accuracy graph]

Submit your zipped file to Blackboard.









Rubric:



100 points in total




Program correctness (60 points): the program always works correctly and meets the specifications within the time limit

Documentation (20 points): the code is well commented and the submitted graph is reasonable

Performance (20 points): the classifier has reasonable performance
