Starting from:
$30

$24

CS528 Homework 2 Solved

Laplace Mechanism (15 points)



Using the same dataset as Homework 1: https://archive.ics.uci.edu/ml/datasets/Adult. Query the average age of the records with age greater than 25. Inject Laplacian noise to the query result (average age) to ensure 0:5-di erential privacy and 1-di erential privacy.




Tasks:




Derive the global sensitivity for the query (average age). (5 points)



Calculate the variance for the Laplace noise and inject noise into the average age query result with 0:5-di erential privacy and 1-di erential privacy, respectively. (10 points)



Submission Part I: (1) a report including the screenshots of the major steps of each task and your answers (named as hw2-report.pdf ), and (2) source code les { all named with the pre x \hw2-1-" (e.g., hw2-1-laplace.java).




2. Exponential Mechanism (15 points)




Using the same dataset as Homework 1: https://archive.ics.uci.edu/ml/datasets/Adult. Query the most frequent \Education" result. Design an exponential mechanism (random-ized) to ensure -di erential privacy for the query. Repeat all the procedures for Exponential mechanism ( = 0:5 and = 1):




Tasks:




Derive the global sensitivity for the query (most frequent \Education"). (5 points)



Compute the probabilities and generate the noisy output result with 0:5-di erential privacy and 1-di erential privacy, respectively. (10 points)



Submission Part II: (1) include the screenshots of the major steps of each task and your answers in the same report as above hw2-report.pdf, and (2) source code les { all named with the pre x \hw2-2-" (e.g., hw2-2-exponential.java).

 Illinois Institute of Technology - CS528 Data Privacy and Security













3. Di erentially Private Classi cation (35 points)




Considering the application of classifying the \iris plant" using the following dataset:




The full dataset and description are available at: https://archive.ics.uci.edu/ml/datasets/Iris.



Four attributes and three classes (using iris.data).



sepal length in cm



sepal width in cm



petal length in cm



petal width in cm



class: Iris Setosa, Iris Versicolour, and Iris Virginica



It is a small-scale dataset (150 records). Consider records (1-10, 51-60, 101-110) as the testing data for prediction, and the remaining 120 records as the training data.



Tasks:




Build a Naive Bayes Classi er to predict the classes for records 1-10, 51-60, and 101-



110 (both training and prediction). See more information about Naive Bayes Classi er: https://en.wikipedia.org/wiki/Naive Bayes classi er. (5 points)




Design and implement a di erentially private algorithm (satisfying -di erential pri-vacy) to train the Naive Bayes Classi er for prediction. (15 points)



Hints: you can consider the Laplace Mechanism and allocate budgets to ensure - di erential privacy for the algorithm. The algorithm consists of many queries, and you can think about sequential composition and parallel composition for the queries.




Prove -di erential privacy for your designed algorithm. (5 points)






Set = 0:5; 1; 2; 4; 8, and 16. Then, calculate the precision and recall of the prediction results (of records 1-10, 51-60, and 101-110) generated by the di erentially private classi er. Note that the true results of the 30 records are given in the original dataset for benchmarking. (10 points)



Submission Part III: (1) include the proof for di erential privacy, screenshots of the procedures and results in the same report as above hw2-report.pdf, and (2) source code les { all named with the pre x \hw2-3-" (e.g., hw2-3-classi er.java, and hw2-3-dpclassi er.java).




4. Local Di erential Privacy (35 points)




Using the same dataset as Homework 1: https://archive.ics.uci.edu/ml/datasets/Adult. The server tries to learn the distribution of the users’ ages (each record is privately held by one user). However, each user doesn’t trust the server and locally perturbs its age at the client side with the privacy bound .




Tasks:

 Illinois Institute of Technology - CS528 Data Privacy and Security













Implement the following two LDP protocols: Unary Coding, and Generalized Random Response for LDP. Note that each LDP protocol includes both private data collection via local randomization and the frequency estimation performed on the noisy data (by the data aggregator/server). It is unnecessary to implement the communication protocol for 48k+ users/clients and server (simulating the local randomization algorithm for all the users and the frequency estimation for the server would be ne). (15 points)



Compare these protocols’ accuracy with di erent and discuss your ndings. The parameter varies from 1 to 10 with a step of 1. You can measure the relative error in age frequency estimation by calculating the L1-distance between the actual and estimated frequencies. You can plot the results with x axis and y axis the L1-distance. (10 points)



Compare these protocols’ accuracy with di erent numbers of users n and discuss your ndings with a xed = 2. The number of users n varies from 10% to 100% of all the users with a step of 10%. You can measure the relative error in age frequency estimation by calculating the L1-distance between the actual and estimated frequencies. You can plot the results with x axis n and y axis the L1-distance. (10 points)



Submission Part IV: (1) include the screenshots of the procedures and results in the same report as above hw2-report.pdf (you can explain your ndings with tables or gures), and (2) source code les { all named with the pre x \hw2-4-" (e.g., hw2-4-unary.java).







































































































3

More products