Problem 1 (2 points)
What are the two main types of attributes typically found in data?
Problem 2 (14 points)
Consider the following data matrix, where rows x1 through x7 are observations and columns X1, X2, X3 are attributes:

D =
        X1     X2     X3
  x1    0.3    23     5.6
  x2    0.4     1     5.2
  x3    1.8     4     5.2
  x4    6.0    50     5.1
  x5    0.5    34     5.7
  x6    0.4    19     5.4
  x7    1.1    11     5.5
1. (2 points) What is the estimated mean of X3?
2. (2 points) What is the estimated covariance between X1 and X3?
3. (2 points) What is the estimated multi-dimensional mean of D?
4. (2 points) What is the estimated variance of X2?
5. (2 points) What is the covariance matrix of D?
6. (2 points) What is the estimated correlation between X1 and X3?
7. (2 points) What is the total variance of D?
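
The quantities asked for in Problem 2 can be computed and checked with a few NumPy calls. The sketch below is one possible way to do so, not a prescribed method. It assumes the estimates divide by n (`ddof=0`); if your course convention divides by n - 1, set `DDOF = 1` instead. The names `DDOF` and `summarize` are illustrative only.

```python
import numpy as np

# Data matrix D from Problem 2: rows x1..x7, columns X1, X2, X3.
D = np.array([
    [0.3, 23, 5.6],
    [0.4,  1, 5.2],
    [1.8,  4, 5.2],
    [6.0, 50, 5.1],
    [0.5, 34, 5.7],
    [0.4, 19, 5.4],
    [1.1, 11, 5.5],
])

# Assumption: ddof=0 divides by n; use ddof=1 to divide by n - 1.
DDOF = 0


def summarize(data, ddof=DDOF):
    """Return the mean vector, covariance matrix, correlation matrix, and total variance."""
    mean = data.mean(axis=0)                     # one mean per attribute
    cov = np.cov(data, rowvar=False, ddof=ddof)  # columns are attributes
    std = np.sqrt(np.diag(cov))                  # per-attribute standard deviation
    corr = cov / np.outer(std, std)              # correlation = cov / (std_i * std_j)
    total_var = np.trace(cov)                    # sum of the attribute variances
    return mean, cov, corr, total_var


mean, cov, corr, total_var = summarize(D)
print("mean vector:", mean)
print("covariance matrix:\n", cov)
print("correlation matrix:\n", corr)
print("total variance:", total_var)
```

Entry `[0, 2]` of the covariance and correlation matrices corresponds to the pair (X1, X3), and entry `[1, 1]` of the covariance matrix is the variance of X2. Note that the correlation matrix does not depend on the choice of `ddof`, because the divisor cancels.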
Problem 3 (6 points)
Given $a, b \in \mathbb{R}^4$ (that is a fancy way of saying that a and b are 4-dimensional vectors with real values), where
a = (2.0, 5.0, 2.6, 6.0)
b = (15.0, 2.5, 4.0, 4.0)
1. (2 points) What is $\|a - b\|_2$?
2. (2 points) What is $\|a - b\|_1$?
3.
(2 points) What is the cosine of the angle between a and b?
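
As a way to check hand calculations for Problem 3, the following sketch computes the L2 distance, the L1 distance, and the cosine of the angle between two vectors. The vectors `u` and `v` are made-up illustrative values, not the a and b of this problem, and the function names are hypothetical helpers.

```python
import numpy as np


def l2_distance(x, y):
    """Euclidean (L2) norm of x - y."""
    return np.linalg.norm(x - y, ord=2)


def l1_distance(x, y):
    """Manhattan (L1) norm of x - y: sum of absolute differences."""
    return np.linalg.norm(x - y, ord=1)


def cosine(x, y):
    """Cosine of the angle between x and y (both must be non-zero)."""
    return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))


# Illustrative vectors only (assumption: NOT the a and b from the problem).
u = np.array([1.0, 2.0, 2.0, 0.0])
v = np.array([1.0, 0.0, 0.0, 0.0])

print(l2_distance(u, v))  # sqrt(0^2 + 2^2 + 2^2 + 0^2) = sqrt(8)
print(l1_distance(u, v))  # 0 + 2 + 2 + 0 = 4
print(cosine(u, v))       # (1 * 1) / (3 * 1) = 1/3
```

Substituting the problem's own vectors for `u` and `v` gives values to compare against work done by hand.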
Problem 4 (3 points)
The following questions reference the Heart Disease data set from the UCI Machine Learning Repository:
https://archive.ics.uci.edu/ml/datasets/Heart+Disease
1. (1 point) One attribute is named "cigs". What information is stored in the "cigs" attribute?
2. (1 point) How many rows (i.e., observations, entities, instances) are there in the data set?
3. (1 point) How many attributes are there in the data set?
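
If you download one of the data files, the row and attribute counts can be verified programmatically. The sketch below assumes a comma-separated file with one observation per line and no header row. The repository contains several files with differing layouts, so check the data set documentation before relying on this. The helper `table_shape` and the sample text are hypothetical and are not taken from the Heart Disease files.

```python
import csv
import io


def table_shape(text, delimiter=","):
    """Return (rows, attributes) for delimited text with no header row."""
    # Skip blank lines, which csv.reader yields as empty lists.
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    n_attrs = len(rows[0]) if rows else 0
    return len(rows), n_attrs


# Made-up sample for illustration (assumption: NOT real Heart Disease data).
sample = "63,1,145,233\n67,1,160,286\n41,0,130,204\n"
print(table_shape(sample))
```

To apply it to a downloaded file, read the file contents into a string and pass that string to `table_shape`.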
Tips and Acknowledgements
Make sure to submit your answer as a PDF on Gradescope and Brightspace. Make sure to show your work. Include any code snippets you used to generate an answer, using comments in the code to clearly indicate which problem corresponds to which code.
Acknowledgements: Homework problems adapted from assignments of Veronika Strnadova-Neeley.