Show your work. Include any code snippets you used to generate an answer, and use comments in the code to indicate clearly which problem each part of the code answers.
Consider the following data matrix:

           X1       X2    X3
     x1    red      yes   north
     x2    blue     no    south
     x3    yellow   no    east
D =  x4    yellow   no    west
     x5    red      yes   north
     x6    yellow   yes   north
     x7    blue     no    west
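The data matrix can be entered and explored in Python as shown below. This is a minimal sketch that assumes NumPy and matplotlib are installed. It covers problems 1 and 2: the bar plot of X2 counts and the one-hot matrix Y. The variable names, the alphabetical ordering of categories, and the output file name `x2_counts.png` are illustrative choices, not part of the assignment.

```python
# Sketch for problems 1 and 2: enter D, plot the counts of X2, one-hot encode D into Y.
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Data matrix D: one tuple (X1, X2, X3) per instance x1..x7
D = [
    ("red", "yes", "north"),
    ("blue", "no", "south"),
    ("yellow", "no", "east"),
    ("yellow", "no", "west"),
    ("red", "yes", "north"),
    ("yellow", "yes", "north"),
    ("blue", "no", "west"),
]
attributes = ["X1", "X2", "X3"]

# Problem 1: bar plot of the counts of X2, with both axes labeled
x2_counts = Counter(row[1] for row in D)
labels = sorted(x2_counts)
fig, ax = plt.subplots()
ax.bar(labels, [x2_counts[v] for v in labels])
ax.set_xlabel("X2")
ax.set_ylabel("Count")
ax.set_title("Counts of X2")
fig.savefig("x2_counts.png")  # illustrative file name; use plt.show() in an interactive session
plt.close(fig)

# Problem 2: one-hot encoding, listing each attribute's categories in sorted order
categories = [sorted({row[j] for row in D}) for j in range(len(attributes))]
columns = [f"{a}={c}" for a, cats in zip(attributes, categories) for c in cats]
Y = np.array(
    [[1 if row[j] == c else 0 for j, cats in enumerate(categories) for c in cats] for row in D]
)

print(columns)
print(Y)
```

Reordering the columns only permutes the coordinates of Y. Distances and similarities therefore do not depend on the order, and the mean vector changes only by the same permutation. `pandas.get_dummies` is a common alternative for producing the same kind of indicator matrix.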
Answer the following:
1. (5 points) Use matplotlib to create a bar plot of the counts of the variable X2. Make sure to label the axes.
2. (2 points) Use one-hot encoding to transform all the categorical attributes to numerical values. Write down the transformed data matrix. In what follows, we refer to the transformed data matrix as Y.
3. (2 points) What is the Euclidean distance between instances x2 (second row) and x7 (seventh row) after applying one-hot encoding?
4. (2 points) What is the cosine similarity (cosine of the angle) between data instance x2 and data instance x7 after applying one-hot encoding?
5. (2 points) What is the Hamming distance between data instance x2 and data instance x7 after applying one-hot encoding?
6. (2 points) What is the Jaccard similarity between data instance x2 and x7 after applying one-hot encoding?
7. (2 points) What is the multi-dimensional mean of Y?
8. (2 points) What is the estimated variance of the first column of Y?
9. (2 points) What is the resulting matrix after applying standard (z-score) normalization to the matrix Y? In what follows, we call this matrix Z.
10. (2 points) What is the multi-dimensional mean of Z?
11. (2 points) Let z_i be the i-th row of Z. What is the Euclidean distance between z2 and z7?
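One way to work problems 3 through 11 is sketched below using only NumPy. The script rebuilds Y with alphabetical column order so that it runs on its own. Two readings in it are assumptions, not part of the assignment:

* "Estimated variance" is taken as the unbiased sample variance (ddof=1), and the same estimator supplies the standard deviation for the z-score.
* The Jaccard similarity counts only positions where at least one of the two vectors has a 1.

```python
# Sketch for problems 3-11. Columns of Y, in order:
# X1=blue, X1=red, X1=yellow, X2=no, X2=yes, X3=east, X3=north, X3=south, X3=west
import numpy as np

D = [
    ("red", "yes", "north"),
    ("blue", "no", "south"),
    ("yellow", "no", "east"),
    ("yellow", "no", "west"),
    ("red", "yes", "north"),
    ("yellow", "yes", "north"),
    ("blue", "no", "west"),
]
# One-hot encode D, with each attribute's categories in sorted order
categories = [sorted({row[j] for row in D}) for j in range(3)]
Y = np.array(
    [[1.0 if row[j] == c else 0.0 for j, cats in enumerate(categories) for c in cats] for row in D]
)

y2, y7 = Y[1], Y[6]  # instances x2 and x7 (0-based row indices 1 and 6)

# Problem 3: Euclidean distance between x2 and x7
euclid = np.linalg.norm(y2 - y7)

# Problem 4: cosine similarity = dot product / product of norms
cosine = y2 @ y7 / (np.linalg.norm(y2) * np.linalg.norm(y7))

# Problem 5: Hamming distance = number of positions where the binary vectors differ
hamming = int(np.sum(y2 != y7))

# Problem 6: Jaccard similarity = |positions where both are 1| / |positions where either is 1|
both = np.sum((y2 == 1) & (y7 == 1))
either = np.sum((y2 == 1) | (y7 == 1))
jaccard = both / either

# Problem 7: multi-dimensional mean = column-wise mean of Y
mean_Y = Y.mean(axis=0)

# Problem 8: estimated variance of the first column
# Assumption: unbiased sample variance, dividing by n - 1 (ddof=1)
var_first = Y[:, 0].var(ddof=1)

# Problem 9: z-score normalization, column by column (same ddof=1 assumption)
Z = (Y - mean_Y) / Y.std(axis=0, ddof=1)

# Problem 10: multi-dimensional mean of Z (zero up to floating-point error)
mean_Z = Z.mean(axis=0)

# Problem 11: Euclidean distance between z2 and z7
dist_z = np.linalg.norm(Z[1] - Z[6])

np.set_printoptions(precision=4, suppress=True)
print("2. Y =\n", Y.astype(int))
print(f"3. Euclidean distance(x2, x7) = {euclid:.4f}")
print(f"4. cosine similarity(x2, x7) = {cosine:.4f}")
print(f"5. Hamming distance(x2, x7) = {hamming}")
print(f"6. Jaccard similarity(x2, x7) = {jaccard:.4f}")
print("7. mean of Y =", mean_Y)
print(f"8. variance of first column = {var_first:.4f}")
print("9. Z =\n", Z)
print("10. mean of Z =", mean_Z)
print(f"11. Euclidean distance(z2, z7) = {dist_z:.4f}")
```

Passing ddof=0 instead gives the population variance, which is also the convention used by scikit-learn's `StandardScaler`. Under that choice the answers to problems 8, 9, and 11 change, while the others do not.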
Acknowledgements: Homework problems adapted from assignments of Veronika Strnadova-Neeley.