HW1-1. Data Preprocessing Solution



a. What is the main difference between sampling and feature selection? What is the main similarity between them?

b. What is the main difference between feature selection and dimensionality reduction? What is the main similarity between them?

c. Given a number x = 480 in the range of [-100, 9990], we need to normalize and project the number into a new range [-1, 1]. What is the new value of x if we use decimal scaling for normalization? What is the new value of x if we use min-max normalization?
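As a rough check on part c, the two normalizations can be sketched in a few lines. This assumes the usual textbook definitions: decimal scaling divides by 10^j, where j is the smallest integer that brings the largest absolute value in the range below 1, and min-max normalization maps [old_min, old_max] linearly onto [new_min, new_max]. The function names are illustrative only.

```python
def decimal_scaling(x, max_abs):
    """Divide x by 10**j, where j is the smallest integer with max_abs / 10**j < 1."""
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return x / 10 ** j


def min_max(x, old_min, old_max, new_min=-1.0, new_max=1.0):
    """Map x linearly from [old_min, old_max] onto [new_min, new_max]."""
    return (x - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


x = 480
# Largest absolute value in [-100, 9990] is 9990, so j = 4 under the assumed definition.
print(decimal_scaling(x, max(abs(-100), abs(9990))))
print(min_max(x, -100, 9990))
```

Under these definitions, decimal scaling gives 480 / 10^4 = 0.048 and min-max gives 2 * 580 / 10090 - 1, which is about -0.885.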


HW1-2. You are given a set of m objects that is divided into K groups, where the i-th group is of size m_i. If the goal is to obtain a sample of size n < m, what is the difference between the following two sampling schemes? (Assume sampling with replacement.)
    (a) We randomly select n * m_i / m elements from each group.

    (b) We randomly select n elements from the data set, without regard for the group to which an object belongs.
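The two schemes can be compared with a small simulation. This is only a sketch: the group sizes (50, 30, 20), the sample size n = 10, and the function names are made-up illustrations, and scheme (a) rounds n * m_i / m to the nearest integer.

```python
import random
from collections import Counter


def stratified_sample(groups, n, rng):
    """Scheme (a): draw round(n * m_i / m) objects, with replacement, from each group."""
    m = sum(len(g) for g in groups)
    sample = []
    for g in groups:
        k = round(n * len(g) / m)
        sample.extend(rng.choices(g, k=k))
    return sample


def simple_random_sample(groups, n, rng):
    """Scheme (b): draw n objects, with replacement, ignoring group membership."""
    pooled = [obj for g in groups for obj in g]
    return rng.choices(pooled, k=n)


# Hypothetical data: each object is a (group_id, index) pair.
sizes = [50, 30, 20]
groups = [[(gid, i) for i in range(size)] for gid, size in enumerate(sizes)]
rng = random.Random(0)

strat = Counter(gid for gid, _ in stratified_sample(groups, 10, rng))
simple = Counter(gid for gid, _ in simple_random_sample(groups, 10, rng))
print("scheme (a) counts per group:", dict(strat))
print("scheme (b) counts per group:", dict(simple))
```

Scheme (a) returns the same per-group counts on every run, in proportion to the group sizes. In scheme (b) the counts vary from run to run, and a small group can be missed entirely.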



HW1-3. Sampling

Given a data set consisting of a small number of almost equal-sized groups, find at least one representative point for each of the groups. Assume that the objects in each group are highly similar to each other, but not very similar to objects in different groups.

    (a) Assume we have 10 independent groups. Provide a formula to estimate the probability that a sample contains at least one object from each of the 10 groups.

    (b) Plot the probability under different sample sizes.
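One way to approach both parts is sketched below. It assumes the K = 10 groups are exactly equal in size and that objects are drawn with replacement, so each draw lands in any given group with probability 1/K. Under those assumptions, inclusion-exclusion gives P(n) = sum over j = 0..K of (-1)^j * C(K, j) * (1 - j/K)^n. The function name is illustrative, and the code prints values rather than drawing the plot.

```python
from math import comb


def prob_all_groups(n, k=10):
    """Probability that n draws with replacement hit every one of k equal-sized groups.

    Inclusion-exclusion over the number j of groups that are missed entirely.
    """
    return sum((-1) ** j * comb(k, j) * (1 - j / k) ** n for j in range(k + 1))


# Tabulate the probability for a range of sample sizes (candidate points for the plot).
for n in (10, 20, 30, 40, 50, 60, 80, 100):
    print(n, round(prob_all_groups(n), 4))
```

The (n, probability) pairs can be passed to any plotting tool for part (b). The curve starts near zero at n = 10 and approaches 1 as n grows, so the sample has to be several times larger than the number of groups before every group is likely to be represented.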
