Q1. (40 marks)
Consider the following base cuboid Sales with four tuples and the aggregate function SUM:

Location    Time   Item       Quantity
---------   ----   --------   --------
Sydney      2005   PS2        1400
Sydney      2006   PS2        1500
Sydney      2006   Wii        500
Melbourne   2005   XBox 360   1700
Location, Time, and Item are dimensions and Quantity is the measure. Suppose the system has built-in support for the value ALL.
List the tuples in the complete data cube of Sales in a tabular form with 4 attributes, i.e., Location, Time, Item, SUM(Quantity).
Write down an equivalent SQL statement that computes the same result (i.e., the cube). You can only use standard SQL constructs, i.e., no CUBE BY clause.
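Hint. One standard rewriting (a sketch, not necessarily the only acceptable answer) unions the 2^3 = 8 possible GROUP BY queries, using the literal 'ALL' as the placeholder for an aggregated-away dimension; in a real RDBMS one would typically use NULL instead, to avoid mixing a string with the numeric Time column:

SELECT Location, Time, Item, SUM(Quantity)
FROM Sales
GROUP BY Location, Time, Item
UNION ALL
SELECT Location, Time, 'ALL', SUM(Quantity)
FROM Sales
GROUP BY Location, Time
UNION ALL
SELECT Location, 'ALL', Item, SUM(Quantity)
FROM Sales
GROUP BY Location, Item
UNION ALL
SELECT 'ALL', Time, Item, SUM(Quantity)
FROM Sales
GROUP BY Time, Item
UNION ALL
SELECT Location, 'ALL', 'ALL', SUM(Quantity)
FROM Sales
GROUP BY Location
UNION ALL
SELECT 'ALL', Time, 'ALL', SUM(Quantity)
FROM Sales
GROUP BY Time
UNION ALL
SELECT 'ALL', 'ALL', Item, SUM(Quantity)
FROM Sales
GROUP BY Item
UNION ALL
SELECT 'ALL', 'ALL', 'ALL', SUM(Quantity)
FROM Sales;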
Consider the following iceberg cube query:
SELECT Location, Time, Item, SUM(Quantity)
FROM Sales
CUBE BY Location, Time, Item
HAVING COUNT(*) > 1
Draw the result of the query in a tabular form.
Assume that we adopt a MOLAP architecture to store the full data cube of Sales, with the following mapping functions:

\[ f_{Location}(x) = \begin{cases} 1 & \text{if } x = \text{'Sydney'} \\ 2 & \text{if } x = \text{'Melbourne'} \\ 0 & \text{if } x = \text{ALL} \end{cases} \]

\[ f_{Time}(x) = \begin{cases} 1 & \text{if } x = 2005 \\ 2 & \text{if } x = 2006 \\ 0 & \text{if } x = \text{ALL} \end{cases} \]

\[ f_{Item}(x) = \begin{cases} 1 & \text{if } x = \text{'PS2'} \\ 2 & \text{if } x = \text{'XBox 360'} \\ 3 & \text{if } x = \text{'Wii'} \\ 0 & \text{if } x = \text{ALL} \end{cases} \]

Draw the MOLAP cube (i.e., the sparse multi-dimensional array) in a tabular form of (ArrayIndex, Value). You also need to write down the function you chose to map a multi-dimensional point to a one-dimensional point.
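Hint. One possible mapping function (a row-major layout; any injective mapping is acceptable) uses the dimension cardinalities 3 (Location), 3 (Time), and 4 (Item), each including ALL:

\[ \mathrm{ArrayIndex}(l, t, i) = 12\, f_{Location}(l) + 4\, f_{Time}(t) + f_{Item}(i), \]

where 12 = 3 × 4 and 4 is the size of the Item dimension; this assigns each cell a distinct index in [0, 35].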
Q2. (30 marks)
Consider binary classification where the class attribute y takes two values: 0 or 1. Let the feature vector for a test instance be a d-dimensional column vector x. A linear classifier with the model parameter w (which is a d-dimensional column vector) is the following function:
\[ y = \begin{cases} 1 & \text{if } w^\top x \ge 0 \\ 0 & \text{otherwise.} \end{cases} \]
We make an additional simplifying assumption: x is a binary vector (i.e., each dimension of x takes only two values: 0 or 1).
Prove that if the feature vectors are d-dimensional, then a Naïve Bayes classifier is a linear classifier in a (d+1)-dimensional space. You need to explicitly write out the vector w that the Naïve Bayes classifier learns.
It is obvious that the Logistic Regression classifier learned on the same training dataset as the Naïve Bayes is also a linear classifier in the same (d+1)-dimensional space. Let the parameters w learned by the two classifiers be w_LR and w_NB, respectively. Briefly explain why learning w_NB is much easier than learning w_LR.
Hint 1. \(\log \prod_i x_i = \sum_i \log x_i\).
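Hint 2. (A sketch of where Hint 1 leads, assuming a Bernoulli Naïve Bayes model with parameters \(\theta_{j,c} = P(x_j = 1 \mid y = c)\); the notation \(\theta\) is ours, not part of the question.) The classifier predicts y = 1 iff the log posterior odds are nonnegative:

\[ \log \frac{P(y=1)\,\prod_{j=1}^{d} \theta_{j,1}^{x_j} (1-\theta_{j,1})^{1-x_j}}{P(y=0)\,\prod_{j=1}^{d} \theta_{j,0}^{x_j} (1-\theta_{j,0})^{1-x_j}} \;\ge\; 0. \]

Applying Hint 1 and collecting the coefficient of each \(x_j\) (plus a constant term) expresses the left-hand side as an affine function of x, i.e., a linear function in the (d+1)-dimensional space whose extra coordinate is fixed to 1.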
Q3. (30 marks)
We have a sample of a mixture of two chemical compounds, S1 and S2. The (unknown) percentages of the two chemicals in the sample are denoted as q1 and q2 (where q1 + q2 = 1), respectively.
We have a device that can detect the percentages of m = 3 different components that are contained in both chemical compounds, albeit with different percentages. We denote the components as \(\{O_j\}_{j=1}^{m}\). We list the percentage of each component in pure \(S_i\) in the following table:
p_{i,j}   O1    O2    O3
S1        0.1   0.2   0.7
S2        0.4   0.5   0.1
After measuring the three components, we obtain their percentages as \(\{u_j\}_{j=1}^{m}\).
Write out the log-likelihood function (as a function of the \(q_i\), \(p_{i,j}\), and \(u_j\)).
If u1 = 0.3, u2 = 0.2, and u3 = 0.5, what are the MLEs of q1 and q2? What are the expected percentages of each component under a model with the MLE parameters?
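Hint. (A sketch, under the assumption that the measured percentages are read as multinomial proportions over the m components, which is the usual interpretation of this setup.) The probability of observing component \(O_j\) is \(\sum_i q_i\, p_{i,j}\), so up to an additive constant the log-likelihood is

\[ \ell(q_1, q_2) = \sum_{j=1}^{m} u_j \log\bigl(q_1\, p_{1,j} + q_2\, p_{2,j}\bigr), \quad \text{subject to } q_1 + q_2 = 1. \]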
Submission
Please write down your answers in a file named ass1.pdf. You must write down your name and student ID on the first page.
You can submit your file by
give cs9318 ass1 ass1.pdf
Late Penalty. -10% per day for the first two days, and -20% for each of the following days.