ASSIGNMENT 1 SOLUTION

Q1. (40 marks)




Consider the following base cuboid Sales with four tuples and the aggregate function SUM:




Location     Time    Item        Quantity
Sydney       2005    PS2         1400
Sydney       2006    PS2         1500
Sydney       2006    Wii         500
Melbourne    2005    XBox 360    1700











Location, Time, and Item are dimensions and Quantity is the measure. Suppose the system has built-in support for the value ALL.




List the tuples in the complete data cube of R in tabular form with four attributes: Location, Time, Item, and SUM(Quantity).



Write down an equivalent SQL statement that computes the same result (i.e., the cube). You can only use standard SQL constructs, i.e., no CUBE BY clause.
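As an illustration of the "standard SQL only" idea, the sketch below builds one SELECT ... GROUP BY per subset of the three dimensions and joins them with UNION ALL, then runs the statement in an in-memory SQLite database. This is one possible formulation, not the required answer. The helper names (`cube_sql`, `compute_cube`), the output column name `Total`, and the choice to store Time as text (so that the literal 'ALL' has the same type as the column) are assumptions made for this example.

```python
import sqlite3
from itertools import combinations

DIMS = ("Location", "Time", "Item")

# Base cuboid from the question; Time is stored as text (an assumption)
# so that it can share a column with the literal 'ALL'.
ROWS = [
    ("Sydney", "2005", "PS2", 1400),
    ("Sydney", "2006", "PS2", 1500),
    ("Sydney", "2006", "Wii", 500),
    ("Melbourne", "2005", "XBox 360", 1700),
]


def cube_sql(table="Sales", measure="Quantity"):
    """Build one GROUP BY query per subset of DIMS, glued with UNION ALL."""
    parts = []
    for k in range(len(DIMS), -1, -1):
        for kept in combinations(DIMS, k):
            # Dimensions that are rolled up are replaced by the literal 'ALL'.
            cols = ", ".join(d if d in kept else f"'ALL' AS {d}" for d in DIMS)
            sql = f"SELECT {cols}, SUM({measure}) AS Total FROM {table}"
            if kept:
                sql += " GROUP BY " + ", ".join(kept)
            parts.append(sql)
    return "\nUNION ALL\n".join(parts)


def compute_cube():
    """Load the four base tuples and evaluate the generated statement."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Sales (Location TEXT, Time TEXT, Item TEXT, Quantity INTEGER)"
    )
    con.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", ROWS)
    rows = con.execute(cube_sql()).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(cube_sql())
    for row in compute_cube():
        print(row)
```

Printing `cube_sql()` shows the eight component queries, one per group-by; the hand-written answer can use the same shape without any generator code.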



Consider the following iceberg cube query:

    SELECT Location, Time, Item, SUM(Quantity)
    FROM Sales
    CUBE BY Location, Time, Item
    HAVING COUNT(*) > 1




Draw the result of the query in a tabular form.
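The effect of a HAVING clause on a cube can be sketched in a few lines of Python: every base tuple contributes to the 2^3 cells obtained by keeping or rolling up each dimension, and a cell survives only if enough base tuples fall into it. The function name `iceberg_cube` and its `min_count_exclusive` parameter are hypothetical, and the default threshold assumes the condition is COUNT(*) > 1.

```python
from collections import defaultdict
from itertools import product

# Base cuboid from the question.
SALES = [
    ("Sydney", 2005, "PS2", 1400),
    ("Sydney", 2006, "PS2", 1500),
    ("Sydney", 2006, "Wii", 500),
    ("Melbourne", 2005, "XBox 360", 1700),
]


def iceberg_cube(rows, min_count_exclusive=1):
    """Return {cell: SUM(Quantity)} for cells with COUNT(*) > min_count_exclusive."""
    cells = defaultdict(lambda: [0, 0])  # cell -> [count, sum]
    for *dims, qty in rows:
        # Each dimension is either kept (False) or rolled up to ALL (True).
        for mask in product((False, True), repeat=len(dims)):
            cell = tuple("ALL" if rolled else v for v, rolled in zip(dims, mask))
            cells[cell][0] += 1
            cells[cell][1] += qty
    return {
        cell: total
        for cell, (count, total) in cells.items()
        if count > min_count_exclusive
    }


if __name__ == "__main__":
    for cell, total in sorted(iceberg_cube(SALES).items(), key=str):
        print(cell, total)
```

Counting and summing in the same pass keeps the filter independent of the measure: a cell is pruned by how many base tuples it covers, not by its SUM.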




Assume that we adopt a MOLAP architecture to store the full data cube of R, with the following mapping functions:



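\[
f_{\mathit{Location}}(x) =
\begin{cases}
1 & \text{if } x = \text{`Sydney'}; \\
2 & \text{if } x = \text{`Melbourne'}; \\
0 & \text{if } x = \text{ALL}.
\end{cases}
\]

\[
f_{\mathit{Time}}(x) =
\begin{cases}
1 & \text{if } x = 2005; \\
2 & \text{if } x = 2006; \\
0 & \text{if } x = \text{ALL}.
\end{cases}
\]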




DUE ON 23:59 14 APR, 2019 (SUN)




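\[
f_{\mathit{Item}}(x) =
\begin{cases}
1 & \text{if } x = \text{`PS2'}; \\
2 & \text{if } x = \text{`XBox 360'}; \\
3 & \text{if } x = \text{`Wii'}; \\
0 & \text{if } x = \text{ALL}.
\end{cases}
\]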




Draw the MOLAP cube (i.e., the sparse multi-dimensional array) in tabular form with the columns (ArrayIndex, Value). You also need to write down the function you chose to map a multi-dimensional point to a one-dimensional point.
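A minimal sketch of one possible mapping follows. It assumes a row-major linearisation over array sizes 3 x 3 x 4 (Location, Time, Item); any other bijection onto 0..35 would serve equally well. It also assumes that the Item mapping sends 'PS2' to 1, and the names `offset` and `sparse_cube` are hypothetical.

```python
from itertools import product

# Mapping functions from the question ('PS2' -> 1 is an assumption).
F_LOCATION = {"ALL": 0, "Sydney": 1, "Melbourne": 2}
F_TIME = {"ALL": 0, 2005: 1, 2006: 2}
F_ITEM = {"ALL": 0, "PS2": 1, "XBox 360": 2, "Wii": 3}
SIZES = (len(F_LOCATION), len(F_TIME), len(F_ITEM))  # (3, 3, 4)

SALES = [
    ("Sydney", 2005, "PS2", 1400),
    ("Sydney", 2006, "PS2", 1500),
    ("Sydney", 2006, "Wii", 500),
    ("Melbourne", 2005, "XBox 360", 1700),
]


def offset(loc, time, item):
    """Row-major index: (loc * |Time| + time) * |Item| + item (assumed choice)."""
    return (loc * SIZES[1] + time) * SIZES[2] + item


def sparse_cube(rows):
    """Return {ArrayIndex: Value} holding only the non-empty cells of the full cube."""
    cells = {}
    for location, time, item, qty in rows:
        # Every base tuple feeds the cells where each dimension is kept or set to ALL.
        for l, t, i in product((location, "ALL"), (time, "ALL"), (item, "ALL")):
            idx = offset(F_LOCATION[l], F_TIME[t], F_ITEM[i])
            cells[idx] = cells.get(idx, 0) + qty
    return dict(sorted(cells.items()))


if __name__ == "__main__":
    for index, value in sparse_cube(SALES).items():
        print(index, value)
```

Because ALL maps to 0 in every dimension, the grand total always lands at index 0 under this scheme, and only the occupied cells need to be stored.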




Q2. (30 marks)




Consider binary classification where the class attribute y takes two values: 0 or 1. Let the feature vector for a test instance be a d-dimensional column vector $\vec{x}$. A linear classifier with model parameter $w$ (a d-dimensional column vector) is the following function:

\[
y =
\begin{cases}
1, & \text{if } w^\top \vec{x} > 0 \\
0, & \text{otherwise.}
\end{cases}
\]



We make an additional simplifying assumption: $\vec{x}$ is a binary vector (i.e., each dimension of $\vec{x}$ takes only two values: 0 or 1).




Prove that if the feature vectors are d-dimensional, then a Naïve Bayes classifier is a linear classifier in a (d + 1)-dimensional space. You need to explicitly write out the vector w that the Naïve Bayes classifier learns.




It is obvious that the Logistic Regression classifier learned on the same training dataset as the Naïve Bayes classifier is also a linear classifier in the same (d + 1)-dimensional space. Let the parameters w learned by the two classifiers be $w_{LR}$ and $w_{NB}$, respectively. Briefly explain why learning $w_{NB}$ is much easier than learning $w_{LR}$.




Hint 1. $\log \prod_i x_i = \sum_i \log x_i$
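A numerical sanity check can make the linearity claim concrete before attempting the proof. The sketch below assumes a Bernoulli Naïve Bayes model with made-up parameters for d = 3 (`PRIOR_Y1` and `THETA` are illustrative values, not learned from any data), computes the log-odds directly from the factorised likelihood, and compares it with a dot product between a (d + 1)-dimensional weight vector and the feature vector extended by a constant 1. All function names are hypothetical.

```python
import math
from itertools import product

# Hypothetical Bernoulli Naive Bayes parameters for d = 3 (illustrative only).
PRIOR_Y1 = 0.4  # P(y = 1)
THETA = {       # THETA[c][i] = P(x_i = 1 | y = c)
    0: [0.2, 0.5, 0.7],
    1: [0.6, 0.3, 0.9],
}


def nb_log_odds(x):
    """log P(y=1, x) - log P(y=0, x), straight from the NB factorisation."""
    score = math.log(PRIOR_Y1) - math.log(1.0 - PRIOR_Y1)
    for i, xi in enumerate(x):
        p1 = THETA[1][i] if xi else 1.0 - THETA[1][i]
        p0 = THETA[0][i] if xi else 1.0 - THETA[0][i]
        score += math.log(p1) - math.log(p0)
    return score


def nb_weight_vector():
    """Return w with d + 1 entries; the last one pairs with a constant feature 1."""
    weights = []
    bias = math.log(PRIOR_Y1 / (1.0 - PRIOR_Y1))
    for t0, t1 in zip(THETA[0], THETA[1]):
        absent = math.log((1.0 - t1) / (1.0 - t0))  # contribution when x_i = 0
        weights.append(math.log(t1 / t0) - absent)  # extra contribution when x_i = 1
        bias += absent
    return weights + [bias]


def linear_score(w, x):
    """Dot product of w with the feature vector extended by a trailing 1."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1]))


def max_gap():
    """Largest disagreement between the two scores over all binary inputs."""
    w = nb_weight_vector()
    d = len(THETA[0])
    return max(
        abs(nb_log_odds(x) - linear_score(w, x))
        for x in product((0, 1), repeat=d)
    )


if __name__ == "__main__":
    print(nb_weight_vector())
    print(max_gap())
```

The weights here are closed-form functions of per-feature probabilities, which hints at why fitting them needs only counting rather than iterative optimisation.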



Q3. (30 marks)




We have a sample that is a mixture of two chemical compounds, $S_1$ and $S_2$. The (unknown) percentages of each compound in the sample are denoted as $q_1$ and $q_2$, respectively, where $q_1 + q_2 = 1$.




We have a device that can detect the percentages of m = 3 different components that are contained in both chemical compounds, albeit in different percentages. We denote the components as $\{O_j\}_{j=1}^{m}$. The percentage of each component in pure $S_i$ is listed in the following table:

p_{i,j}    O1     O2     O3
S1         0.1    0.2    0.7
S2         0.4    0.5    0.1
After measuring the three components, we obtain their percentages as $\{u_j\}_{j=1}^{m}$.




COMP9318 (19T1) ASSIGNMENT 1



Write out the log-likelihood function (as a function of $q_i$, $p_{i,j}$, and $u_j$).
If $u_1 = 0.3$, $u_2 = 0.2$, $u_3 = 0.5$, what are the MLEs of $q_1$ and $q_2$? What is the expected percentage of each component under a model with the MLE parameters?
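One way to check a hand-derived estimate is to maximise the log-likelihood numerically. The sketch below assumes a multinomial-style log-likelihood, $\ell(q_1) = \sum_j u_j \log(q_1 p_{1,j} + (1 - q_1) p_{2,j})$, in which the measured percentages act as normalised counts. That modelling choice, the ternary search, and all function names are assumptions of this example rather than the prescribed method.

```python
import math

# Component percentages in the pure compounds (rows S1, S2) and the measurements.
P = [
    [0.1, 0.2, 0.7],
    [0.4, 0.5, 0.1],
]
U = [0.3, 0.2, 0.5]


def mixture(q1):
    """Expected percentage of each component when S1 has share q1 and S2 has 1 - q1."""
    return [q1 * a + (1.0 - q1) * b for a, b in zip(P[0], P[1])]


def log_likelihood(q1):
    """Assumed multinomial-style log-likelihood, with q2 eliminated via q2 = 1 - q1."""
    return sum(u * math.log(r) for u, r in zip(U, mixture(q1)))


def mle_q1(lo=0.0, hi=1.0, iters=200):
    """Ternary search; valid because the log-likelihood is concave in q1."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_likelihood(m1) < log_likelihood(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


if __name__ == "__main__":
    q1 = mle_q1()
    print("q1 =", q1, "q2 =", 1.0 - q1)
    print("expected percentages:", mixture(q1))
```

The search needs no derivative, but setting the derivative of the log-likelihood to zero by hand should land on the same value of q1, which makes the script a useful cross-check.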



Submission




Please write down your answers in a file named ass1.pdf. You must write down your name and student ID on the first page.




You can submit your file by




give cs9318 ass1 ass1.pdf




Late Penalty. -10% per day for the first two days, and -20% for each of the following days.
