Q1. (40 marks)
Consider the following base cuboid Sales with four tuples and the aggregate function SUM:

Location    Time   Item       Quantity
---------   ----   --------   --------
Sydney      2005   PS2        1400
Sydney      2006   PS2        1500
Sydney      2006   Wii        500
Melbourne   2005   XBox 360   1700
Location, Time, and Item are dimensions and Quantity is the measure. Suppose the system has built-in support for the value ALL.
List the tuples in the complete data cube of Sales in a tabular form with 4 attributes, i.e., Location, Time, Item, SUM(Quantity).
Write down an equivalent SQL statement that computes the same result (i.e., the cube). You can only use standard SQL constructs, i.e., no CUBE BY clause.
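Hint. One standard rewriting (a sketch, not necessarily the only acceptable answer) unions the 2^3 = 8 possible GROUP BY queries, using the literal 'ALL' as the placeholder for an aggregated-away dimension; in a real RDBMS one would typically use NULL instead, to avoid mixing a string with the numeric Time column:

SELECT Location, Time, Item, SUM(Quantity)
FROM Sales
GROUP BY Location, Time, Item
UNION ALL
SELECT Location, Time, 'ALL', SUM(Quantity)
FROM Sales
GROUP BY Location, Time
UNION ALL
SELECT Location, 'ALL', Item, SUM(Quantity)
FROM Sales
GROUP BY Location, Item
UNION ALL
SELECT 'ALL', Time, Item, SUM(Quantity)
FROM Sales
GROUP BY Time, Item
UNION ALL
SELECT Location, 'ALL', 'ALL', SUM(Quantity)
FROM Sales
GROUP BY Location
UNION ALL
SELECT 'ALL', Time, 'ALL', SUM(Quantity)
FROM Sales
GROUP BY Time
UNION ALL
SELECT 'ALL', 'ALL', Item, SUM(Quantity)
FROM Sales
GROUP BY Item
UNION ALL
SELECT 'ALL', 'ALL', 'ALL', SUM(Quantity)
FROM Sales;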
Consider the following iceberg cube query:
SELECT Location, Time, Item, SUM(Quantity)
FROM Sales
CUBE BY Location, Time, Item
HAVING COUNT(*) > 1
Draw the result of the query in a tabular form.
Assume that we adopt a MOLAP architecture to store the full data cube of Sales, with the following mapping functions:

\[ f_{Location}(x) = \begin{cases} 1 & \text{if } x = \text{'Sydney'} \\ 2 & \text{if } x = \text{'Melbourne'} \\ 0 & \text{if } x = \text{ALL} \end{cases} \]

\[ f_{Time}(x) = \begin{cases} 1 & \text{if } x = 2005 \\ 2 & \text{if } x = 2006 \\ 0 & \text{if } x = \text{ALL} \end{cases} \]

\[ f_{Item}(x) = \begin{cases} 1 & \text{if } x = \text{'PS2'} \\ 2 & \text{if } x = \text{'XBox 360'} \\ 3 & \text{if } x = \text{'Wii'} \\ 0 & \text{if } x = \text{ALL} \end{cases} \]

Draw the MOLAP cube (i.e., the sparse multi-dimensional array) in a tabular form of (ArrayIndex, Value). You also need to write down the function you chose to map a multi-dimensional point to a one-dimensional point.
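Hint. One possible mapping function (a row-major layout; any injective mapping is acceptable) uses the dimension cardinalities 3 (Location), 3 (Time), and 4 (Item), each including ALL:

\[ \mathrm{ArrayIndex}(l, t, i) = 12\, f_{Location}(l) + 4\, f_{Time}(t) + f_{Item}(i), \]

where 12 = 3 × 4 and 4 is the size of the Item dimension; this assigns each cell a distinct index in [0, 35].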
Q2. (30 marks)
Consider binary classification where the class attribute y takes two values: 0 or 1. Let the feature vector for a test instance be a d-dimensional column vector x. A linear classifier with the model parameter w (which is a d-dimensional column vector) is the following function:
\[ y = \begin{cases} 1 & \text{if } w^\top x \ge 0 \\ 0 & \text{otherwise.} \end{cases} \]
We make an additional simplifying assumption: x is a binary vector (i.e., each dimension of x takes only two values: 0 or 1).
Prove that if the feature vectors are d-dimensional, then a Naïve Bayes classifier is a linear classifier in a (d+1)-dimensional space. You need to explicitly write out the vector w that the Naïve Bayes classifier learns.
It is obvious that the Logistic Regression classifier learned on the same training dataset as the Naïve Bayes is also a linear classifier in the same (d+1)-dimensional space. Let the parameters w learned by the two classifiers be w_LR and w_NB, respectively. Briefly explain why learning w_NB is much easier than learning w_LR.
Hint 1. \(\log \prod_i x_i = \sum_i \log x_i\).
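Hint 2. (A sketch of where Hint 1 leads, assuming a Bernoulli Naïve Bayes model with parameters \(\theta_{j,c} = P(x_j = 1 \mid y = c)\); the notation \(\theta\) is ours, not part of the question.) The classifier predicts y = 1 iff the log posterior odds are nonnegative:

\[ \log \frac{P(y=1)\,\prod_{j=1}^{d} \theta_{j,1}^{x_j} (1-\theta_{j,1})^{1-x_j}}{P(y=0)\,\prod_{j=1}^{d} \theta_{j,0}^{x_j} (1-\theta_{j,0})^{1-x_j}} \;\ge\; 0. \]

Applying Hint 1 and collecting the coefficient of each \(x_j\) (plus a constant term) expresses the left-hand side as an affine function of x, i.e., a linear function in the (d+1)-dimensional space whose extra coordinate is fixed to 1.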
Q3. (30 marks)
We have a sample of a mixture of two chemical compounds, S1 and S2. The (unknown) percentages of the two chemicals in the sample are denoted as q1 and q2 (where q1 + q2 = 1), respectively.
We have a device that can detect the percentages of m = 3 different components that are contained in both chemical compounds, albeit with different percentages. We denote the components as \(\{O_j\}_{j=1}^{m}\). We list the percentage of each component in pure \(S_i\) in the following table:
p_{i,j}   O1    O2    O3
S1        0.1   0.2   0.7
S2        0.4   0.5   0.1
After measuring the three components, we obtain their percentages as \(\{u_j\}_{j=1}^{m}\).
Write out the log-likelihood function (as a function of the \(q_i\), \(p_{i,j}\), and \(u_j\)).
If u1 = 0.3, u2 = 0.2, and u3 = 0.5, what are the MLEs of q1 and q2? What are the expected percentages of each component under a model with the MLE parameters?
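Hint. (A sketch, under the assumption that the measured percentages are read as multinomial proportions over the m components, which is the usual interpretation of this setup.) The probability of observing component \(O_j\) is \(\sum_i q_i\, p_{i,j}\), so up to an additive constant the log-likelihood is

\[ \ell(q_1, q_2) = \sum_{j=1}^{m} u_j \log\bigl(q_1\, p_{1,j} + q_2\, p_{2,j}\bigr), \quad \text{subject to } q_1 + q_2 = 1. \]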
Submission
Please write down your answers in a file named ass1.pdf. You must write down your name and student ID on the first page.
You can submit your file by
give cs9318 ass1 ass1.pdf
Late Penalty. -10% per day for the first two days, and -20% for each of the following days.