Starting from:
$29.99

$23.99

Assignment #5 Solution

General Instructions

 

• Feel free to talk  to other  members  of the  class in doing the  homework.  You should, however, write down your solutions yourself.  List the names  of everyone you worked with at the top of your submission.

 

• Keep your solutions brief and clear.

 

• Please use Piazza if you have questions about  the homework but do not post answers.

Feel free to use private  posts or come to the office hours.

 

 

Homework Submission

 

• We DO NOT accept late homework submissions.

 

• We will be using Compass  for collecting the  homework assignments.   Please  submit your answers via Compass.  Hard copies are not accepted.

 

• Contact  the TAs if you are having technical  difficulties in submitting the assignment;

attempt to submit  well in advance of the due date/time.

 

• The homework must be submitted in pdf  format.  Scanned handwritten and/or hand- drawn pictures  in your documents  won’t be accepted.

 

• Please do not zip the answer document (PDF)  so that  the graders can read it directly on Compass.  You need to submit  one answer document,  named as hw5    netid.pdf.

 

• Please see the assignments page for more details.  In particular, we will be announcing clarifications, if any, on this page.

 

1 Query Execution (25  pts)

 

Consider the following relations  with no indexes on them:

 

• Relation R has 5,000 tuples, 100 tuples per block

 

• Relation S has 2,000 tuples, unknown number of tuples per block

 

The number of blocks in memory is 10. Say, S is as large as possible (within  the limits that main memory can afford) for the two pass sort-merge join (slide 57). That  is, if S was larger, the two-pass sort-merge join would not work.

Answer the following questions:

 

A How many tuples per block does S have?  (Do not forget to show your calculations.)

 

B  Using your answer  from A, what  is the  cost of joining R and  S using the  two  pass sort-merge join algorithm  (slide 57)?

 

C  Using your answer from A, what  is the  cost of joining R and  S using the  optimized block nested-loop join algorithm?

 

D  What  is the cost of joining R and S using a two pass hash-based  join?

 

E  Based  on questions  2, 3, and  4, explain  which variant of the  algorithm  you would choose in terms  of I/O  cost.  If multiple  algorithms  have the  same I/O  cost, explain other considerations  that  may influence your choice.

 

2 Query Optimization (35  pts)

 

Consider the relations  A(x,y,z), B(w,x), and C(u,v,w),  with the following properties:

 

A(x,y)
B(y,z)
C(z,x,u)
T(A)  = 2500

V(A, x) = 30

V(A, y) = 500
T(B)  = 1000

V(B, y) = 250

V(B, z) = 100
T(C)  = 6000

V(C, z) = 20

V(C, x) = 60

V(C, u) = 40
 

where, T(R)  = number  of tuples  in relation  R and V(R, a) = number  of distinct  values of attribute a in relation  R. Estimate the sizes (measured  in number of tuples)  of the result of the following expressions:

 

1. A × C

 

2. A ./ B

 

3. SELECT u FROM C WHERE u=20

 

4. σx = 10 and  y=30 (B ./ C)

 

3 Dynamic Programming (40  pts)

 

Consider the following relations,  where T(R)  = number of tuples in relation  R and V(R, a)

= number of distinct  values of attribute a in relation  R.

 

A(x,y)
B(y,z)
C(z,x,u)
D(u,v)
T(A)  = 2500

V(A, x) = 30

V(A, y) = 200
T(B)  = 1000

V(B, y) = 250

V(B, z) = 100
T(C)  = 6000

V(C, z) = 20

V(C, x) = 60

V(C, u) = 40
T(D)  = 2000

V(D, u) = 100

V(D, v) = 40
 

 

We want  to join all these relations  as efficiently as possible.  Determine  the  most  efficient way to  do the  join.   Clearly  state  any  assumptions  you have  made.   Show your  work by completing the following table (each step in the dynamic programming  algorithm  should be one row):

 

Subset
Size
Lowest Cost
Lowest cost plan
. . .
. . .
. . .
. . .

More products