Assignment #5 Solution

Starting from:

~~$29.99~~

$23.99

Home

General Instructions

• Feel free to talk to other members of the class in doing the homework. You should, however, write down your solutions yourself. List the names of everyone you worked with at the top of your submission.

• Keep your solutions brief and clear.

• Please use Piazza if you have questions about the homework but do not post answers.

Feel free to use private posts or come to the office hours.

Homework Submission

• We DO NOT accept late homework submissions.

• We will be using Compass for collecting the homework assignments. Please submit your answers via Compass. Hard copies are not accepted.

• Contact the TAs if you are having technical difficulties in submitting the assignment;

attempt to submit well in advance of the due date/time.

• The homework must be submitted in pdf format. Scanned handwritten and/or hand- drawn pictures in your documents won’t be accepted.

• Please do not zip the answer document (PDF) so that the graders can read it directly on Compass. You need to submit one answer document, named as hw5 netid.pdf.

• Please see the assignments page for more details. In particular, we will be announcing clarifications, if any, on this page.

1 Query Execution (25 pts)

Consider the following relations with no indexes on them:

• Relation R has 5,000 tuples, 100 tuples per block

• Relation S has 2,000 tuples, unknown number of tuples per block

The number of blocks in memory is 10. Say, S is as large as possible (within the limits that main memory can afford) for the two pass sort-merge join (slide 57). That is, if S was larger, the two-pass sort-merge join would not work.

Answer the following questions:

A How many tuples per block does S have? (Do not forget to show your calculations.)

B Using your answer from A, what is the cost of joining R and S using the two pass sort-merge join algorithm (slide 57)?

C Using your answer from A, what is the cost of joining R and S using the optimized block nested-loop join algorithm?

D What is the cost of joining R and S using a two pass hash-based join?

E Based on questions 2, 3, and 4, explain which variant of the algorithm you would choose in terms of I/O cost. If multiple algorithms have the same I/O cost, explain other considerations that may influence your choice.

2 Query Optimization (35 pts)

Consider the relations A(x,y,z), B(w,x), and C(u,v,w), with the following properties:

A(x,y)
B(y,z)
C(z,x,u)
T(A) = 2500

V(A, x) = 30

V(A, y) = 500
T(B) = 1000

V(B, y) = 250

V(B, z) = 100
T(C) = 6000

V(C, z) = 20

V(C, x) = 60

V(C, u) = 40

where, T(R) = number of tuples in relation R and V(R, a) = number of distinct values of attribute a in relation R. Estimate the sizes (measured in number of tuples) of the result of the following expressions:

1. A × C

2. A ./ B

3. SELECT u FROM C WHERE u=20

4. σx = 10 and y=30 (B ./ C)

3 Dynamic Programming (40 pts)

Consider the following relations, where T(R) = number of tuples in relation R and V(R, a)

= number of distinct values of attribute a in relation R.

A(x,y)
B(y,z)
C(z,x,u)
D(u,v)
T(A) = 2500

V(A, x) = 30

V(A, y) = 200
T(B) = 1000

V(B, y) = 250

V(B, z) = 100
T(C) = 6000

V(C, z) = 20

V(C, x) = 60

V(C, u) = 40
T(D) = 2000

V(D, u) = 100

V(D, v) = 40

We want to join all these relations as efficiently as possible. Determine the most efficient way to do the join. Clearly state any assumptions you have made. Show your work by completing the following table (each step in the dynamic programming algorithm should be one row):

Subset
Size
Lowest Cost
Lowest cost plan
. . .
. . .
. . .
. . .