Homework #2 Solution

Starting from:

$35

Setup
The code for this assignment has been pushed to the same repository that you forked for the previous assignment. That means all you need to do is sync your repository with the main repository. There are multiple ways to do this, perhaps the easiest way is to login to the BitBucket website, go to the page for your repository, and click the "Sync Now" link on the right side of the screen. You can also use eclipse or the command line to perform a pull from upstream, if you prefer.

Please note that this assignment builds off of the previous assignment. In particular, you will need a working Catalog class for this assignment to work. The other classes are used a bit, but are not as important. If you did not complete the previous assignment, or are concerned that your implementation is significantly flawed, please speak with me as soon as possible so that we can figure out a solution.
Overview
In this assignment, you will implement relational operations as discussed in class. Your first task will be to implement the operations themselves. You will then use a parser to help translate SQL queries into these relational operations, effectively allowing users to query the data stored on your server.
Relations
Your first task is to complete the Relation and Aggregator classes. Relation is a class that performs relational algebra operations on the data in your database. Examine the methods and come up with a design before beginning your implementation.

Some implementation hints for Relation:

* Notice that each relational algebra operation returns a Relation. This will allow us to chain together many operations to form more complex queries.

* Select operations will all take the form: column (operation) constant. An example WHERE clause might be WHERE a1 = 530.

* Something to consider for each operation is how the operation will affect the number and type of columns in the resulting relation (if this is affected at all). Be sure to create a new TupleDesc if necessary and use this when creating new Tuples for the result.

* The join operation should be an inner join, there is no need to worry about outer joins for this assignment. You may also assume that all join conditions will use equals (i.e. not greater than or less than).

* In class we discussed how relations consist of a set of tuples, however you'll notice that for this assignment we are using an ArrayList instead of a set. This means that duplicates may exist in the result set, but that should be considered acceptable behavior for the purpose of this assignment.

* Be aware that some relational operations may result in a relation that contains no tuples (the empty set). You should be able to accommodate these situations, and the resulting Relation should still have the proper TupleDesc even if it does not contain any tuples.

Some implementation hints for Aggregator:

* Aggregates can get complicated, so for the purposes of the assignment, we are going to restrict what kinds of aggregate queries are possible. There are two types of aggregates to consider: aggregates with GROUP BY and aggregates without GROUP BY. For aggregates without GROUP BY, you can assume that the relation being aggregated has a single column (the column that is being aggregated). For aggregates that include GROUP BY you can assume that the relation will have exactly two columns: the first column will be the column containing the groups, and the second column will contain the data to be aggregated.

* Note that not all aggregation operations apply to all data types.

* The Aggregator class has two primary methods. Merge accepts one tuple at a time and updates the aggregate accordingly, and getResult returns the result of the aggregation. This design should greatly simplify the aggregate method in the Relation class.

* You do not need to change the column name of any columns involved in an aggregate. Once you have finished these two classes, your code should pass the Relation tests.
Queries
Next, we will shift our focus to queries. Parsing queries is difficult, so you will not be tasked with doing that. Instead, we will use an open source parser called JSQLParser.

You can find the API for this parser here.

If you examine the Query class, you'll notice that query execution has already been started for you. The code given creates a parse tree that breaks a SQL query down into its component parts. This is not the same thing as a query tree, in fact the code that you will be writing will have a similar effect as a query tree. Your task is to get the necessary information out of the parse tree and use it in combination with your Relation class to execute the query.

Processing a parse tree often utilizes a design pattern called the visitor pattern. While you are not tasked with creating any visitors yourself, you will notice that a few visitor classes have been provided to you that you will need to use. As an example, consider ColumnVisitor. Once you have used the API to extract the column information from the parse tree, create a new ColumnVisitor for each column node, then call accept() on each node, passing the visitor in as a parameter. You can then ask the ColumnVisitor to tell you the name of the column contained in the node.

For example, if you have a variable called node that contains column information, you might do something like this:
ColumnVisitor cv = new ColumnVisitor();
node.accept(cv);
System.out.println(cv.getColumn()); //should print the column name retrieved from the node

Some implementation hints for Query:

* The API is your friend. Take a look at what classes are being referred to in the visitor classes to get an idea of what classes to look into in the API. In particular, you will probably want to use the following classes from the parse tree: Select, PlainSelect, Table, Join, Expression, EqualsTo, SelectItem.

* You are responsible for the following SQL keywords: SELECT, FROM, WHERE, GROUP BY, aggregate functions (MIN, MAX, AVG, COUNT, SUM), AS (for columns only, not tables).

* You will need to use the Catalog to access tables.

* The only method that you should need to modify for this part is the execute method in the Query class. You should definitely examine the three visitor classes, but there is no need to modify them.

* You should never have to instantiate the AggregateExpressionVisitor class. You'll notice that it is used within the ColumnVisitor class (which you will have to use, along with the WhereExpressionVisitor class).

* Pay attention to order of operations! They definitely matter.

* The queries you are provided will follow the same rules for aggregates laid out in the previous section.

* Casting is acceptable, but should be used sparingly, as it is a bit dangerous. Only cast if you are absolutely certain about the type you are casting to!

* You do need to handle the all columns (*) operator in the SELECT clause.

* You should be able to handle queries that join more than two tables together.
Testing
You are required to write some unit tests for this assignment. We will discuss unit tests in class on February 7th. You will then be expected to submit at least 2 unit tests by noon on Monday, February 11th. These unit tests will then be collected and distributed so that you can use them as you complete the assignment. These tests will also be used to grade your assignment.
Rubric
The below rubric is intended as a guideline and will be followed as closely as possible, however submissions that deviate too much from the specification may be graded using an alternate rubric.

Relation (55 points): select (10 points), project (10 points), rename (5 points), join (15 points), aggregate (5 points), toString (5 points), constructor and getters and setters (5 points)

Aggregate (15 points): merge (5 points), everything else (5 points)

Query (30 points): SELECT (5 points), FROM (5 points), WHERE (5 points), JOIN (5 points), Aggregates (5 points), AS (5 points)

Unit tests: 5 points. I reserve the right to deduct points for code that is poorly formatted or poorly documented.

Total: 100 points
Submission
To submit your homework, simply commit and push your code to your repository by the due date. Be sure to put your names at the top of each file. Any submissions that are committed and pushed after the due date will be discarded.
Acknowledgements
This lab was mostly created by me, however since it is based on hw1 which was taken from MIT OpenCourseWare, it is still appropriate to provide attribution to them:

Sam Madden, Robert Morris, Michael Stonebraker, Carlo Curino, 6.830 / 6.814 Database Systems, Fall 2010. (MIT OpenCourseWare: Massachusetts Institute of Technology), https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-830-database-systems-fall-2010/index.htm (Accessed 9/11/17). License: Creative commons BY-NC-SA

More products

CS Lab 06: Exploratory Data Analysis (EDA) with Clustering Solution

$30

Buy now

CS Lab 05: Exploratory Data Analysis (EDA) with Dimensionality Reduction Solution

$30

Buy now

CS Lab 04: Data App Solution

$30

Buy now