Starting from:
$35

$29

Final project --- Simple DBMS query optimization Solution

1 Overview

In this homework, you are asked to implement equal​ join​and optimize query

of the simpleDBMS​​,this project is under Apache 2.0 License, the detail of this project

refer to its GitHub​ Repo​. The​ expected result of this homework is to support the below four type of queries:

1.1 Select


select <​field​> from <​table1​> ​[​join <​table2​> on <​field in table1​> = <​field in table2​>​]​​[​where <​conditions​>​]​​[​offset <​offset_num​>​] [​limit <​limit_num​>​]

    • field has​ two types, one is attributes in the table such as id​, name​, email​, age​, id1​,id2​and a special case *​, another is aggregation function of an attribute, which can be sum()​, avg()​and count()​.Don't need to specify the table name in attributes

    • table1​can be user​or like​

    • table2​will only be like​

        ◦ if there is a join query, table1​​and table2​​will be the user​ table​ and the like​table respectively only.
    • conditions​mean the search condition of the query

        ◦ Only need to accept attributes in user​​table

        ◦ The conditions will be space splittable string, and there​ is no​ extra

bonus​for string parsing as HW3​,so you can ignore the string​_bonus system test in this project​.
        ◦ The attribute name will only appear at the left-hand side of the operator, and the right-hand side of the operator will be a scalar or a string, i.e.
            ▪ age > 20

            ▪ name = “user1”

        ◦ There will not exist conditions only contain scalars or strings, such as

            ▪ 3 > 5

            ▪ “123” != “345”

    • offset_num &​​limit_num​are numeric input, they should always be unsigned int

1.2 Update

update ​user​set <​field​> = <​content​> ​[​where ​<conditions>​]

    • only need to support updating on user​​table

1.3 Delete


delete from ​user​​[​where ​<conditions>​]

    • only need to support delete operation on user​​table

1.4 Insert


insert into ​user​<​id​> <​name​> <​email​> <​age​> insert into ​like​<​id1​> <​id2​>

    • Must support insert operation on user​​and like​

    • The id1 in like​​table is the primary​ key​,and the number of records in like​​table will be smaller than user​​table, the insert queries in test cases to like​​table will not have duplicated id1 in each test file, so you don't need to handle primary constraint in like​​table, but you can still do it if you want to.


    • Prerequisite

        ◦ Installation of simpleDBMS

            ▪ There is no official version simpleDBMS which is implemented to fit all HW3 tasks this time
            ▪ To get the newest simpleDBMS with latest test cases, you just need to

visit its GitHub​​page, clone or download it.

        ◦ C programming language

            ▪ You can also use C++ to implement this project, but it needs to pass all the system tests without any modification on test code, as well
            ▪ If you rewrite this project using C++, please add some comments in README to let TA know your project needs to be compiled by g++ 7.4.0

        ◦ Understanding of simpleDBMS



    • Tasks

        ◦ Equal join (40%)

        ◦ Query optimization (20%)

        ◦ Backward compatibility (40%)

3.1 Equal join (40%)
In this task, you need to implement equal join on 'user'​and 'like'​, which means the join condition will only be equality of two fields. And the join conditions will only be id​ = id1 or​ id​ = id2​. You can ignore the ordering of the join result, as we will use count()​to test for your join result only.

Example:

user

id
name
email
age




1
user1
user1@example.com
20




2
user2
user2@example.com
21




3
user3
user3@example.com
22




4
user4
user4@example.com
23




like

id1

1

2


id2


4

3


If we execute the following query:

db > select count(*) from user join like on id = id1

where age > 20

The execution result is:

(1)

3.2 Query optimization (20%)

There is a new script(generate_testcase.py) for you to generate test cases for

query optimization task.

Usage:

python test/system/generate_testcase.py [-h] [--user_num USER_NUM]

The generated test case file will be different every time, we will also use this script for the final scoring.

Scoring method:

    • Correctness: 5%

        ◦ 1% for each test case

    • Execution time: 15%

        ◦ We will run your program about 1~3 times and sum up the total execution time
    • Rank by the total execution time, the scores will be given according to the ranking list
        ◦ Assume there are 40 groups

        ◦ Scores = 15 * (41 - Rank) / 40

        ◦ 1st gets 15 points, 3rd gets 14.25 points

(We will use Python time_ns() or time() in the time module to measure the execution time of your program)


3.3 Backward compatibility (40%)

You need to support all the function you have done in HW2 and HW3, except the .​load​built-in command. Also, the test case in HW2 and HW3 will​ not use 'table'​ anymore​,we will use 'user'​instead.


4 Testing

4.1 System tests (For above requirement)

The system tests focus on system level behavior, and these tests are also used to test the new features you need to implement

Use the following command to execute system tests

    • $​ python test/system/system_test.py ./shell [test_case [test_case ...]]

        ◦ Execute this command in the root folder of this project

        ◦ By default, if no test_cases are given, it will run all test cases, otherwise, it will execute all specified test cases.
        ◦ To see the list of test cases, please check test/system/testcases​folder
    • $ python test/system/system_test.py [-h]

        ◦ Show the usage of system test

    • You can check the difference of your test output with the expected answer by comparing two files in these two paths “test/system/output/<test_case>/<output file>” “test/system/answer/<test_case>/<answer_file>”

        ◦ Hint: 1) diff​command can show the difference of two files

        ◦ 2) Install vim​and using vimdiff​provide a better looking


4.2 Environment

We will run your test using the following environment setting:

    • 4 GB RAM
    • 1 CPU core

    • No GPU

    • No 3rd party packages


    • Requirements:

        ◦ Pass provided system tests

        ◦ Makefile

            ▪ The original project already contains a Makefile, your final submission should also be able to compile the shell using Makefile

Passing system tests only means that your program reaches the basic requirements. There are some hidden tests we do not provide and we will test them on your submission. Therefore, make sure your program works well on query with different combinations of features.

6 Submission

6.1 E3

    • Compress your whole project into one single zip file named `<group_ID>.zip` (like 01.zip), and upload to New E3 ​by only one of your group member

    • Make sure your submission contains a ​Makefile ​and the result of make command will generate a binary file named ​shell

    • If your submission couldn’t be compiled by make cmd, or we can’t find the ​shell ​file as the compiled binary, you will get 0 score

01.zip (group_ID.zip

|--include

    • `--​*.h​(all the header files |--src

    • `--​*.c​(all the source files `--​Makefile


6.2 Deadline

    • You are required to hand in your homework before 2019/06/23​ 23:59

    • Late submission is not allowed


7 Discussion Forum
Final project discussion <HackMD>


Plagiarism is not allowed, you will get 0 points when we found that happened.

More products