$24
In 1988, Fabian Pascal, a database designer and programmer (and proli c blogger on database issues, see http://www.dbdebunk.com, published the article "SQL Redundancy and DBMS Performance" in the journal Database Programming & Design (V1, N12). He compared and discussed the plan and performance of seven equivalent SQL queries with di erent database management systems. For the experiment he proposed a schema and a synthetic instance on which the seven queries are executed.
At the time, the di erent systems could or could not execute all the queries and the performances signi cantly di ered among and within individual systems while one would expect the DBMS optimizer to choose the same optimal execution plan for these queries.
In this project, we propose to replay Fabian Pascal’s experiment with PostgreSQL.
CS4221/CS5421
1. Create a PostgreSQL database called \cs4221_p1". The schema consists of a table employee and a table payroll. The table employee records information about employees of a ctitious company. Employees have an employee identi er, a rst name and a last name, an address recorded as a street address, a city, a state and a zip code. The table payroll records, for each employee, her bonus and salary. The following SQL creates the tables employee and payroll with the domains in Fabian Pascal’s original article.
CREATE TABLE employee CREATE TABLE payroll
( (
empid CHAR(9), empid CHAR(9),
lname CHAR(15), bonus INTEGER,
fname CHAR(12), salary INTEGER
address CHAR(20), );
city CHAR(20),
state CHAR(2),
ZIP CHAR(5)
);
PL/pgSQL is procedural code that can be executed by the PostgreSQL server directly. You may use or modify the following PL/pgSQL function to generate random string of upper case alphabetical characters of a xed length.
CREATE or REPLACE FUNCTION random string(length INTEGER) RETURNS TEXT AS
$$
DECLARE
chars TEXT[] := 'fA,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Zg'; result TEXT := '';
i INTEGER := 0;
BEGIN
IF length 0 then
RAISE EXCEPTION 'Given length cannot be less than 0';
END IF;
FOR i IN 1..length
LOOP
result := result jj chars[1+random()∗(array length(chars, 1) 1)];
END LOOP;
RETURN result;
END;
$$ LANGUAGE plpgsql;
You may use or modify the following SQL DML code to insert data into the two tables.
INSERT INTO employee
SELECT
TO CHAR(g, '09999') AS empid,
random string(15) AS lname,
random string(12) AS fname,
'500 ORACLE PARKWAY' AS address,
'REDWOOD SHORES' AS city,
'CA' AS state,
'94065' AS zip
FROM
generate series(0, 9999) g;
INSERT INTO payroll(empid, bonus, salary)
SELECT
per.empid,
0 as bonus,
99170 + ROUND(random() ∗ 1000)∗100 AS salary
FROM
employee per;
Create and populate the employee and payroll tables with the code given above in the \cs4221_p1" database. Run the SQL query and PL/pgSQL function above with the Query Tool in pgAdmin 4.
Page 2
CS4221/CS5421
2. We use a Django web application to replay Fabian Pascal’s experiment. Download the zip le
\project1.zip" from Luminus Files \Project 1: Fabian Pascal" folder. It contains the FabianPascal folder and answer.sql.
Run the answer.sql le in the pgAdmin 4.
We want to nd the identi er and the last name of the employees earning a salary of $189170. All of the queries in the answer.sql is a simple query with INNER JOIN. You will need to change it later to answer the next questions.
Follow the following instructions to set up your Django web application.
◦ Go to your Bitnami-Django stack installation folder. This is the folder that you speci ed during the installation process.
◦ Go to apps django django_projects Project. For Windows, you need to nd the lo-cation of your django_projects (by default it should be in C:\Users\Username\). You should see two folders named conf and Project and a le named manage.py. Copy the FabianPascal folder (extracted from the \project1.zip") and paste it here.
◦ Go to the Project folder and locate settings.py le as shown in Figure below.
• Edit settings.py in jEdit. Change line 28 in the le settings.py from (the actual value inside the array might be di erent):
ALLOWED_HOSTS = ['127.0.1.1']
to:
ALLOWED_HOSTS = ['127.0.1.1','localhost','127.0.0.1']
• Edit settings.py in jEdit. Add the FabianPascal to the INSTALLED APPS. Change line 33 in that le from:
INSTALLED_APPS = [ 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles',
]
to:
INSTALLED_APPS = [ 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles', 'FabianPascal'
]
Page 3
CS4221/CS5421
• Go to your project folder and open the Project/settings.py and change the database name by changing the following line:
From:
DATABASES = { 'default': {
'ENGINE': 'django.db.backends.postgresql_psycopg2',
'NAME': 'djangostack',
'HOST': '/home/cs4221/djangostack-2.2.3-4/postgresql',
'PORT': '5432',
'USER': 'bitnami',
'PASSWORD': '9dfd01e5a1'
}
}
To:
DATABASES = { 'default': {
'ENGINE': 'django.db.backends.postgresql_psycopg2',
'NAME': 'cs4221_p1',
'HOST': 'localhost',
'PORT': '5432',
'USER': 'postgres', ## fill with your username
'PASSWORD': 'postgres' ## fill with your password
}
}
Change the username and password accordingly to the password you set during the installation.
• Save and close the settings.py le.
• We need to add the FabianPascal URL to the project. Edit urls.py le in jEdit. Change the following lines from:
urlpatterns = [
re_path(r'^admin/', admin.site.urls), re_path(r'^$', default_urlconf),
]
to:
urlpatterns = [
re_path(r'^admin/', admin.site.urls), re_path(r'^$', default_urlconf), url(r'^FabianPascal/', include('FabianPascal.urls'))
]
• Save and close the urls.py le.
• Start the use_djangostack terminal. The script (use_djangostack.bat or use_djangostack.bash) is in your Bitnami-Django stack installation folder.
For Windows, click the use_djangostack.bat le.
For Mac, open Terminal from Launchpad, and execute cd /Applications/djangostack-2.2.4-0 to go to the bitnami-django stack installation folder (assuming you installed the stack in /Applications/djangostack-2.2.4-0). If you installed bitnami-django stack in a di erent folder change the line accordningly. On the terminal, execute ./use_djangostack.
For Linux, open a new terminal, go to the installation folder and type ./use_djangostack
• Go to the Django Project directory on the use_djangostack terminal with the command: cd apps/django/django_projects/Project/. For Windows, you need to nd the location of your django_projects (by default it should be in C:\Users\Username\) and change all \/" to \\".
Page 4
CS4221/CS5421
• Go back to the Project folder that contains the \manage.py" le and run the Django server with the following command:
python3 manage.py runserver
• Open http://127.0.0.1:8000/FabianPascal/ in your browser. Click on each question to view your answer and results.
• (Optional) If your Apache server is running, you cannot open the Project1 directly. You need to restart the server in the Bitnami Django app by clicking restart on the Apache webserver. Open your browser and open the following link http://localhost:8080/Project/FabianPascal/
In the Django web application, we have created a PL/pgSQL function called test that takes an SQL query Q and a number N as its parameters and returns the average execution time, as reported by EXPLAIN ANALYZE Q over N executions of the query Q.To see the function de nition, go to \Print PL/pgSQL function definition" in the menu or look at the source code.The function is called with N = 100.
Page 5
CS4221/CS5421
3. (5 points) We want to nd the identi er and the last name of the employees earning a salary of $189170.
One possible SQL query to answer this question is query_0 as follows.
SELECT per.empid, per.lname FROM employee per, payroll pay
WHERE per.empid = pay.empid AND pay.salary = 189170;
Write ve di erent but equivalent SQL queries that answer the same question and meet the require-ments indicated below, respectively. Unless otherwise indicated, do not use INNER JOIN, OUTER JOIN and NATURAL JOIN. Only use CROSS JOIN represented with comma in the FROM clause. Unless otherwise indicated, do not use subqueries in the FROM clause. Other unnecessary constructs are also penalized.
1. a simple query with OUTER JOIN and only IS NULL or IS NOT NULL conditions in the WHERE clause,
2. a nested query with a correlated subquery in the WHERE clause,
3. a nested query with an uncorrelated subquery in the WHERE clause,
4. a nested query with an uncorrelated subquery in the FROM clause,
5. a double-negative nested query with a correlated subquery in the WHERE clause.
Write your answers in the SQL le \answer.sql" by modifying the code of the ve corresponding SQL queries, namely query_1 to query_5, respectively.
Do not remove any of the view de nitions CREATE OR REPLACE VIEW query_ n (empid, lname) AS
Queries that do not produce the correct results will automatically receive zero mark. Beware of minor typos and errors.
Do not remove the code for query_0 and use it to check your queries. All queries should have the same results.
Indicate the measured time for 100 executions of each of the queries (replace time by the average execution time reported by the Web page) in the indicated space in the le \answer.sql".
4. (5 points) Propose a new query (di erent from the above ve queries) that is non-trivially1 as slow as possible. Do not modify the schema and the data. Measure and indicate its average execution time over 100 executions. Give your answer in the corresponding question in the SQL le \answer.sql". This question is marked competitively rst on the speed and then the originality of the answer.
Indicate the measured time for 100 executions of query_slowest (replace time by the average execution time reported by the Web page) in the indicated space in the le \answer.sql". Make sure that you are able to run the 100 executions on your computer and report the average execution time. You may be called to demosntrate the execution o fthe query on your machine.
Beyond this project, we recommend that you experiment with di erent schemas, constraints and indexes and create the environment for the fastest query.