$23.99
A relational database management system (RDBMS) maintains data sets called relations. A relation has a name, a schema, and a set of tuples. A schema is a set of attributes. A tuple is a set of attribute/value pairs. An attribute is the name associated with each data value in a tuple entry. Relations are used as operands in relational operations such as rename, selection, projection, union, and join. The goal of this project is to build a relational database system from a Datalog file, and then answer queries using relational algebra.
A Datalog Program can be represented by a Database. Each scheme in a Datalog Program defines a relation in the Database. The scheme defines the name of the relation, and the attribute list of the scheme defines the schema of the relation. Each fact in the Datalog Program defines a tuple in a relation. The fact name identifies a relation to which the tuple belongs. The basic data structure is a database consisting of relations, each with their own name, schema, and set of tuples. How you choose and use your data structures effects the difficulty of the project. Inheritance and polymorphism are always useful. Each data structure implementation has its benefits, drawbacks and obstacles to overcome.
Your program must convert every Datalog query into a sequence of select, project, and rename operations. As you prepare to write your program, take out a piece of paper, convert each of the queries in the given examples into a relational expression, and carry out these expressions. Your program should create these same expressions.
Project Description
Write an RDBMS‑based interpreter for a Datalog Program. Given a Datalog Program, your interpreter should print the result for each query using only the facts (interpretation of the rules will be added in Project 4).
To format the output for pass‑off, you must make it look like the examples below. Note the following: The queries are to be evaluated in the order listed in the Datalog Program. For each query, print the query and a space. Then, if the query's resulting relation is empty, output “No”; and if the resulting relation is not empty, output “Yes(n)” where n is the number of tuples in the resulting relation. If there are free variables in the query, print the tuples of the resulting relation, one per line and indented by two spaces, according to the following directions. Each tuple should be a comma‑space separated list of pairs–(variable, string‑value)–ordered according to the order of variable appearance in the query. The format of a pair is v= 's', where v is the variable and 's' is the string value. Sort the tuples alphabetically based on string values in the tuples' variables (sort with the first sort key as the first variable (in order of appearance in the query), the second sort key as the second variable, and so on).
The focus of this project is on interpreting 236‑Datalog (without rules)–not on building an industrial‑strength interpreter. Thus, to retain the central focus and keep the amount of coding within a reasonable bound, you may assume:
The arity (number of parameters) of all schemes, facts, rules and queries that have the same name are guaranteed to match. You do not have to check for matching arity.
No two attributes in the same scheme will have the same name. You do not have to check for this.
No two schemes in the scheme list will have the same name. You do not have to check for this.
Every scheme (or relation) referenced in a fact, rule head, predicate, or query will have been defined in the scheme section. Attribute names in schemes and variable names in queries have distinct name spaces: no attribute name will appear as a variable in a query.
http://wiki.cs.byu.edu/cs-236/relational-database 1/5
3/23/2015 cs-236:relational-database[CS W iki]
Requirements (Get the Design Right)
Your design should include classes for the database and relation. Queries must be implemented by performing the appropriate relational operations (select, project, and rename). The implementation of the relational operators must be methods, and they must follow the mathematical definitions described in class and in the textbook. For example, the operators must not produce relations which have duplicate tuples or duplicate names in the schema.
Your project will not be accepted if it does not implement a relation object with the relational algebra operators as methods that are used to answer queries.
Examples
These are not sufficient to completely test your program. Your program must have output formatted exactly like the example outputs below. Example Input
Input
S
c
h
e
m
e
s
:
)
F
a
S
K
(
A
,
B
c
t
s
:
S K ( ' a ' , ' c ' ) .
S K ( ' b ' , ' c ' ) .
S K ( ' b ' , ' b ' ) .
R
u
S K ( ' b ' , ' c ' ) .
l
e
s
:
: -
S t u f f ( Z ) .
Q
u
D o N o t h i n g ( Z )
e
r
i
e
s
:
c
' ) ?
S
K
(
A
,
'
S K ( ' b ' , ' c ' ) ?
S
K
(
X
,
X
)
?
S
K
(
A
,
B
)
?
Output
S K ( A , ' c ' ) ?
Y e s ( 2 )
A
=
'
a
'
A
=
'
b
'
Y e s ( 1 )
S K ( ' b ' , ' c ' ) ?
S K ( X , X ) ?
Y e s ( 1 )
X
=
'
b
'
Y e s ( 3
)
S K ( A , B ) ?
A = ' a ' ,
B = ' c '
A = ' b ' ,
B = ' b '
A = ' b ' ,
B = ' c '
Input
S c h e m e s :
http://wiki.cs.byu.edu/cs-236/relational-database
F
a
c
t
s
a
b
(
A
,
B
)
a b ( ' j o e ' , ' b o b ' ) .
a b ( ' j i m ' , ' b o b ' ) .
a b ( ' j o e ' , ' j i m ' ) .
R
u
l
e
s
a b ( ' b o b ' , ' b o b ' ) .
:
Q
u
e
r
i
e s :
a b ( ' j o e ' , ' j i m ' ) ?
a b (
w h o ,
' b o b ' ) ?
a b ( ' j o e ' ,
a n y o n e ) ?
a
b
(
X
,
X
)
?
a
b
(
X
,
Y
)
?
Output
a
b
( ' j o e ' , ' j i m ' ) ?
Y
e
s
(
1 )
a
b
( w h o , ' b o b ' ) ?
Y
e s
(
3
)
w
h
o
=
'
b
o
b
'
w
h
o
=
'
j
i
m
'
w
h
o
=
'
j
o
e
'
a b ( ' j o e ' , a n y o n e ) ?
Y e s ( 2 )
a n y o n e = ' b o b '
a n y o n e = ' j i m '
a b ( X , X ) ?
Y e s ( 1 )
X = ' b o b
'
)
a b ( X , Y ) ?
Y e s ( 4
X = ' b o b ' ,
Y = ' b o b '
X = ' j i m ' ,
Y = ' b o b '
X
=
'
j
o
e
'
,
Y
=
'
b
o
b
'
X
=
'
j
o
e
'
,
Y
=
'
j
i
m
'
Input
S
c
h
e
m
e
s
:
c
(
D
,
C
)
F
a
c
t
s
:
d
d c ( ' r a l p h ' , ' h o w a r d ' ) .
R
u
l
e
s
:
Q
u
e
r
i
e
s
:
X ) ?
d c ( ' r a l p h ' ,
d c ( ' b o b ' , ' b o b ' ) ?
d
c
(
X
,
Y
) ?
Output
d c ( ' r a l p h ' , X ) ?
Y e
s ( 1 )
d
c
X
=
'
h
o
w
a
r
d
'
N o
( ' b o b ' , ' b o b ' ) ?
d
c
( X , Y ) ?
Y e s ( 1
)
'
h
o w a r d '
X
=
'
r
a
l p
h
'
,
Y =
In the next example, notice that the rules are not evaluated, but the queries are still answered using only the facts included in the file.
Input
S
c
h
e
m
e
s
:
p e o p l e ( p e r s o n 1 , p e r s o n 2 )
F
a
c
t
s
:
e m p l o y e r ( b o s s , e m p l o y e e )
p e o p l e ( ' j o e ' , ' b o b ' ) .
p e o p l e ( ' j i m ' , ' b o b ' ) .
p e o p l e ( ' j o e ' , ' j i m ' ) .
e m p l o y e r ( ' r a l p h ' , ' h o w a r d ' ) .
R
u
l
e
s
:
p e o p l e ( ' b o b ' , ' b o b ' ) .
e m p l o y e r ( X , Y ) : -
p e o p l e ( Y , X ) .
e m p l o y e r ( X , Y ) : -
p e o p l e ( X , Z ) , e m p l o y e r ( Z , Y ) .
Q
u
e
r
i
e
s
p e o p l e ( X , Y ) : -
p e o p l e ( Y , X ) .
:
p e o p l e ( ' j o e ' , ' j i m ' ) ?
p e o p l e (
w h o ,
' b o b ' ) ?
p e o p l e ( ' j o e ' ,
a n y o n e ) ?
p e o p l e ( X , X ) ?
p e o p l e ( X , Y ) ?
X ) ?
e m p l o y e r ( ' r a l p h ' ,
e m p l o y e r ( ' b o b ' , ' b o b ' ) ?
e m p l o y e r ( X , Y ) ?
Output
p
e
o p l e ( ' j o e ' , ' j i m ' ) ?
Y
e
s
(
1 )
p
e
o p l e ( w h o , ' b o b ' ) ?
Y
e s
(
3
)
w
h
o
=
'
b
o
b
'
w
h
o
=
'
j
i
m
'
w
h
o
=
'
j
o
e
'
p e o p l e ( ' j o e ' , a n y o n e ) ?
Y e s ( 2 )
a n y o n e = ' b o b '
a n y o n e = ' j i m '
p e o p l e ( X , X ) ?
Y e s ( 1 )
X
=
'
b
o
b
'
Y e s ( 4 )
p e o p l e ( X , Y ) ?
X = ' b o b ' ,
Y = ' b o b '
X = ' j i m ' ,
Y = ' b o b '
X = ' j o e ' ,
Y = ' b o b '
X = ' j o e ' ,
Y = ' j i m '
Y e s ( 1 )
e m p l o y e r ( ' r a l p h ' , X ) ?
X
=
'
h
o
w
a r
d '
N o
e m p l o y e r ( ' b o b ' , ' b o b ' ) ?
e m p l o y e r ( X , Y ) ?
Y e s ( 1 )
X = ' r a l p h ' ,
Y = ' h o w a r d '
FAQ
When should I select, project, or rename in a query?
Here is a simple strategy to consider when deciding how to evaluate queries.
1. Only do select operations as you traverse the parameters. The select will either be on a constant or it will be on an ID if the ID has been seen previously (i.e., the special case to avoid having two identical IDs in the scheme.
2. Once all the select operations have been completed, then rename and project.
The strategy implies that each select knows the list of IDs in the final project since it needs to check for duplicates to determine when to select on a column rather than a constant. Also in this strategy, it may be the case that no select takes place on an ID parameter (which is just fine).
Should project also re‑order the tuple?
The output for the lab requires that tuples follow the order declared in the query. Although not apparent right now, consider what happens in a natural join? In such an operation, the order becomes less clear. A project‑method can ensure the tuples follow the correct order as part of its operation with some little thought and foresight.
Submission
Review the project standards for creating a zip archive.
Navigate to Learning Suite [http://learningsuite.byu.edu], select 'CS 236', and click on 'Assignments' on the course home page. Click on the link to the relevant project and at the bottom of the description click on 'View/Submit'. Use the resulting upload dialog to upload your zip archive.
Pass‑off
Pass‑off your project directly to a TA during normal TA hours. TAs help students on a first‑come/first‑serve basis and are under no obligation to stay later than designated hours so plan accordingly. Please review the syllabus for the pass‑off requirements.