$24
Introduction
Lab 5 is an opportunity for you to automate a common database administrator activity using Python. You are provided a small file containing student registration records. Each record contains information about the student and the class in which they are registered. You are to take this information and create two tables in a database using the Python scripting language.
The input file is in Comma Separated Values format (CSV) and the output is a SQLite3 database. Python provides support for both file types using plug-in modules.
Requirements
Your Python script shall be named dbload and be marked executable.
Usage: ./dbload INPUTCSV OUTPUTDB
Your Python script shall read CSV file INPUTCV and create a SQLite3 database OUTPUTDB . In that database you will create two tables classes and students with the following attributes (please use these exact names and
data types):
classes
id (text)
subjcode (text)
coursenumber (text)
termcode (text)
TEXT
students
id (text, primary key unique)
lastname (text)
firstname (text)
major (text)
email (text)
city (text)
state (text)
zip (text)
TEXT
A sample input CSV file is found at /var/classes/cs3030/lab5/studentregs.csv . Inspect it carefully to learn where the fields are located. The following is true about all input files you will need to process:
The field separator character is a comma. Double quote marks are used to enclose strings with imbedded spaces.
There is one record for each class registration and there may be more than one registration per student.
The student information is duplicated for each class registration by that student.
The data is complete and in its proper format so (hopefully) no error checking should be required while reading the data.
The first record of the CSV file is a typical CSV header record; skip over it.
There should be no blank records in the CSV file.
Two fields in the CSV file require explanation:
Use the wnumber field in the CSV file as the ID field in both tables
The course field in the CSV file is composed of the subject code and the course number. For example, course "CS 3030" is composed of the subject code CS and the course number 3030. Break it up using split(" ") as in this example (adapt this snippet to your code):
s = "CS 3030".split(" ")
# s[0] is "CS", s[1] is "3030"
Hints
Suggested logic (and this is not the only way it could be done):
Issue an appropriate usage message and exit(1) if the user did not specify both the input CSV file and the output database file.
Open the CSV file using try/except and print the exception and `exit(1)`if an error occurs.
Open the database using try/except and print the exception and exit(1) if an error occurs
Drop the two tables if they exist and create the two tables.
For each record in the CSV file:
skip the header record (the first record in the CSV file)
attempt to retrieve the student from the students table using the wnumber
if the retrieval fails, insert the student record into the students table insert the class record into the classes table
exit(0) after all records have been added to the database
The students table uses the ID field as a primary key; attempts to add duplicate students will result in an exception in SQLite3.
Add comments to document your logic.
Don’t forget to import modules sqlite3 , csv and sys or you will have errors galore.
Use /usr/bin/python for this lab.
CS 3030 Cowan 01-07-2019 07:26 PM 2
Clone your private repo on github.com
In the assignment module in Canvas, find the link to the Canvas page entitled "Clone your Lab 5 Repo", click on it and follow the instructions.
Copy your private repo down to icarus
BASH
git clone https://github.com/cowancs3030spring19/lab5-YOURGITHUBUSERNAME
cd lab5-YOURGITHUBUSERNAME
Write and test dbload
Fire up your favorite text editor, and update the header:
#!/usr/bin/python
TEXT
(Your name)
Lab 5 - Database Loader
CS 3030 - Scripting Languages
(add your celestial code here)
Run cucumber to check your progress
./cucumber -s
cucumber randomly generates testfiles so you will want to run cucumber many, many times to verify your script’s operation.
Submit your assignment code for grading
Remember, you must push your script in your private repo to github.com to receive any points, for all assignments in this course.
BASH
git add dbload
git commit -m"COMMIT MESSAGE"
git push origin master
Files created for this lab
dbload
Grading
Here is how you earn points for this assignment:
CS 3030 Cowan 01-07-2019 07:26 PM 3
FEATURES
POINTS
Must-Have Features
Script is named correctly and found in its proper place in your private repo
5
Script is executable
5
Required Features
Script prints a “Usage:” statement and exits rc=1 if either the INPUTCSV or
10
OUTPUTDB files are not specified on the commandline
Script prints an error message containing the word “Error” and exits rc=1 if the
15
INPUTCSV file cannot be opened
Script prints an error message containing the word “Error” and exits rc=1 if the
15
OUTPUTDB file cannot be opened
Script exits rc=0 on successful completion
10
Script correctly defines the students table in the database
30
Script correctly defines the classes table in the database
30
Script adds the right number of students to the students table
40
Script adds the right number of classes to the classes table
40
Script add the correct student data to the students table
50
Script adds the correct classes data to the classes table
50
Grand Total
300
CS 3030 Cowan 01-07-2019 07:26 PM
4