SQL Assignment Solution

Starting from:

~~$29.99~~

$23.99

In this assignment you will use SQLite3 to execute a few SQL queries. You can find documentation on SQLite3 at here. SQLite3 is not as functional as PostgreSQL or MySQL or the commercial relational DBMSes, but it is much easier to use and query. That is why there are rumoured to be many more installations of SQLite than PostgreSQL or any other traditional DBMS. Also, you can easily install SQLite3 on your own machine, but make sure your final SQL queries run on SQLite3 installed on the CS mumble machines.

For this assignment, we will use the US Census Population Estimate dataset, which can be found here. We have downloaded and cleaned this dataset and inserted it into a sqlite database called census.db that can be found in the directory called P op Estimate Dataset in this assignment directory.

There is a layout file for each table in the database. The table layout is helpful in understanding what each attribute means. Section 3 of this document gives you details of the schema of the database.

2   Query the Database

Your task is to write the following queries in separate files, as described below:

File
Query
Output Columns

Query1.sql
List all states and their values for 2011 housing estimate

(HUEST 2011)
(State Code, Housing

Estimate)

Query2.sql
Find the states which have the number of births in 2012

(BIRTS2012) to be greater than 80,000

(State Code, Births)

Query3.sql
For each state, print its name and the net migration into

the state for the year 2012 (NETMIG2012)

(State Name, Net Migration)

Query4.sql
Find the number of counties in each state and print it with

the name of the state
(State Name, Count of

Counties)

Query5.sql
Display names of all “metropolitan statistical

areas”(LSAD) and their corresponding migration for the period between June 2010 and June 2011 (NETMIG2011)

(Area Name, Net Migration)

Query6.sql
Find each state, print the population estimate in the year

2011 (POPESTIMATE2011) of all women aged between

21 and 45 (both inclusive)

(Age, Population)

Query7.sql
For each division, print the name of the division 1∼9

(DIVISION DESC) and a list of all states that belong to that region concatenated into one string that is separated by commas (take a look at the sqlite documentation for the appropriate aggregate function)

(Division Name, State List)

Query8.sql
For each division, find the name of the state that has the

greatest increase in housing estimates for the year 2011 (HUEST 2011) as compared to housing estimates for the year 2010 (HUEST 2010)

(Division Name, State Name, Increase)

Query9.sql
For each states, display the percentage of the states

estimated 2011 population (POPESTIMATE2011) that is above the age of 21 (≥ 21)

(State Name, Percentage)

Query10.sql
Find the states for which the ratio of the “2011 resident

total population estimates”(POPESTIMATE2011) and “2011 housing estimates base”(HUEST 2011) is below the nationwide ratio. The ratios should be compared in floating point number, not in integer

(State Name)

Query11.sql
For each state, find the name of the county that has the

smallest increase in the population between 2011 and 2012 (NPOPCHG 2012). Look at counties individually and not the state as a whole. Display name of state and county and population change in lexicographic order of

“Division Desc”. For states with same division, sort in decreasing order of the 2012 population estimate (POPESTIMATE2012) of the county

(State Name, County, Population Change)

Query12.sql
For all age groups, calculate the absolute difference of 2011

population estimate (POPESTIMATE2011) and 2010 population estimate (POPESTIMATE2010). Also, display the indicator that indicates if the population has increased or decreased, or remained the same. The value of indicator should be one of {increased, same, decreased}

(Age, Difference, Indicator)
—

The sqlite file itself is in binary and does not convey any information about the tables present in it. You may want to use sqlitestudio. This allows you to visualize the tables and run queries on the data which might be useful in the above task.

If you cant write any one of the queries above, provide an explanation of the features that are lacking in SQLite3 that prevents you from being able to write that query. Write these explanations in a file called Queries-readme.txt. To test whether or not your queries work as expected, you can use sqlitestudio or the command below.

Note: To check the query output, we will run the following command on our copy of census.db :

$ sqlite3 census.db < Query3.sql

Thus, your text file can have multiple queries, but it must only output your final desired output, and clean up any temporary tables that it creates. You should try to minimize the number of distinct SQL query blocks in each query file – try to write a single query for each of the queries above, whenever possible.

Copy each of your query’s result into a separate file called result[1-12].txt, i.e., the word result followed by the query number and .txt. For example, if we want to get the results of Query 5, we would execute the following:

$ sqlite3 census.db < Query5.sql result5.txt

The above command can be repeated for each query with the proper query number substituted in place to get all

12 result files.

3   Schema for the Population Database

Here is a list of tables that are present in the database and the corresponding files in which you can find functional details such as the semantics of each column of the table.

Table Name
Where to Find the Functional Details About the Table
POP_ESTIMATE_NATION_STATE_PR
POP_ESTIMATE_NATION_STATE_PR.pdf
POP_ESTIMATE_STATE_COUNTY
POP_ESTIMATE_STATE_COUNTY.pdf
POP_ESTIMATE_CITIES_TOWNS
POP_ESTIMATE_CITIES_TOWNS.pdf
POP_ESTIMATE_STATE_AGE_SEX_RACE_ORIGIN
POP_ESTIMATE_STATE_AGE_SEX_RACE_ORIGIN.pdf
POP_ESTIMATE_PR_SEX_AGE
POP_ESTIMATE_PR_SEX_AGE.pdf
POP_ESTIMATE_PR_MUNICIPIOS
POP_ESTIMATE_PR_MUNICIPIOS.pdf
POP_ESTIMATE_PR_MUN_SEX_AGE
POP_ESTIMATE_PR_MUN_SEX_AGE.pdf
POP_ESTIMATE_METRO_MICRO
POP_ESTIMATE_METRO_MICRO.htm
HU_UNIT_STATE_LEVEL
HU_UNIT_STATE_LEVEL.pdf

The above mentioned pdf and htm files are present along with the data files in the P op Estimate Dataset directory.

We have created six more tables listed below. The above population tables have few columns such as Division, Region, Origin, Race, Sex, Sumlev that have numeric values in it. The meaning for those numeric values can be obtained from these separate tables below. The schemas for these tables are provided below along with the schemas for other tables.

Table Name
REGION
DIVISION
RACE
SEX
ORIGIN
SUMLEV

The file layout for each Population table does not give any information about the Data Type and any Constraints present on the table. It can be referred to know the functional details as to what information is present in each table and what each column in the table means.

We have listed the data type and constraints (Primary and Foreign Key) for each table below. The attribute with value F K in the C ons. field is a foreign key to the primary key of the table with the same name. As you can see below, there are cases where the same attribute is both a primary and foreign key. This might be useful when it comes to writing the actual queries.

Column
Type
Cons.
SEX_CD
Number
PK
SEX_DESC
String

REGION                                       SEX

Column
Type
Cons.
REGION_CD
Number
PK
REGION_DESC
String

Column
Type
Cons.
RACE_CD
Number
PK
RACE_DESC
String

DIVISION                                     RACE

Column
Type
Cons.
DIVISION_CD
Number
PK
DIVISION_DESC
String

Column
Type
Cons.
SUMLEV_CD
Number
PK
SUMLEV_DESC
String

ORIGIN                                       SUMLEV

Column
Type
Cons.
ORIGIN_CD
Number
PK
ORIGIN_DESC
String

Column
Type
Cons.
SUMLEV
Number
FK
REGION
Number
FK
DIVISION
Number
FK
STATE
Number

NAME
String
PK
CENSUS2010POP
Number

ESTIMATESBASE2010
Number

POPESTIMATE2010
Number

POPESTIMATE2011
Number

POPESTIMATE2012
Number

NPOPCHG_2010
Number

NPOPCHG_2011
Number

NPOPCHG_2012
Number

BIRTHS2010
Number

BIRTHS2011
Number

BIRTHS2012
Number

DEATHS2010
Number

DEATHS2011
Number

DEATHS2012
Number

NATURALINC2010
Number

NATURALINC2011
Number

NATURALINC2012
Number

INTERNATIONALMIG2010
Number

INTERNATIONALMIG2011
Number

INTERNATIONALMIG2012
Number

DOMESTICMIG2010
Number

DOMESTICMIG2011
Number

DOMESTICMIG2012
Number

NETMIG2010
Number

NETMIG2011
Number

NETMIG2012
Number

RESIDUAL2010
Number

RESIDUAL2011
Number

RESIDUAL2012
Number

RBIRTH2011
Real

RBIRTH2012
Real

RDEATH2011
Real

RDEATH2012
Real

RNATURALINC2011
Real

RNATURALINC2012
Real

RINTERNATIONALMIG2011
Real

RINTERNATIONALMIG2012
Real

RDOMESTICMIG2011
Real

RDOMESTICMIG2012
Real

RNETMIG2011
Real

RNETMIG2012
Real

Column
Type
Cons.
SUMLEV
Number
FK
REGION
Number
FK
DIVISION
Number
FK
STATE
Number
PK
COUNTY
Number
PK
STNAME
String

CTYNAME
String

CENSUS2010POP
Number

ESTIMATESBASE2010
Number

POPESTIMATE2010
Number

POPESTIMATE2011
Number

POPESTIMATE2012
Number

NPOPCHG_2010
Number

NPOPCHG_2011
Number

NPOPCHG_2012
Number

BIRTHS2010
Number

BIRTHS2011
Number

BIRTHS2012
Number

DEATHS2010
Number

DEATHS2011
Number

DEATHS2012
Number

NATURALINC2010
Number

NATURALINC2011
Number

NATURALINC2012
Number

INTERNATIONALMIG2010
Number

INTERNATIONALMIG2011
Number

INTERNATIONALMIG2012
Number

DOMESTICMIG2010
Number

DOMESTICMIG2011
Number

DOMESTICMIG2012
Number

NETMIG2010
Number

NETMIG2011
Number

NETMIG2012
Number

RESIDUAL2010
Number

RESIDUAL2011
Number

RESIDUAL2012
Number

GQESTIMATESBASE2010
Number

GQESTIMATES2010
Number

GQESTIMATES2011
Number

GQESTIMATES2012
Number

RBIRTH2011
Real

RBIRTH2012
Real

RDEATH2011
Real

RDEATH2012
Real

RNATURALINC2011
Real

RNATURALINC2012
Real

RINTERNATIONALMIG2011
Real

RINTERNATIONALMIG2012
Real

RDOMESTICMIG2011
Real

RDOMESTICMIG2012
Real

RNETMIG2011
Real

RNETMIG2012
Real

4

Column
Type
Cons.
CBSA
Number

MDIV
Number

STCOU
Number

NAME
String
PK
LSAD
String
PK
CENSUS2010POP
Number

ESTIMATESBASE2010
Number

POPESTIMATE2010
Number

POPESTIMATE2011
Number

NPOPCHG_2010
Number

NPOPCHG_2011
Number

NATURALINC2010
Number

NATURALINC2011
Number

BIRTHS2010
Number

BIRTHS2011
Number

DEATHS2010
Number

DEATHS2011
Number

NETMIG2010
Number

NETMIG2011
Number

INTERNATIONALMIG2010
Number

INTERNATIONALMIG2011
Number

DOMESTICMIG2010
Number

DOMESTICMIG2011
Number

RESIDUAL2010
Number

RESIDUAL2011
Number

Column
Type
Cons.
SUMLEV
Number
FK
MUNICIPIO
Number
PK
NAME
String

YEAR
Number
PK
POPESTIMATE
Number

POPEST_MALE
Number

POPEST_FEM
Number

UNDER5_TOT
Number

UNDER5_MALE
Number

UNDER5_FEM
Number

AGE513_TOT
Number

AGE513_MALE
Number

AGE513_FEM
Number

AGE1417_TOT
Number

AGE1417_MALE
Number

AGE1417_FEM
Number

AGE1824_TOT
Number

AGE1824_MALE
Number

AGE1824_FEM
Number

AGE16PLUS_TOT
Number

AGE16PLUS_MALE
Number

AGE16PLUS_FEM
Number

AGE18PLUS_TOT
Number

AGE18PLUS_MALE
Number

AGE18PLUS_FEM
Number

AGE1544_TOT
Number

AGE1544_MALE
Number

AGE1544_FEM
Number

AGE2544_TOT
Number

AGE2544_MALE
Number

AGE2544_FEM
Number

AGE4564_TOT
Number

AGE4564_MALE
Number

AGE4564_FEM
Number

AGE65PLUS_TOT
Number

AGE65PLUS_MALE
Number

AGE65PLUS_FEM
Number

AGE85PLUS_TOT
Number

AGE85PLUS_MALE
Number

AGE85PLUS_FEM
Number

MEDIAN_AGE_TOT
Number

MEDIAN_AGE_MALE
Number

MEDIAN_AGE_FEM
Number

HU_UNIT_STATE_LEVEL

Column
Type
Cons.
SUMLEV
Number
FK
STATE
Number
FK
REGION
Number
PK
DIVISION
Number
FK
STNAME
String

HUCENSUS2010
Number

HUESTBASE2010
Number

HUEST_2010
Number

HUEST_2011
Number

Column
Type
Cons.
SUMLEV
Number
FK
STATE
Number
PK
COUNTY
Number
PK
PLACE
Number
PK
COUSUB
Number
PK
CONCIT
Number
PK
NAME
String

STNAME
String

CENSUS2010POP
Number

ESTIMATESBASE2010
Number

POPESTIMATE2010
Number

POPESTIMATE2011
Number

Column
Type
Cons.
SUMLEV
Number
FK
MUNICIPIO
Number
PK
NAME
String

ESTIMATESBASE2010
Number

POPESTIMATE2010
Number

POPESTIMATE2011
Number

NPOPCHG_2010
Real

NPOPCHG_2011
Real

PPOPCHG_2010
Real

PPOPCHG_2011
Real

SRANK_ESTBASE2010
Number

SRANK_POPEST2010
Number

SRANK_POPEST2011
Number

SRANK_NPCHG2010
Number

SRANK_NPCHG2011
Number

SRANK_PPCHG2010
Number

SRANK_PPCHG2011
Number

Column
Type
Cons.
SUMLEV
Number
FK
STATE
Number

NAME
String

SEX
Number
PK, FK
AGE
Number
PK
CENSUS2010POP
Number

ESTIMATESBASE2010
Number

POPESTIMATE2010
Number

POPESTIMATE2011
Number

POP_ESTIMATE_STATE_AGE_SEX_RACE_ORIGIN       POP_ESTIMATE_PR_SEX_AGE

Column
Type
Cons.
SUMLEV
Number
FK
REGION
Number
FK
DIVISION
Number
FK
STATE
Number
PK
SEX
Number
PK, FK
ORIGIN
Number
PK, FK
RACE
Number
PK, FK
AGE
Number
PK
CENSUS2010POP
Number

ESTIMATESBASE2010
Number

POPESTIMATE2010
Number

POPESTIMATE2011
Number

4   Submission

Create only the following files for this assignment: Query[1-12].sql, result[1-12].txt, Queries-readme.txt. Compress these files into a tar.gz package. The man pages for tar should be able to helf you with this. The name of your compressed package should be <your netid A2.tar.gz. For example, if your UW-netid is johndoe@wisc.edu, your compressed package should be called johndoe A2.tar.gz. Please use comments in all your SQL files to better explain what you are trying to accomplish in each query. Please upload this tar file to the Assignment 2

Submission Dropbox folder in learn@uw.

More products

$6.00 OFF

Homework 5 - Arrays and Masking Solution

$35

$29

Buy now

$6.00 OFF

Homework 4- Logicals Solution

$35

$29

Buy now

$6.00 OFF

Homework 3 - Vectors and Strings Solution

$35

$29

Buy now