Starting from:
$30

$24

CENG A Software Rasterizer implementing the Forward Rendering Pipeline Solution

    • Introduction

In this project you are supposed to write many SQL queries on a relational database which will be created by using a dataset. The dataset is taken from Yelp Open Dataset which is available for access to users for personal and non-commercial use. You can reach it from here. Also, you can nd the documentation on the same link if you are interested in the dataset. Yelp Dataset contains too much information for this task, so we have done some cleaning to reduce the size of the dataset and converted json les to csv les. You can reach the data which will be used on this project from here. This project has three parts:


    • Create the database using the given ’csv’  les.

    • Write proper SQL queries for certain problems.

    • Create triggers and views.

Note that all tasks should be completed using PostgreSQL.


    • Database Schema

Business(business id, business name, address, state, is open, stars)


Users(user id, user name, review count, yelping since, useful, funny, cool, fans, average stars)

Friend(user id1, user id2)

Review(review id, user id, business id, stars, date, useful, funny, cool)

Tip(tip id,business id, user id, date, tip text, compliment count)


Yelp is a system in which you can nd any kind of service such as restaurants, dentists or vet. You can see businesses in your area, their stars (measure for appreciation given by other users) and also you can vote on such businesses that you’ve visited. The main purpose of Yelp is create an environment for users to share information among themselves.








1
2.1    Foreign Key Constraints

Friend’s user id1 and user id2 references Users’ user id.

Review’s user id references Users’ user id.

Review’s business id references Business’s business id.

Tip’s business id references Business’s business id.

Tip’s user id references Users’ user id.



2.2    Explanation of Attributes

Business    • business id: Primary key of the business table that stores id’s of each business.


    • business name: Name of that business.

    • address: Address of the business.

    • state: State code that the business is located at.

    • is open: Boolean value, that indicates whether the business still working or closed completely.

    • stars: Float value, average stars given to that business by users.


Users    • user id: Primary key of the Users table that stores id’s of each user.


    • user name: Name of the user.

    • review count: Indicates how many reviews have been made by that user.

    • yelping since: The date user started using Yelp system.

    • useful: Count of this useful votes that sent by this user.

    • funny: Count of this funny votes that sent by this user.

    • cool: Count of this cool votes that sent by this user.

    • fans: Number of fans of that user.

    • average stars: Float value, average stars given by that user.


Friend    • user id1: First user’s id.


• user id2: Friend of the    rst user.


Review    • review id: Primary key of the Review table that stores id’s of each review.


    • user id: User that gave the review.

    • business id: Business that is reviewed by the user.

    • stars: Given stars to the business by the user.

    • date: Review date.

    • useful: Count of useful tags given by other users to this review.

    • funny: Count of funny tags given by other users to this review.

    • cool: Count of cool tags given by other users to this review.


Tip    • tip id: The id of the tip. It is NOT provided in the .csv le, it must be an auto incremented value.


• business id: Business that is given a tip.




2

    • user id: User that gave the tip about the business.

    • date: Date of the given Tip.

    • compliment count: Compliments that the tip is received.

    • tip text: String, text of the tip.


2.3    General Information About Relations

    • Reviews are supposed to be detailed reports created by a user for a business. However, since Review texts are too large in terms of bytes we’ve ignored them.

    • Tips are shorter reviews like ’this restaurant’s pizza is really good.’. They are similar to the reviews but less detailed.


    • Tasks

3.1    Task 1 - Creating the Database - 15 pts

Using the given ’csv’ les and considering the database schema above create a database us-ing PostgreSQL. For this task you should create a le named ’task1.sql’ that contains SQL statements that you’ve used to create the database.

3.2    Task 2 - Advanced SQL Queries - 70 pts

For this task you should be able to write SQL queries for given problems. Please order your queries (from query1 to query14) and create a le named ’task2.sql’. Each question is 5 pts.

    1. Find the users whose review count is higher than its fans and reviewed at least a business which has more than 3.5 stars. List user id, user name, di erence between review count and fans (Ordered by di erence between review count and fans, user id DESC) (1504630 rows)

    2. Find the users who have tipped a currently open business located in ’TX’ and get compli-ments more than 2. List all the user-business pairs (this means if there is two businesses that a user get more than 2 compliments list both tips for that user) with user name, busi-ness name, tip date, compliment count (Ordered by compliment count, tip date DESC) (38 rows)

    3. Find the top 20 users by their friend count (more friends are better ). List user names and count of friends (Ordered by friend count, user name DESC) (20 rows)

    4. Find the users who have given lower stars to a business than its current stars. List distinct users’ user name (not distinc user name distinct user), average stars and yelping since (Ordered by average stars, yelping since DESC) (1139971 rows)

    5. List open "good" businesses who received the highest number of "good" tips in 2020. "Good" business is the one which has word ’good’ in its tip text. List the business name, state and the stars (Ordered by stars, business name DESC). (4 rows)

    6. Find the users who have lower average stars than all of his/her friends’ average stars(Consider the average stars of Users table.). List user name, yelping since and average stars of such users.(Ordered by average stars, yelping since DESC) (105952 rows)




3

    7. The average stars of businesses in a state gives us Average of the State. Find top 10 state by highest average stars. List state code and average stars (Ordered by average stars DESC) (10 rows).

    8. A tip is a GOOD tip if its compliment count is higher than 0 (at least 1 compliment needed). For each year calculate the percentages of GOOD tips if the percentage is higher than 1 percent, this year is called a TIP YEAR. Find all the TIP YEARs. List date of the year and average compliment count (Ordered by year ASC) (5 rows).

    9. List the names of the user’s who only reviewed businesses who have stars more than 3.5. (Ordered by user name ASC) (832155 rows)

    10. Popular businesses are those with more than 1000 reviews. For each such business nd the business name and average stars for each year. List only those with average stars greater than 3 (in ascending order of years, business name) (5601 rows).

    11. Find the users who have got more useful votes than cool votes (Note that Users table contains votes sent by the users, now we need to nd votes s/he got by reviews). List user name, useful, cool and di erence of useful and cool. (Order by di erence, user name DESC) (1003635 rows)

    12. List pairs of friends and business names if both friends reviewed the same business with the same stars. List the business id, friends as user id 1 user id 2 and stars (Multiple pair reviews should also returned). You need to return all businesses for each (u1,u2) pair (There might be more than one review that two friends gave to a business. The stars were same before and even both changed their stars they meet in the same point again). However, if (u1,u2) is listed (u2,u1) should not be listed for the same business (Ordered by business id, stars DESC). (77157/79393 or 126629 rows)


Example to Clari cation:

user id1    user id2    business id

user1    user2    business1

user1 user2 business1 This is a valid return.

stars

5

4

    13. Cross tabulations Write a single SQL query that computes the statistics present in a cross tabulation over stars and state on Business table. The aggregate reported in the crosstab should simply count businesses, but restrict your attention to the open businesses. You only need to list stars, states and count. (147 rows)

    14. Window functions List the top 3 users (user id, review count and fans) and their rank (i.e. 1, 2, or 3) when ranked by review count for each grouping formed by number of fans. Restrict your attention to those with a number of fans between 50 and 60, inclusive. (33 rows)

















4
3.2.1    Task 2 Speci cations

Your submission    le format (for task2) should be as follows:

/* Question 1 */

SELECT...

/* Question 2 */

SELECT...

...

/* Question 14 */

SELECT...

You don’t have to write your queries on single lines. You are allowed (and encouraged) to write them in multiple lines for better readability. You should not write anything for the unsolved questions. You should have lines for only the solved questions.

3.3    Task 3 - Triggers and Views - 15 pts

Triggers. It is often useful to have the DBMS perform some actions automatically in response to operations on the database. Write necessary triggers to achieve the following functionalities:

    1. We want to keep review count for each user consistent with the number of reviews they have in Users table. Every time a new user review is inserted into Review table, the review count in Users table must be incremented. Write a trigger to keep review count of the users up to date in this manner.

    2. Our system does not want Users who write reviews with 0 stars. If there is a review with 0 stars that review should not be inserted into the Review table. Further, delete all reviews and tips of that user.


Views. It is ine cient to calculate the number of reviews (review count) of a business for each related query. Create a view called ’BusinessCount’ that contains the columns business id, business name, and review count. The review count should be the total number of reviews made for that business.


    • Submission

Send a ’tar.gz’ le with your 7 digit student id and starting with ’e’ like ’e1234567.tar.gz’ that contains all 3 ’.sql’ le completed in previous tasks. You should submit ’tar.gz’ le to Odtuclass before the deadline.


















5

More products