Sunday, February 5, 2023

UC Davis: SQL for Data Science - Profiling and Analyzing the Yelp Dataset

Whats is Yelp? It aims to provide a one-stop platform for local businesses. Like other social media, it creates community. And that community discovers, transacts, and reviews local businesses around them.

For this post we will use Yelp dataset and will formulate ideas for analysis.

First, we need to explore the dataset. Below is the structure data in table form.

The Yelp Dataset
The Yelp Dataset



And for me to understand the above data and the entirety of its contents, I have visited the YELP website and marked the table names and column names against its pages.

The Yelp Dataset - Business
The Yelp Dataset - Business

The Yelp Dataset - Review
The Yelp Dataset - Review

The Yelp Dataset - User Page
The Yelp Dataset - User Page



Armed with understanding, profiling and establishing relationship among multiple tables, you can now look onto subjects that can be useful for different stakeholders.

Some of the business questions lingered on my mind upon seeing the above data were:
  • What cities could be the best choice in setting up a business?
  • Are there weight on the reviews coming from elite year users?
  • What are the main words used in text reviews for sentiment analysis?
  • Finding the effect of review_count and check-in on the life of a business.

I focused on the latter due to the limitations of SQLite.
SELECT business.name
, business.stars
, business.review_count
, business.is_open
, SUM(checkin.count)
FROM business INNER JOIN checkin ON business.id =checkin.business_id
GROUP BY business.name
ORDER BY SUM(checkin.count) DESC


Two (2) tables were needed to get the answer for the above analysis. I used INNER JOIN to avoid getting the ‘None’ value.

The below selected data will prove that even a company received only 1 star it is still open for business. That businesses that received highest checkin count do not necessarily have 5 stars and so on. Thus, this data means that there is no direct correlation on stars, review_count and checkin.

It would be good for YELP to indicate the current population for the neighbourhood on that business.

The Yelp Dataset — Correlation of Review Count, Checkin, and Stars
The Yelp Dataset — Correlation of Review Count, Checkin, and Stars


That’s all for now! Be curious y’all! Don’t forget to click the clap button! *wink-wink*

No comments:

Get this gadget at facebook popup like box