Google Data Analytics| Bellabeat Case Study Using Bigquery Sandbox / SQL

Muhammad Humair Qureshi
15 min readNov 21, 2022

Bellabeat Case Study

Introduction

For Bellabeat Case Study 2 in Capstone Project, I am working as a junior data analyst for the Bellabeat marketing analyst team. Bellabeat is a high-tech manufacturer of health-focused products for women.

To answer key questions I followed the six steps of the data analysis process taught in the course Google Data Analytics Professional Certificate which are: Ask, Prepare, Process, Analyze, Share, and Act.

About The Company

Bellabeat is a high-tech company that makes smart health products. It was started by Urka Sren and Sando Mur.

Sren used her experience as an artist to make beautiful technology that helps women all over the world learn and get inspired. Bellabeat has given women more power by collecting data on their activity, sleep, stress, and reproductive health.

about their own health and how they live. Bellabeat has grown quickly and quickly since it began in 2013.

marketed itself as a tech-driven company for women’s health and wellness.

Scenario

Bellabeat co-founder and CCO Urka Sren believes analyzing smart device fitness data might help the company develop. I’m analyzing smart device data for one of Bellabeat’s products to learn how people use smart devices. My insights will shape the company’s marketing strategy. I’ll offer my findings and recommendations to Bellabeat’s executive team.

Urška Sršen, co-founder and Chief Creative Officer of Bellabeat

Sando Mur: Mathematician and Bellabeat’s cofounder; a key member of the Bellabeat executive team

1. Ask

Identify the business task

How can we make a better marketing strategy from trends in smart devices

Consider key stakeholders

Urška Sršen & Bellabeat executive team

Three Questions To Analyze:

1. What are some trends in smart device usage?

2. How could these trends apply to Bellabeat customers?

3. How could these trends help influence Bellabeat marketing strategy?

2. Prepare

Determine the credibility of the data

I will use ROCC to figure out if this data has any problems with bias or credibility.

Reliable: NOT reliable. This data only comes from 30 randomly chosen people, which is not a good representation of the more than 31 million people who use FitBit. This would mean a confidence level of 95% or 90% and a margin of error of 18% or 15%, which is not good. To get a high level of confidence (95%) and a low margin of error (5%), the sample size should be at least 10 times bigger than it is now. The Central Limit Theorem (CLT) says, however, that a sample size of 30 is the smallest sample size for which the CLT is still true. So, it’s good that at least this metric is met by the data that was given. Also, all of the data was collected in just one month, which is not long enough to find accurate and reliable trends. I would rather have at least a year’s worth of data to find meaningful trends and insights.

Original: NOT original. The data set was made by people who answered an Amazon Mechanical Turk survey. It would have been better if FitBit had given the information itself.

Comprehensive: NOT comprehensive. The data are not complete because they are missing some information that would help make a more accurate analysis (e.g., sex, age, height, etc.). Also, more data from more people would help make the whole thing more complete. For example, a more accurate sample bias of the more than 30 million FitBit users would help. Again, the data was only collected over a two-month period, which is not enough. I would rather have data from the past year. Also, there is no way to know if the people who were chosen were chosen because of bias or if they were chosen at random. What were the rules for choosing the 30 people? It would be helpful to know more about the data.

Current: NOT current. The data was collected six years ago, so it’s not a good picture of how things are going now.

Cited: This data is Cited

Overall the analysis we discuss the data now here is finding some insight into Bellabeat future marketing strategy.

3. Process

Download data and store it appropriately

1. Download this dataset.

2. Extract files.

3. Create a folder on your computer or Drive. Use appropriate file-naming conventions.

Here we are using Biguery for data Analyses so if you are using the free version of Bigquery which is called Bigquery Sandbox so follow my steps and upload each file one by one. But here first we clean each file like duplicate and error so here we use Excel

  • Daily_Activity_Merged
  • Daily_Sleep_Merged
  • Hourly_Steps_Merged
  • Hourly_Intensity_Merged
  • Hourly_Calories_Merged
  • Heart_Rate_Merged
  • Weight_Log_Merged

Using Excel to Clean Data

Each dataset was cleaned using Excel. The following steps were taken within each dataset:

  • Sorted and filtered data by Id to obtain how many unique users there were within the dataset.
  • Checked for duplicate data using the ‘duplicate data’ tool in Excel
  • Formatted date data into MM/DD/YY date format
  • Formatted all numerical data into Number format with either no decimals or up to 2 decimals.
  • Add a new file in which we add Activity Column in Daily_Activity_Merged then save as a new file DailyActivity_Merged_Weekday So, use the mentioned below formula to add days names from the date we can easily analyze by days and weekends through this column.

CHOOSE(WEEKDAY(B2),”Sunday”,”Monday”,”Tuesday”,”Wednesday”,”Thursday”,”Friday”,”Saturday”)

DailyActivity_Merged_Weekday
After adding the column of Activity_Day Highlight in yellow

After the cleaning process was done, there were only 3 rows in the Daily_Sleep_Merged file that had duplicate information. These were taken out before being looked at.

After cleaning the data, I decided to use SQL instead of staying in Excel to analyze it (although I did go back a few times for visualizations). What’s the point of SQL? SQL and Excel are the two skills that are most often asked for and used in a Data Analyst’s job. So I want to show off those skills here.

4 & 5. ANALYZE & SHARE Phases

Uploaded FitBit Fitness Tracker Data into BigQuery under the project scenic-kiln-368505.Bellabeat

Uploaded the following clean data sets:

  • dailyActivity_merged (dailyactive)
  • dailyActivity_merged_Weekday
  • sleepDay_merged (Sleepday)
  • heartrate_seconds_merged
  • hourlyCalroie_merged
  • hourlyIntensity_merged
  • hourlySteps_merged
  • weightLogInfo_merged

Here we use Bigquery Sandbox so you can upload we file one by one

how to upload a CSV file in Bigqurey

Here dailyActivity_merged (dailyactive) and dailyActivity_merged_Weekday uploaded successfully but the remaining file can’t upload because we facing this type of error mentioned below

This error is about TimeStamp because this file here time format is on local format SQL is accept UTC standard time zone so follow this step then upload this file.

Can’t upload files to SQL Timestamp Error uploading CSV in Bigquery

After uploading all files

Upload all CSV files in Bigquery Sandbox

Check each dataset

Now we analyze the user's Ids in each dataset to check no. of Id’s in this dataset

  • Daily_Activity_Merged — 33
  • Daily_Sleep_Merged — 24
  • Heart_Rate_Merged — 7
  • Hourly_Calorie_Merged — 33
  • Hourly_Intensity_Merged — 33
  • Hourly_Steps_Merged — 33
  • Weight_Log_Merged — 8

Both the Heart Rate and Weight Log datasets do not include enough data to move forward with analysis. We won’t use these sets of data.

User Insights

First, I wanted to see how many times each of the users wore/used the FitBit tracker:

Bigquery result

64% of users kept track of data for the whole data time period (04–12–2016 to 05–12–2016). When you add in the users who only missed 1–3 days, the number goes up to 82% of users who logged data or wore their FitBit Tracker consistently over the month.

Next, I wanted to divide the users into groups based on how often they wore their FitBit Fitness Tracker. I made three kinds of users:

  • Active User — wore their tracker for 25–31 days
  • Moderate User — wore their tracker for 15–24 days
  • Light User — wore their tracker for 0 to 14 days
Most Active Ids

If you want to visualization your data using Bigquery so check mention below link I guide you step by step process

According to this chat, we have more active users.

Next, I wanted to look more closely at the MIN, MAX, and AVG of total steps, total distance, calories, and activity levels by ID.

Next, I wanted to narrow down my results to just the averages of the different types of minutes by Id.

The average number of minutes spent in the Sedentary activity level was highest for each unique Id.

Last but not least, I wanted to look at the average number of active minutes per weekday before moving on to the types of users.

We already created the weekday sheet and also uploaded dailyActivity_merged_Weekday Now we analyze as per Activity day in which we see which day is the most active day.

Google Sheet

You can analyze Bigquery results through google sheet check mentioned link I guide step by step through scribehow

Again, this search shows that Sedentary Minutes are the most active kind of minute. What stands out is that there isn’t much difference between the types of active minutes by weekday. It looks like the number of active minutes users put in each day is the same. This information could show that Bellabeat could help users meet their activity goals. Users may already be trying to meet their own activity goals every day, and Bellabeat could encourage users to set higher activity goals to increase the number of very active or fairly active minutes they get every day.

User Types by Activity Levels

The CDC recommends 150 minutes of physical activity each week. (https://www.cdc.gov/physicalactivity/basics/adults/index.htm)

So, I wanted to add up the average minutes of Very Active and Fairly Active to see if each unique ID was meeting the CDC’s guidelines for activity.

When summed up, the results showed that most users were not engaging in milder activity for at least 150 minutes each day.

But if we add in the “lightly active” level, we see that most of the users now have an average of 150 minutes or more of an “active” period, which is what the CDC recommends.

On average, 27 of the 33 users met the CDC Activity Length Recommendations, while 6 did not.
I then broke this down by date, looking at one week’s worth of data, to see if it also matched what I found for the whole dataset. I looked at dates from April 17 to April 23, 2016.

Here, I used SUM instead of average because I was only tracking events for one week and not for the whole study.

During this first week of data, the results showed that more users met the CDC Activity Length Recommendations.

Out of 33 users, 31 met the CDC Activity Length Recommendations, 1 did not, and 1 did not have data from this time period.

I wanted to know what would happen if we skipped the Lightly Active Minutes for a week.

In this case, there was a big change in the number of users who met the CDC’s recommendations for how many active minutes they should spend each week.

Out of 33 users, 18 met the CDC Recommendations, 14 did not, and 1 did not have data from this time period.

Before I moved on to tracking steps, calories, and sleep, I wanted to look into this information during a week closer to the end of the dataset collection.

So I looked at the dates 2016–05–01 to 2016–05–07, which were a week before the end of the data collection (2016–05–12, a Thursday) and the last full week before the end.

When counting LightlyActiveMinutes in a query:

The results show that out of 33 users, 30 met CDC recommendations and 3 did not have data from this time period.
Now let’s see what happens when the LightlyActive Minutes are taken away.

Out of 33 users, 17 met CDC recommendations, 13 did not, and 3 did not have data from this time period.

If you want to analyze results through google Sheets so check this step-by-step guide mentioned below link.

When I looked at the April and May results side by side, I didn’t notice any big changes in how long I did things each week. Even so, these weeks are only separated by one week, and the whole time frame for collecting data is not very long. Even though the summary of the dataset being used (FitBit Fitness Tracker Data) said that data was collected from 2016–03–12 to 2016–05–12, most users didn’t start getting data until 2016–04–12. This means that there is only one month’s worth of data to use. This isn’t a long enough time to see if habits have changed between when the wearable was first used and when the data collection stopped.

So, I’m curious to see if the number of steps tracked could help classify users even more.

User Types by Total Steps

In her 2019 article “How many steps do I need a day?” for Healthline.com, Sara Lindberg cited a 2011 study by Tudor-Locke et. al. called “How many steps a day are enough? for adults,” which found that healthy adults should aim for 10,000 steps a day. Based on a 2011 study by Tudor-Locke et al., Lindberg (2019) divides activity level by steps into three groups:

  • Inactive: less than 5,000 steps per day
  • Average (somewhat active): ranges from 7,500 to 9,999 steps per day
  • Very Active: more than 12,500 steps per day

Sources: Healthline Article | 2011 Study

I will make user types for the different Ids in the Daily Activity dataset based on the above activity categories. I’m curious to see how these groups might be broken up.

When I was making my SQL query, I saw that the categories from the Healthline article did not cover everything. So, I made two other groups:

  • Low Active User: 5,000 to 7,499 steps
  • Active User: 10,000 to 12,499 steps

Here are the Results:

  • Inactive User: 8 users
  • Low Active User: 9 users
  • Average Active User: 9 users
  • Active User: 5 users
  • Very Active User: 2 users

If we divide this into “Non-Active” and “Low-Active” users and “Active” users, we can see that 17 users (52%) are not very active and 16 users (48%) are active. So about half of the users are men and half are women.

This split is close to what we saw when we looked at the number of active minutes. If we took the sum of the Lightly Active Minutes out of the query, we saw that 17 users (52%) met the CDC’s recommendation of 150 active minutes a week, 13 users (39%) did not, and 3 users (9%) did not have data from that week. After dividing the sample into different types of users, I wanted to compare how they did other things.

Calories, Steps & Active Minutes by ID

Now that we know more about how active our users are, I’m interested to see what their logged calories tell us about how many steps they take and how long they are active. This might be easier to understand with data visualization, but I’d still like to look into it in SQL for practice.

Total Steps by Day

Next, I wanted to look at how many steps people took on average each day to see if some days of the week were busier than others.

When the query was run, there wasn’t much difference between each day’s average steps. So, Saturday and the beginning of each week had the most steps on average (Monday and Tuesday). This could mean that the users wanted to be more active right after a weekend of rest (Sunday had the fewest total steps, and Friday wasn’t far behind) and that Saturday gave them more time to move and be active.

Total Steps by Hour

Next, I wanted to look at Total Steps by Hour to find out when our users were the most active.

The top 5 hours of steps recorded were:

  1. 18:00:00 (6pm) — 542,848 steps
  2. 19:00:00 (7pm) — 528,552 steps
  3. 12:00:00 (12pm) — 505,848 steps
  4. 17:00:00 (5pm) — 498,511 steps
  5. 14:00:00 (2pm) — 497,813 steps

Deeper Look Into Sleep

Then I wanted to look into how people sleep and how that compares to how much they do.

First, I looked at which date all the users had the most minutes of sleep.

Next I looked at average minutes slept, total steps and calories by user Id.

Tableau

The graphs show that most users with higher step counts went to bed for at least 5 hours. Still, most people weren’t taking the recommended 10,000 steps a day, as the Healthline article above says.

Read more: Google Data Analytics Professional Certificate | Bellabeat Capstone Project

6. Act

The business task is to analyze non-Bellabeat smart device usage data to acquire insight into relevant (successful and failed) consumer trends in the worldwide smart device industry and how to apply these trends to Bellabeat customers and affect future marketing tactics. Applying these insights to the Bellabeat App and future products maximizes revenues and growth for the company and capitalizes on Bellabeat’s fast-rising user base in the smart device/tech-wellness sector.

Trends Identify

  • The average Total Steps per day for participants was 8053, over 2000 steps below the necessary minimum.
  • In a month, participants spent 79% of their time sedentary.
  • The participants averaged 25.19 BMI, which is overweight.
  • On average, participants slept less than the recommended 7 hours.
  • The participants weren’t consistent with logging/tracking their data each day, and some didn’t log/track their sleep or weight (only 24 unique users for sleep and eight for weight — where only two of these eight were made up the majority of the inputs).
  • Participants didn’t lose weight, improve BMI or sleep quality, or increase exercise levels.

Recommendations

Bellabeat should enable in-app tournaments against friends or users in the same city/state to encourage continuous tracking.

  • Bellabeat could offer rewards or points redeemable for merchandise, discounts on future products, in-app features, or raffle tickets.
  • Bellabeat may give extra incentives (i.e., points) from Friday-Monday when many people lose interest and motivation.

Bellabeat goods should offer a TDEE calculator so consumers may input their sex, age, weight, height, and other information for accurate results.

  • This calculator tells the user their maintenance calories (and macros) and how much of a caloric deficit they need each day to lose X pounds per week, based on their weight goals and time frame.
  • Someone needing 2000 maintenance calories must be in a 500-calorie deficit every day to lose 1 lb of fat per week steadily.
  • The user would be notified if they reach or pass their daily caloric intake.
  • The user would be notified when they attained (or were on track for) their weight target.
  • Bellabeat’s membership includes nutritional guidance about healthy dishes and macros.
  • The software might list and show videos of quick-burning activities (since the average person is very sedentary and might not have a lot of time to spend hours in the gym, this would be a good incentive to exercise and burn a lot of calories in a short period of time).
  • These might be 30-minute, equipment-free activities that burn several hundred calories (i.e., crunches, jump rope, burpees, shadowboxing, HIIT, etc.).

Bellabeat products should automatically measure sleep because users struggled to do so.

  • Bellabeat might employ a Leaf or app notification to alert users when it’s time to sleep.
  • An hour or two before bed, the user would be warned to stop using blue-light electronics (it could even automatically switch the phone to night mode to prevent blue light exposure).
  • If the user takes Melatonin or sleep medication, they will be reminded 30 minutes before bedtime.
  • The user would be reminded a few minutes before bed to get in bed and sleep. This should enable the individual fall asleep in less than 30 minutes and increase the length and quality of sleep.

What is Blockchain Technology, how Distributed Ledger Technology (DLT)

--

--

Muhammad Humair Qureshi

I have learned a lot of analytical and research skills from chemistry. That’s why I’m interested in doing research. I’m data analyst & my background in research