Google Data Analytics Professional Certificate | Bellabeat Capstone Project

11 min read · Sep 23, 2022

Bellabeat Case Study

Introduction

Hello! Through the Google Data Analytics Professional Certificate, I have learned many analytical skills using a range of tools. For Case Study 2 of the Capstone Project, I am working as a junior data analyst on the Bellabeat marketing analytics team. Bellabeat is a high-tech manufacturer of health-focused products for women.

To answer the key questions, I followed the six steps of the data analysis process taught in the Google Data Analytics Professional Certificate: Ask, Prepare, Process, Analyze, Share, and Act.

About The Company

Bellabeat is a high-tech company that makes smart health products. It was founded by Urška Sršen and Sando Mur.

Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.

Scenario

Bellabeat co-founder and CCO Urška Sršen believes that analyzing smart device fitness data could help the company grow. I’m analyzing smart device usage data to learn how people use their smart devices, and then applying what I learn to one of Bellabeat’s products. My insights will help shape the company’s marketing strategy. I’ll present my findings and recommendations to Bellabeat’s executive team.

Urška Sršen, co-founder and Chief Creative Officer of Bellabeat

Sando Mur: Mathematician and Bellabeat’s cofounder; a key member of the Bellabeat executive team

1. Ask

Identify the business task

How can we build a better marketing strategy from trends in smart device usage?

Consider key stakeholders

Urška Sršen & Bellabeat executive team

Three Questions To Analyze:

1. What are some trends in smart device usage?

2. How could these trends apply to Bellabeat customers?

3. How could these trends help influence Bellabeat marketing strategy?

2. Prepare

Determine the credibility of the data

I will use ROCCC (Reliable, Original, Comprehensive, Current, Cited) to figure out whether this data has any problems with bias or credibility.

Reliable: NOT reliable. The data comes from only 30 people, which is not a good representation of the more than 31 million people who use FitBit. For a sample of 30, the margin of error is roughly 18% at a 95% confidence level or 15% at a 90% confidence level, which is not good. To get a high level of confidence (95%) with a low margin of error (5%), the sample would need to be more than 10 times bigger than it is now. A sample of 30 does at least meet the common rule of thumb for the smallest sample size at which the Central Limit Theorem (CLT) is usually applied, so the data clears that minimum bar. Also, all of the data was collected in just one month, which is not long enough to find accurate and reliable trends. I would rather have at least a year’s worth of data to find meaningful trends and insights.
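
The margin-of-error figures above can be checked with the standard normal-approximation formula for a proportion. This is only a minimal sketch: it assumes the conservative worst case p = 0.5, ignores any finite-population correction, and the function names are illustrative rather than taken from the course.

```r
# Margin of error for a proportion, using the normal approximation.
# p = 0.5 is the conservative worst case (an assumption, not a value from the data).
margin_of_error <- function(n, conf_level, p = 0.5) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  z * sqrt(p * (1 - p) / n)
}

# Smallest sample size that reaches a target margin of error.
required_sample_size <- function(target_moe, conf_level, p = 0.5) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  ceiling(z^2 * p * (1 - p) / target_moe^2)
}

moe_95 <- margin_of_error(30, 0.95)            # about 0.18
moe_90 <- margin_of_error(30, 0.90)            # about 0.15
n_needed <- required_sample_size(0.05, 0.95)   # 385

print(round(100 * c(moe_95, moe_90), 1))
print(n_needed)
```

Under these assumptions, about 385 participants would be needed for a 5% margin of error at 95% confidence, which is roughly 13 times the 30 people in this sample.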

Original: NOT original. The data set was built from the responses of people who answered an Amazon Mechanical Turk survey. It would have been better if FitBit had provided the data itself.

Comprehensive: NOT comprehensive. The data is incomplete because it is missing information that would allow a more accurate analysis (e.g., sex, age, height, etc.). Data from more people would also help; a sample that better represents the more than 31 million FitBit users would reduce sampling bias. Again, the data was collected over only one month, which is not enough. I would rather have data from the past year. Also, there is no way to know whether the participants were chosen at random or whether the selection was biased. What were the rules for choosing the 30 people? It would be helpful to know more about how the data was collected.

Current: NOT current. The data was collected six years ago, so it’s not a good picture of how things are going now.

Cited: The data is cited.

Overall, despite these limitations, I will analyze the data to look for insights that can inform Bellabeat’s future marketing strategy.

3. Process

Download data and store it appropriately

1. Download this dataset.

2. Extract files.

3. Create a folder on your computer or Drive. Use appropriate file-naming conventions.

The tool I have selected for data verification and cleaning:

Using R

R is a programming language for statistical computing. It does a great job of extracting insights (especially statistical insights) from data once it has been loaded from wherever it was stored (file, database, etc.). You can fit models or make graphs with R.

Data Cleaning

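To wrangle (clean) the data in R, we first install and load the required packages.

Install required packages

tidyverse for data import and wrangling

lubridate helps wrangle date attributes

skimr provides the skim_without_charts() summaries used in the Analyze step

Run this code in R (package names must be quoted in install.packages)

install.packages("tidyverse")

install.packages("lubridate")

install.packages("skimr")

library(tidyverse)

library(lubridate)

library(skimr)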

STEP 1: COLLECT DATA

Earlier we created a main folder named YYYY-MM-DD_Bellabeat_Exercise on the PC. Now we import the data from that folder into R by following these steps.

Place the datasets (CSV files) in the YYYY-MM-DD_Bellabeat_Exercise folder.

First, we check the working directory with the getwd() function

getwd()

After running this code, the current working directory is printed in the console.

My main folder is on the E: drive, so I need to set the working directory to it.

Here is my file location: "E:\HUMAIR\Capston Case Studies\2022-09-20_Bellabeat_Exercise\Dataset\Fitabase-Data"

Now we pass this file location to setwd(), changing each backslash "\" into a forward slash "/"

setwd("E:/HUMAIR/Capston Case Studies/2022-09-20_Bellabeat_Exercise/Dataset/Fitabase-Data")

Now the working directory is set, and we are ready to import the files from this folder using the read_csv function

daily_activity <- read_csv("dailyActivity_merged.csv")

weight <- read_csv("weightLogInfo_merged.csv")

daily_sleep <- read_csv("sleepDay_merged.csv")

The <- operator assigns a value to a name in the current environment. We assign each imported file to a short, descriptive name so that it is easy to refer to later.

We can import the files one by one, and once they are loaded they appear in the Environment pane. The Console pane also shows the working directory that was set.

Verifying DATA

The next step is to make sure the data sets were imported correctly and look for obvious errors.

Using the head, colnames, and View functions

head(daily_activity)

head(weight)

head(daily_sleep)

colnames(daily_activity)

colnames(weight)

colnames(daily_sleep)

View(daily_activity)

View(daily_sleep)

View(weight)

I detected inconsistent data logging/tracking. Not everyone logged/tracked data daily. Some people evidently did not wear their FitBits and logged 0 steps; those days would bias any analysis, so I’ll remove them. Some didn’t track their sleep or weight. Some participants didn’t last the entire month. This makes a thorough analysis harder than expected.

daily_activity_V2 <- daily_activity %>%

filter(TotalSteps !=0)

Removing the zero steps will definitely help with the analysis!

Sleep and weight data sets both have date and time in one column. Since I will use the date to compare the three files, it’s preferable to split that column into separate “Date” and “Time” columns.

# extra = "merge" keeps any AM/PM suffix with the time instead of dropping it

weight_v2 <- weight %>%

separate(Date, c("Date", "Time"), sep = " ", extra = "merge")

daily_sleep_v2 <- daily_sleep %>%

separate(SleepDay, c("Date", "Time"), sep = " ", extra = "merge")

Now that date and time are in separate columns, we can find how many unique IDs there are in each of the three datasets.

n_distinct(daily_activity_V2$Id)

There are 33 unique IDs in daily_activity_V2.

n_distinct(weight_v2$Id)

n_distinct(daily_sleep_v2$Id)

Not all survey participants tracked each data set. Only eight people entered their weight (and two of them logged most of those entries), and 24 entered their sleep data. The daily activity data covers 33 people.

Some data sets may have duplicate rows. For clearer data, I’ll check this and delete duplicate rows.

nrow(daily_activity_V2)

863

nrow(weight_v2)

nrow(daily_sleep_v2)

413

nrow(unique(daily_activity_V2))

863

nrow(unique(weight_v2))

nrow(unique(daily_sleep_v2))

410

I’m glad I checked: the daily sleep data set has three duplicate rows (413 rows, but only 410 unique ones). I’ll clean it up by making a new data set that keeps only the distinct rows.

daily_sleep_v3 <- unique(daily_sleep_v2)
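
To make the duplicate check above concrete, here is a small sketch in base R. The toy data frame and its values are made up for illustration and are not rows from the real sleep file.

```r
# Hypothetical mini sleep log; rows 2 and 4 repeat the rows just before them.
sleep_toy <- data.frame(
  Id = c(1, 1, 2, 2, 3),
  Date = c("4/12/2016", "4/12/2016", "4/12/2016", "4/12/2016", "4/13/2016"),
  TotalMinutesAsleep = c(327, 327, 412, 412, 384)
)

# duplicated() flags every row that is identical to an earlier row.
n_duplicates <- sum(duplicated(sleep_toy))

# unique() keeps the first copy of each row and drops the repeats.
sleep_clean <- unique(sleep_toy)

print(n_duplicates)
print(nrow(sleep_clean))
```

The difference between nrow() before and after unique() always equals the number of rows flagged by duplicated(), which is the same comparison as 413 versus 410 above.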

4. Analyze

First, I am going to take a look at a detailed summary of each data set using skim_without_charts() from the skimr package.

skim_without_charts(daily_activity_V2)

skim_without_charts(weight_v2)

skim_without_charts(daily_sleep_v3)

This is a good overview to make sure that all the cleaning that needed to be done was done and to see if any immediate problems stand out when skimming. So far, it looks good, but I’d like to trim each file down to just the columns I need for my more focused analysis.

I want to convert the ‘Date’ column into a ‘WeekDays’ column in the new ‘daily_activity_V3’ data set to see whether people log/track their data more consistently on certain days of the week.
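
The code for this conversion is not shown here, so the following is only a sketch of one way to do it in base R. The sample dates are made up, and it assumes the Date column holds month/day/year text such as 4/12/2016.

```r
# Hypothetical dates in month/day/year format (an assumption about the column).
activity_dates <- c("4/12/2016", "4/16/2016", "4/17/2016")

parsed <- as.Date(activity_dates, format = "%m/%d/%Y")

# as.POSIXlt()$wday counts days from Sunday = 0, so index a fixed English
# vector instead of relying on the locale-dependent weekdays() output.
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
week_days <- factor(day_names[as.POSIXlt(parsed)$wday + 1], levels = day_names)

# Count how many logged days fall on each weekday.
print(table(week_days))
```

Storing the result as a factor with the levels in calendar order keeps tables and plots sorted from Sunday to Saturday rather than alphabetically.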

Next, I use the “summary” function to look at the minimum, quartiles, mean, and maximum of each column.

summary(daily_activity_V3)

summary(weight_v3)

summary(daily_sleep_v4)

Summary

The average Total Steps for an individual is 8053.
The average minutes for Very Active is 23.02, for Fairly Active is 14.78, for Lightly Active is 210, and for Sedentary is 955.8.
The average BMI is 25.19.
The average minutes asleep is 419.2, whilst the average minutes in bed is 458.5.

5. Share

Now, I will present my insights and important findings through visualizations.

When daily activity levels in minutes are shown as percentages, the average user spent 79% of their tracked time over the month sedentary. Very active and fairly active minutes make up only 2% and 1% of the total, respectively.
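
These percentages follow directly from the average minutes listed in the Summary section. A minimal sketch of the arithmetic (the vector names are illustrative):

```r
# Average minutes per activity level, taken from the Summary section.
avg_minutes <- c(
  VeryActive = 23.02,
  FairlyActive = 14.78,
  LightlyActive = 210,
  Sedentary = 955.8
)

# Share of total tracked minutes for each level, rounded to whole percent.
pct_of_time <- round(100 * avg_minutes / sum(avg_minutes))

print(pct_of_time)
```

Lightly active time accounts for the remaining share, about 17%.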

More steps equal more calories burned. Active people take more steps, which burns more calories. The average person in this data set takes roughly 8000 steps per day, which burns about 2500 calories.

Without knowing the person’s age, sex, and height, it’s impossible to say how many calories to burn for healthy weight loss. The BMI and weight of those who logged such numbers didn’t improve over the month of data collection.


I thought it would be interesting to plot the Total Minutes Asleep against each activity level to see if there is a connection. I expected that people who got more sleep would be more active, and that people who got less sleep would be less active (more sedentary). But that wasn’t what happened. No matter how well they slept, most people didn’t move around much. Surprisingly, the people who slept the most minutes were also the most sedentary.

6. Act

The business task is to analyze non-Bellabeat smart device usage data to acquire insight into relevant (successful and failed) consumer trends in the worldwide smart device industry and how to apply these trends to Bellabeat customers and affect future marketing tactics. Applying these insights to the Bellabeat App and future products maximizes revenues and growth for the company and capitalizes on Bellabeat’s fast-rising user base in the smart device/tech-wellness sector.

Trends Identified

The average Total Steps per day for participants was 8053, nearly 2000 steps below the commonly cited goal of 10,000 steps per day.
In a month, participants spent 79% of their tracked time sedentary.
The participants averaged a BMI of 25.19, which is just inside the overweight range (25 and above).
On average, participants slept about 419 minutes a night, slightly less than the recommended 7 hours.
The participants weren’t consistent with logging/tracking their data each day, and some didn’t log/track their sleep or weight (only 24 unique users for sleep and eight for weight, and only two of these eight made up the majority of the weight entries).
Participants didn’t lose weight, improve BMI or sleep quality, or increase exercise levels.
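
The comparisons in this list can be reproduced from the averages in the Summary section. This is a sketch under stated assumptions: the 10,000-step goal, the 7-hour sleep recommendation, and the BMI cut-off of 25 are commonly used benchmarks that I am supplying for illustration, not values from the data set.

```r
# Averages reported in the Summary section.
avg_steps <- 8053
avg_minutes_asleep <- 419.2
avg_minutes_in_bed <- 458.5
avg_bmi <- 25.19

# Benchmarks (assumptions for illustration, not part of the data).
step_goal <- 10000
sleep_goal_hours <- 7
overweight_bmi <- 25

steps_short <- step_goal - avg_steps                             # steps below the goal
hours_asleep <- avg_minutes_asleep / 60                          # just under 7 hours
minutes_awake_in_bed <- avg_minutes_in_bed - avg_minutes_asleep  # time in bed not asleep

print(c(steps_short = steps_short,
        hours_asleep = round(hours_asleep, 2),
        minutes_awake_in_bed = minutes_awake_in_bed))
print(avg_bmi >= overweight_bmi)
```

Under these benchmarks, the average participant is 1947 steps short of the goal and spends about 39 minutes in bed without sleeping.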

Recommendations

Bellabeat should enable in-app challenges against friends or users in the same city/state to encourage continuous tracking.

Bellabeat could offer rewards or points redeemable for merchandise, discounts on future products, in-app features, or raffle tickets.
Bellabeat may give extra incentives (e.g., points) from Friday through Monday, when many people lose interest and motivation.

Bellabeat products should offer a TDEE (total daily energy expenditure) calculator so users can input their sex, age, weight, height, and other information for accurate results.

This calculator tells the user their maintenance calories (and macros) and how much of a caloric deficit they need each day to lose X pounds per week, based on their weight goals and time frame.
For example, someone with 2,000 maintenance calories would need a 500-calorie deficit every day (eating about 1,500 calories) to steadily lose roughly 1 lb of fat per week, using the common rule of thumb of about 3,500 calories per pound.
The user would be notified if they reach or pass their daily caloric intake.
The user would be notified when they attain (or are on track for) their weight target.
Bellabeat’s membership includes nutritional guidance about healthy dishes and macros.
The app could list and show videos of quick, high-calorie-burning activities. Since the average person is very sedentary and might not have hours to spend in the gym, this would be a good incentive to exercise and burn a lot of calories in a short period of time.
These might be 30-minute, equipment-free activities that burn several hundred calories (e.g., crunches, jump rope, burpees, shadowboxing, HIIT, etc.).
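
As a rough illustration of what such a TDEE calculator could compute, here is a minimal sketch in R. It assumes the Mifflin-St Jeor equation for basal metabolic rate, the commonly used activity multipliers, and the rule of thumb of about 3,500 calories per pound of fat; none of this is Bellabeat’s actual method, and the function names and example person are hypothetical.

```r
# Mifflin-St Jeor estimate of basal metabolic rate in kcal/day (assumed formula).
bmr_mifflin_st_jeor <- function(sex, weight_kg, height_cm, age_years) {
  base <- 10 * weight_kg + 6.25 * height_cm - 5 * age_years
  if (sex == "female") base - 161 else base + 5
}

# Commonly used activity multipliers (assumed, not from the data set).
activity_factor <- c(sedentary = 1.2, light = 1.375,
                     moderate = 1.55, very_active = 1.725)

# Total daily energy expenditure = BMR scaled by activity level.
tdee <- function(sex, weight_kg, height_cm, age_years, activity = "sedentary") {
  bmr_mifflin_st_jeor(sex, weight_kg, height_cm, age_years) *
    activity_factor[[activity]]
}

# Daily deficit needed for a weekly fat-loss goal (about 3,500 kcal per lb).
daily_deficit <- function(lbs_per_week, kcal_per_lb = 3500) {
  lbs_per_week * kcal_per_lb / 7
}

# Hypothetical user: sedentary 30-year-old woman, 70 kg, 165 cm.
example_bmr <- bmr_mifflin_st_jeor("female", 70, 165, 30)
example_tdee <- tdee("female", 70, 165, 30, activity = "sedentary")

print(example_bmr)
print(example_tdee)
print(2000 - daily_deficit(1))  # intake target for 2,000 maintenance calories
```

The last line reproduces the example above: a 2,000-calorie maintenance level minus a 500-calorie daily deficit gives a 1,500-calorie target for losing about 1 lb per week.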

Bellabeat products should automatically measure sleep because users struggled to do so.

Bellabeat might employ a Leaf or app notification to alert users when it’s time to sleep.
An hour or two before bed, the user would be warned to stop using blue-light electronics (it could even automatically switch the phone to night mode to prevent blue light exposure).
If the user takes Melatonin or sleep medication, they will be reminded 30 minutes before bedtime.
The user would be reminded a few minutes before bedtime to get in bed and sleep. This should help the individual fall asleep in less than 30 minutes and increase the length and quality of their sleep.



Written by Muhammad Humair Qureshi
