Case Study: Bellabeat

Scenario

In this data analytics case study, I am an analyst on the marketing team of Bellabeat, a company that manufactures technological health products for women. Wishing to expand the company's presence in the smart device market, cofounder and Chief Creative Officer Urška Sršen feels that an analysis of fitness data from a rival manufacturer's products (in this case, Fitbit) would be a source of valuable insight. My role is to analyze this data to determine how these devices are being used, and to use the findings to make recommendations to the marketing team with regard to one of Bellabeat's products, which are as follows:

  • Bellabeat app – gives customers health data concerning sleep, activity, stress, menstrual cycle, and mindfulness habits

  • Leaf – a wellness tracker wearable as a bracelet, necklace, or clip. Communicates with the app to track sleep, stress, and activity

  • Time – a tracker like the Leaf, but in watch format

  • Spring – a water bottle which allows users to track hydration

  • Membership – a subscription service offering 24/7 personalized advice on health, wellness, nutrition, and lifestyle

I am advised to use the following questions as guidance:

  • What are some trends in smart device usage?

  • How could these trends apply to Bellabeat customers?

  • How could these trends help influence Bellabeat marketing strategy?

I am tasked with delivering the following:

  • A clear summary of the business task

  • A description of all data sources used

  • Documentation of any cleaning or manipulation of data

  • A summary of my analysis

  • Supporting visualizations and key findings

  • My top high-level content recommendations based on the analysis


Summary

My analysis of how customers use smart devices indicates that there is substantial room for improvement. The Bellabeat product I will focus on is the app. While the wearables are crucial for gathering relevant data, the app is where the data is stored, displayed, and put to use. As my analysis will show, the app is the cornerstone of the product ecosystem, and its robustness and ease of use will encourage more consistent use of the other products, giving users more power to harness their health data.


Data sources

The data was collected through Amazon Mechanical Turk, where a survey identified eligible Fitbit users to submit tracker data. It is a collection of CSV files containing data from 33 users over the course of 31 days, including information about activity (measured through heart rate), steps taken, calories burned, sleep, and weight. The files are as follows:

  • dailyActivity_merged

    • A collection of the information from dailyCalories_merged, dailyIntensities_merged, and dailySteps_merged. This was the file I used most often in my analysis.

  • dailyCalories_merged

    • Contains a record of each user's daily calorie expenditure by date.

  • dailyIntensities_merged

    • A record of each user's daily minutes and distance organized into sedentary, light, fairly active, and very active categories

  • dailySteps_merged

    • A record of each user's daily steps taken

  • Heartrate_seconds_merged

    • The largest file, displaying heartrate information on a second by second basis. Could be of use to medical researchers, or smart device engineers, but far too granular to be particularly useful for this analysis.

  • hourlyCalories_merged

    • A record of each user's calories burned by hour. Useful for determining what times of day users are most active

  • hourlyIntensities_merged

    • A record of each user's activity intensity by hour, measured in numbers as "TotalIntensity" and "AverageIntensity". It is unclear how these relate to the intensity categories seen in dailyIntensities_merged.

  • hourlySteps_merged

    • A record of each user's steps taken per hour

  • minuteCaloriesNarrow_merged

    • The second largest file, containing a record of each user's calories burned minute by minute.

  • minuteCaloriesWide_merged

    • Another record of each user's calories burned by the minute, but with one row per hour and the 60 minutes of that hour spread across the columns.

  • minuteIntensitiesNarrow_merged

    • A record of each user's intensities measured minute by minute

  • minuteIntensitiesWide_merged

    • A record of each user's intensities measured minute by minute with the data spread horizontally for the 60 minutes of each hour

  • minuteMETsNarrow_merged

    • A record of each user's METs by the minute. METs are a measurement of your metabolic rate compared to your resting metabolic rate. Resting metabolic rate is a value of 1 MET.

  • minuteSleep_merged

    • A record of each user's sleep by the minute, measured by the unexplained "value" column which ranges from 1-3

  • minuteStepsNarrow_merged

    • A record of each user's steps measured every minute

  • minuteStepsWide_merged

    • Same as previous, but with the minutes of each hour displayed horizontally

  • sleepDay_merged

    • A record of each user's daily sleep, showing the number of sleep records, time asleep, and total time in bed

  • weightLogInfo_merged

    • A record of users' weight by date measured in pounds and kilograms as well as body fat percentage, BMI, and a log of whether or not the weight was recorded manually
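The narrow and wide minute-level files listed above hold the same information in two layouts. The following is a minimal sketch of that relationship, not the method used to produce the files. The timestamp format, the tuple layout, and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical narrow rows: (user id, timestamp, calories burned in that minute).
# The values are made up for illustration and are not taken from the data set.
narrow_rows = [
    ("1001", "4/12/2016 8:00:00 AM", 1.2),
    ("1001", "4/12/2016 8:01:00 AM", 1.4),
    ("1001", "4/12/2016 9:00:00 AM", 2.0),
]


def narrow_to_wide(rows, fmt="%m/%d/%Y %I:%M:%S %p"):
    """Group minute-level rows into one 60-slot record per user and hour.

    The timestamp format is an assumption about how the files store dates.
    """
    wide = defaultdict(lambda: [None] * 60)
    for user_id, stamp, value in rows:
        moment = datetime.strptime(stamp, fmt)
        hour_start = moment.replace(minute=0, second=0)
        # Slot index is the minute within the hour; missing minutes stay None.
        wide[(user_id, hour_start)][moment.minute] = value
    return dict(wide)


wide_rows = narrow_to_wide(narrow_rows)
```

Each key in wide_rows identifies one user and one hour, and each value is a list of 60 per-minute readings, which mirrors the "one row per hour" layout of the wide files.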


ROCCC Evaluation

Is the data reliable?

The reliability of the data is limited by:

  • Small sample size 

  • Inconsistent data logging among participants

The sleep data is a clear demonstration of both the data's small sample size and its inconsistent logging. Out of the study's 33 participants, only 24 logged sleep data, and only three of those 24 logged data for all 31 days of the study. The average number of days of sleep data logged was 17.

Weight data is even more sparse, with only 8 users logging any data during the 31 days of the study. The most days logged by a single user was 30, and the average was 8.4. Logging of activities is sparser still, with only 4 users participating, and very inconsistently at that.
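Participation figures like the ones above can be reproduced with a short script. This is a minimal sketch under stated assumptions: the input is a list of hypothetical (user id, date) pairs with one entry per logged record, and the sample values are invented, not drawn from the study.

```python
from collections import defaultdict

# Hypothetical (user id, date) pairs, one per logged record.
# The repeated entry for user "A" stands in for a duplicate row.
sleep_log = [
    ("A", "4/12/2016"),
    ("A", "4/13/2016"),
    ("A", "4/13/2016"),
    ("B", "4/12/2016"),
]


def days_logged_per_user(records):
    """Count the distinct dates on which each user logged data."""
    dates = defaultdict(set)
    for user_id, date in records:
        dates[user_id].add(date)
    return {user_id: len(found) for user_id, found in dates.items()}


def participation_summary(records, study_days):
    """Summarize how many users logged data and how consistently."""
    counts = days_logged_per_user(records)
    average = sum(counts.values()) / len(counts) if counts else 0.0
    return {
        "users_logging": len(counts),
        "users_logging_every_day": sum(1 for n in counts.values() if n == study_days),
        "average_days_logged": average,
    }


summary = participation_summary(sleep_log, study_days=31)
```

Counting distinct dates rather than rows keeps duplicate records from inflating a user's day count.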

This flawed data presents challenges in analysis, but it also provides insights into how products can be improved.  

Is the data original?

The data appears to be original. It is published on both Zenodo and OpenAIRE, which are operated by CERN.

Is the data comprehensive?

The data could be more comprehensive with a larger and more consistent participant group. Furthermore, since users are only identified by an ID number, we lack useful demographic information that could provide greater insights and allow for analysis of any potential selection bias in the data.

Is the data current?

This data set was gathered between 3/12/2016 and 5/12/2016, so it cannot be considered current.

Is the data cited?

Zenodo shows three citations of the data and OpenAIRE shows five. It is unknown whether that totals eight citations, or if there is overlap between the reports of citations.


Data cleaning

I performed data cleaning in Google Sheets. The primary spreadsheets I used were:

  • dailyActivity_merged

  • hourlyCalories_merged

  • sleepDay_merged

Sheets' data cleaning function revealed no problems in dailyActivity_merged or hourlyCalories_merged, but caught some duplicate rows in sleepDay_merged, which I removed.

I also determined how many of the 31 days each user logged any data, and how many of those days actually included use of the device. I treated a day as one of actual use when it showed:

  • A non-zero value in the steps category

  • A non-zero value in the distance category

Conversely, I treated a day as one of non-use when it showed:

  • 1440 minutes rated as sedentary (every minute of the day)

  • A number of calories burned specific to the user (likely based on their estimated metabolic rate)

Finally, I averaged several data types by user: steps, calories, the various activity levels, time asleep, time in bed, and the percentage of time in bed spent asleep. I did not average distance, as it correlated so closely with steps. I also calculated adjusted averages using only the days on which the user was determined to have used the device.
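The day-of-use checks described above were done in a spreadsheet, but the same logic can be sketched in code. This is an illustration only: the field names (TotalSteps, TotalDistance, SedentaryMinutes) are assumptions about the columns in dailyActivity_merged, the sample rows are invented, and the user-specific calorie baseline check is left out because that baseline is not stated.

```python
MINUTES_PER_DAY = 24 * 60  # 1440, a full day rated as sedentary

# Hypothetical daily rows for one user; values are made up for illustration.
daily_rows = [
    {"Id": "A", "TotalSteps": 9000, "TotalDistance": 6.1, "SedentaryMinutes": 700},
    {"Id": "A", "TotalSteps": 0, "TotalDistance": 0.0, "SedentaryMinutes": 1440},
    {"Id": "A", "TotalSteps": 5000, "TotalDistance": 3.4, "SedentaryMinutes": 900},
]


def device_was_used(row):
    """Apply the step, distance, and sedentary-minute checks to one day."""
    return (
        row["TotalSteps"] > 0
        and row["TotalDistance"] > 0
        and row["SedentaryMinutes"] < MINUTES_PER_DAY
    )


def average_steps(rows, used_only=False):
    """Average daily steps, optionally restricted to days of actual use."""
    selected = [r for r in rows if device_was_used(r)] if used_only else list(rows)
    if not selected:
        return 0.0
    return sum(r["TotalSteps"] for r in selected) / len(selected)


raw_average = average_steps(daily_rows)
adjusted_average = average_steps(daily_rows, used_only=True)
```

The adjusted average is higher than the raw one whenever non-use days are present, which is why both figures are worth reporting side by side.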


Summary of analysis

The most important finding of my analysis is that fitness tracking devices and software are used inconsistently. This hampers users' ability to track trends in their health markers and make the greatest use of their data. While more robust data on sleep and body weight would have been invaluable, the gap itself presents an area of opportunity. Based on the available information, I was able to uncover relationships between activity, calories, and sleep, and to determine the times of day and days of the week when users tend to be most and least active.

Supporting visualizations and key findings

The inconsistent logging noted in the reliability evaluation is more than a limitation of the data set; it also points to what appears to be a barrier that keeps users from easily tracking sleep, weight, and activities. That barrier has affected this particular data, but it also represents an opportunity: making this information easier to log would let users harness the data themselves.

The relationships between activity, calories burned, and sleep produced findings that the marketing team can put to good use. The most basic finding is a positive correlation between daily steps taken and calories burned.
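A correlation like the one between steps and calories can be quantified with Pearson's r. The sketch below is one common way to compute it and is not necessarily how the relationship was measured in this analysis. The sample values are invented and deliberately lie on a straight line.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return covariance / sqrt(spread_x * spread_y)


# Made-up per-user averages, not values from the study.
average_steps = [2000, 5000, 8000, 11000]
average_calories = [1700, 2000, 2300, 2600]

r = pearson_r(average_steps, average_calories)
```

A value of r close to 1 indicates a strong positive relationship, a value close to -1 a strong negative one, and a value near 0 little linear relationship.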

The relationship between activity and sleep paints a picture of efficiency: users who burn more calories and accrue more very active minutes not only spend less time in bed, but also spend a higher percentage of that time actually sleeping.
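The percentage of time in bed spent sleeping is a simple ratio. As a minimal sketch, assuming both quantities are recorded in minutes (the function and parameter names are hypothetical):

```python
def sleep_efficiency(minutes_asleep, minutes_in_bed):
    """Percentage of time in bed that was spent asleep."""
    if minutes_in_bed <= 0:
        raise ValueError("minutes_in_bed must be positive")
    return 100.0 * minutes_asleep / minutes_in_bed


# Illustrative values only: 7 hours asleep out of 8 hours in bed.
example = sleep_efficiency(minutes_asleep=420, minutes_in_bed=480)
```

Averaging this percentage per user gives a figure that can be compared against that user's calories burned or very active minutes.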

Digging into the data about hourly calorie expenditure reveals two separate spikes of activity surrounding 12pm and 6pm. The consecutive hours from 11pm to 4am rate as the hours of lowest activity, likely reflecting the time when most people will be sleeping.
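An hourly profile of this kind can be built by grouping records by hour of day and averaging. The following sketch assumes each record is a (timestamp, calories) pair and that timestamps follow a month/day/year 12-hour format. Both are assumptions about hourlyCalories_merged, and the sample values are invented.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly records: (timestamp, calories burned in that hour).
hourly_rows = [
    ("4/12/2016 12:00:00 PM", 120),
    ("4/13/2016 12:00:00 PM", 140),
    ("4/12/2016 3:00:00 AM", 60),
]


def mean_calories_by_hour(rows, fmt="%m/%d/%Y %I:%M:%S %p"):
    """Average calories for each hour of the day (0-23) across all records."""
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [running sum, count]
    for stamp, calories in rows:
        hour = datetime.strptime(stamp, fmt).hour
        totals[hour][0] += calories
        totals[hour][1] += 1
    return {hour: total / count for hour, (total, count) in sorted(totals.items())}


hourly_profile = mean_calories_by_hour(hourly_rows)
```

Plotting the resulting 24 averages as a bar or line chart makes peaks and overnight lulls easy to see.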

Ranking the days of the week, we see that Tuesdays and Saturdays are the most active, while Thursdays and Sundays are the least active.
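A weekday ranking can be produced the same way, by grouping daily records by day of the week. In this sketch the activity measure is daily steps, which is an assumption since the ranking above does not name its metric. The date format and sample values are also assumed.

```python
from collections import defaultdict
from datetime import datetime

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# Hypothetical daily records: (date, steps). Values are made up.
daily_steps = [
    ("4/12/2016", 9000),   # a Tuesday
    ("4/19/2016", 11000),  # a Tuesday
    ("4/14/2016", 6000),   # a Thursday
]


def rank_weekdays(rows, fmt="%m/%d/%Y"):
    """Return (weekday, mean value) pairs sorted from most to least active."""
    by_day = defaultdict(list)
    for date_text, value in rows:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        index = datetime.strptime(date_text, fmt).weekday()
        by_day[WEEKDAY_NAMES[index]].append(value)
    means = {day: sum(values) / len(values) for day, values in by_day.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


ranking = rank_weekdays(daily_steps)
```

Using a fixed list of weekday names rather than locale-dependent formatting keeps the output the same on any machine.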


High-level content recommendations

New marketing content should be focused around:

  • The benefits of activity

  • How Bellabeat's products allow for the tracking of activity

  • How tracking can support greater accountability for engaging in a certain level of activity 

The importance of activity can be presented in two ways: 

  • As a means of burning calories

  • As a means of improving sleep quality.

A higher step count is associated with greater calorie expenditure. It is also useful to point out that the average user was least active on Thursdays and Sundays; simply by bringing activity on those days up to the level of the most active days, the average user stands to expend about 350 extra calories per week. Given that one pound of fat is roughly equivalent to 3,500 calories (1), this extra expenditure could lead to a loss of one pound of fat every ten weeks, or about five pounds in a year, without any change in dietary habits.
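The arithmetic behind that estimate can be checked in a few lines. This sketch uses only the two figures given above (350 extra calories per week and 3,500 calories per pound of fat). The 52-week year and the function name are assumptions for illustration.

```python
CALORIES_PER_POUND_OF_FAT = 3500  # rough equivalence cited in the text
WEEKS_PER_YEAR = 52               # assumed for the yearly projection


def projected_fat_loss(extra_calories_per_week):
    """Return (weeks needed to lose one pound, pounds lost per year)."""
    if extra_calories_per_week <= 0:
        raise ValueError("extra_calories_per_week must be positive")
    weeks_per_pound = CALORIES_PER_POUND_OF_FAT / extra_calories_per_week
    pounds_per_year = WEEKS_PER_YEAR / weeks_per_pound
    return weeks_per_pound, pounds_per_year


weeks_per_pound, pounds_per_year = projected_fat_loss(350)
```

With 350 extra calories per week, this works out to ten weeks per pound and a little over five pounds per year, in line with the estimate above.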

Promoting the benefits of activity on sleep will appeal to users conscious of sleep quality as well as the efficiency-minded. Data shows that daily calories burned correlate with:

  • Less total time in bed

  • More of that time in bed spent sleeping 

This can be tied back into the fat loss angle, as studies have shown that sleep is a very powerful factor in fat loss as opposed to simple general tissue loss. (1, 2)

Final content recommendations will have to wait until future versions of the products are available. In the meantime, the inconsistencies in the available data should strongly inform product design: products should encourage more consistent logging of exercise activities, sleep, and weight, both through ease of use and through timed reminders. These reminders could also help reinforce the earlier idea of increasing activity on more sedentary days.