Project Description
Sleep is very important for both physical and mental health. Good quality sleep helps the body repair cells, improves memory, protects against illness, and supports overall well-being. In this project, we will act as data science consultants for SleepInc, a startup that makes a sleep tracking app called SleepScope. Our goal is to analyze anonymous sleep data collected by the app to find out how different lifestyle factors affect sleep quality and sleep duration.
The data comes from SleepScope, which gathers information from users about their sleep and daily habits. As data scientists, we will use Python to explore this lifestyle survey data. We want to find relationships between exercise, gender, occupation, and sleep quality. By analyzing the data, we hope to discover patterns that explain why some people sleep better than others.
The data: sleep_health_data.csv
We have a dataset containing anonymized sleep and lifestyle data for 374 people. The data shows average values for each person based on their activity over the last six months. The file is named sleep_health_data.csv.
The dataset has 13 columns. They cover how long people sleep, how they rate their sleep quality, whether they have a sleep disorder, how much they exercise, their stress levels, their BMI category, blood pressure, resting heart rate, daily steps, and demographic details such as age, gender, and occupation. This information will help us understand how these factors connect to sleep quality.
Column | Description |
---|---|
Person ID | A unique number that identifies each person in the dataset |
Gender | The gender of the person, either Male or Female |
Age | The age of the person in years |
Occupation | The type of job or profession the person has |
Sleep Duration (hours) | The average number of hours the person sleeps each day |
Quality of Sleep (scale: 1-10) | A score from 1 to 10, given by the person, that rates how good they think their sleep is |
Physical Activity Level (minutes/day) | How many minutes per day the person spends doing physical exercise |
Stress Level (scale: 1-10) | A score from 1 to 10 showing how much stress the person feels on average |
BMI Category | The body mass index category of the person, such as Underweight, Normal, or Overweight |
Blood Pressure (systolic/diastolic) | The average blood pressure reading, shown as systolic over diastolic pressure |
Heart Rate (bpm) | The average resting heart rate, measured in beats per minute |
Daily Steps | The average number of steps the person takes each day |
Sleep Disorder | Whether the person has a sleep disorder: None, Insomnia, or Sleep Apnea |
This dataset will allow us to explore how different lifestyle and health factors relate to sleep quality and duration. By analyzing this data, we hope to provide useful insights that could help SleepInc improve their app and help users sleep better.
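The table describes Blood Pressure as a single "systolic/diastolic" reading, which is awkward to analyze numerically. Assuming the CSV stores it as a text value such as "126/83" (an assumption about the file format; the rows below are made-up illustrative values, not taken from sleep_health_data.csv), a minimal sketch for splitting it into two integer columns could look like this:

```python
import pandas as pd

# Hypothetical illustrative rows; only the column names mirror sleep_health_data.csv.
# Assumption: "Blood Pressure" is stored as a "systolic/diastolic" text value.
df = pd.DataFrame({
    "Person ID": [1, 2, 3],
    "Blood Pressure": ["126/83", "125/80", "140/90"],
})

# Split each string on "/" into two columns (0 and 1), then convert to integers
bp = df["Blood Pressure"].str.split("/", expand=True).astype(int)
df["Systolic"] = bp[0]
df["Diastolic"] = bp[1]

print(df[["Person ID", "Systolic", "Diastolic"]])
```

With numeric columns, the readings can be averaged or correlated with sleep quality like any other feature.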
Which occupation has the lowest average sleep duration?
# import required library
import pandas as pd
# load the data
sleep_df = pd.read_csv('sleep_health_data.csv')
# Groupby occupation and calculate mean sleep duration
sleep_duration = sleep_df.groupby('Occupation')['Sleep Duration'].mean()
# Get occupation with lowest average sleep duration
lowest_sleep = sleep_duration.sort_values().index[0]
print(f"The occupation *{lowest_sleep}* has the lowest average sleep duration.")
The occupation *Sales Representative* has the lowest average sleep duration.
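Sorting the group means and taking the first index label works, but pandas also offers Series.idxmin(), which returns the label of the smallest value directly and is documented to return the first label when several values tie. A minimal sketch on hypothetical toy data (the occupations and hours below are invented for illustration), which also lists every tied occupation explicitly:

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror sleep_health_data.csv
df = pd.DataFrame({
    "Occupation": ["Nurse", "Nurse", "Engineer", "Engineer", "Teacher"],
    "Sleep Duration": [6.0, 6.5, 7.5, 8.0, 6.25],
})

mean_sleep = df.groupby("Occupation")["Sleep Duration"].mean()

# idxmin() gives the label of the smallest mean (the first label on a tie)
lowest = mean_sleep.idxmin()

# To surface ties explicitly, keep every label whose mean equals the minimum
tied = mean_sleep[mean_sleep == mean_sleep.min()].index.tolist()

print(lowest, tied)
```

Here Nurse and Teacher both average 6.25 hours, so checking for ties avoids silently reporting only one of them.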
Which occupation has the lowest average sleep quality?
# Groupby occupation and calculate average sleep quality
sleep_quality = sleep_df.groupby('Occupation')['Quality of Sleep'].mean()
# Get occupation with lowest average sleep quality
lowest_sleep_quality = sleep_quality.sort_values().index[0]
print(f"The occupation *{lowest_sleep_quality}* has the lowest average sleep quality.")
The occupation *Sales Representative* has the lowest average sleep quality.
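An occupation's average can rank first or last simply because very few people in the sample hold that job, so it may help to look at the number of respondents next to each mean. A minimal sketch using GroupBy.agg on hypothetical toy data (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror sleep_health_data.csv
df = pd.DataFrame({
    "Occupation": ["Nurse", "Nurse", "Nurse", "Manager"],
    "Quality of Sleep": [6, 7, 8, 4],
})

# Compute the mean and the number of respondents per occupation in one pass,
# then sort so the lowest average sleep quality comes first
summary = (
    df.groupby("Occupation")["Quality of Sleep"]
    .agg(["mean", "count"])
    .sort_values("mean")
)
print(summary)
```

In this toy example the lowest-ranked occupation rests on a single respondent, which is the kind of result worth treating with caution.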
Explore how BMI Category can affect sleep disorder rates. Find what ratio of app users in each BMI Category have been diagnosed with Insomnia.
# Filter the full dataframe to only rows where BMI Category is Normal and Sleep Disorder is Insomnia.
normal = sleep_df[(sleep_df["BMI Category"] == "Normal") & (sleep_df["Sleep Disorder"] == "Insomnia")]
# Total normal rows
total_normal = len(sleep_df[sleep_df["BMI Category"] == "Normal"])
# Calculate normal insomnia ratio
normal_insomnia_ratio = round(len(normal) / total_normal, 2)
print(f"Among users with a Normal BMI, {normal_insomnia_ratio * 100}% have been diagnosed with Insomnia.")
# Filter to only rows where BMI Category is Overweight and Sleep Disorder is Insomnia.
overweight = sleep_df[(sleep_df["BMI Category"] == "Overweight") & (sleep_df["Sleep Disorder"] == "Insomnia")]
# Total overweight rows
total_overweight = len(sleep_df[sleep_df["BMI Category"] == "Overweight"])
# Calculate overweight insomnia ratio
overweight_insomnia_ratio = round(len(overweight) / total_overweight, 2)
print(f"Among users with an Overweight BMI, {overweight_insomnia_ratio * 100}% have been diagnosed with Insomnia.")
# Filter to only rows where BMI Category is Obese and Sleep Disorder is Insomnia.
obese = sleep_df[(sleep_df["BMI Category"] == "Obese") & (sleep_df["Sleep Disorder"] == "Insomnia")]
# Total obese rows
total_obese = len(sleep_df[sleep_df["BMI Category"] == "Obese"])
# Calculate obese insomnia ratio
obese_insomnia_ratio = round(len(obese) / total_obese, 2)
print(f"Among users with an Obese BMI, {obese_insomnia_ratio * 100}% have been diagnosed with Insomnia.")
Among users with a Normal BMI, 4.0% have been diagnosed with Insomnia.
Among users with an Overweight BMI, 43.0% have been diagnosed with Insomnia.
Among users with an Obese BMI, 40.0% have been diagnosed with Insomnia.
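The printed percentages above come from multiplying an already rounded ratio by 100, which yields values such as "43.0%" and, for some inputs, can expose binary floating-point noise. Python's format specification mini-language has a "%" presentation type that multiplies by 100 and appends the percent sign itself. A minimal sketch using the ratios shown in the notebook output:

```python
# Ratios as reported in the notebook output after round(..., 2)
ratios = {"Normal": 0.04, "Overweight": 0.43, "Obese": 0.4}

# ":.0%" multiplies by 100, rounds to zero decimals, and appends "%"
lines = [
    f"Among users with a BMI category of {category}, {ratio:.0%} have been diagnosed with Insomnia."
    for category, ratio in ratios.items()
]

for line in lines:
    print(line)
```

Applying ":.0%" (or ":.1%") to the unrounded ratio would also avoid rounding twice.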
# Create dictionary to store the ratios for each BMI category
bmi_insomnia_ratios = {
    "Normal": normal_insomnia_ratio,
    "Overweight": overweight_insomnia_ratio,
    "Obese": obese_insomnia_ratio
}
bmi_insomnia_ratios
{'Normal': 0.04, 'Overweight': 0.43, 'Obese': 0.4}
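The three filter-and-count blocks repeat the same computation per category. One alternative, sketched here on hypothetical toy data (the rows are invented for illustration), is to flag each row as Insomnia or not and take the mean of that boolean flag within each BMI group; the mean of True/False values is exactly the share of users in the group with Insomnia:

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror sleep_health_data.csv
df = pd.DataFrame({
    "BMI Category": ["Normal", "Normal", "Normal", "Normal",
                     "Overweight", "Overweight", "Obese", "Obese"],
    "Sleep Disorder": ["None", "Insomnia", "None", "None",
                       "Insomnia", "Sleep Apnea", "Insomnia", None],
})

# eq("Insomnia") gives a boolean per row; a missing value compares as False,
# so such rows still count in their group's denominator
insomnia_ratio = (
    df["Sleep Disorder"].eq("Insomnia")
    .groupby(df["BMI Category"])
    .mean()
    .round(2)
)

bmi_insomnia_ratios = insomnia_ratio.to_dict()
print(bmi_insomnia_ratios)
```

Unlike the per-category blocks, this covers every BMI label present in the data without listing them by hand.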
# Read in the data
sleep_df = pd.read_csv('sleep_health_data.csv')
# 1. Which occupation has the lowest average sleep duration? Save this in a string variable called `lowest_sleep_occ`.
# Groupby occupation and calculate mean sleep duration
sleep_duration = sleep_df.groupby('Occupation')['Sleep Duration'].mean()
# Get occupation with lowest average sleep duration
lowest_sleep_occ = sleep_duration.sort_values().index[0]
# 2. Which occupation had the lowest quality of sleep on average? Did the occupation with the lowest sleep duration also have the worst sleep quality?
# Groupby occupation and calculate average sleep quality
sleep_quality = sleep_df.groupby('Occupation')['Quality of Sleep'].mean()
# Get occupation with lowest average sleep quality
lowest_sleep_quality_occ = sleep_quality.sort_values().index[0]
# Compare occupation with the least sleep to occupation with the lowest sleep quality
same_occ = lowest_sleep_occ == lowest_sleep_quality_occ
# 3. Let's explore how BMI Category can affect sleep disorder rates. Start by finding what ratio of app users in each BMI category have been diagnosed with Insomnia.
# Normal
# Filter the full dataframe to only rows where BMI Category is Normal and Sleep Disorder is Insomnia.
normal = sleep_df[(sleep_df["BMI Category"] == "Normal") &
                  (sleep_df["Sleep Disorder"] == "Insomnia")]
# Note: any rows labelled "Normal Weight" are not counted in this ratio
# Total normal rows
total_normal = len(sleep_df[sleep_df["BMI Category"] == "Normal"])
# Calculate normal insomnia ratio
normal_insomnia_ratio = round(len(normal) / total_normal, 2)
# Overweight
# Filter the full dataframe to only rows where BMI Category is Overweight and Sleep Disorder is Insomnia.
overweight = sleep_df[(sleep_df["BMI Category"] == "Overweight") &
                      (sleep_df["Sleep Disorder"] == "Insomnia")]
# Total overweight rows
total_overweight = len(sleep_df[sleep_df["BMI Category"] == "Overweight"])
# Calculate overweight insomnia ratio
overweight_insomnia_ratio = round(len(overweight) / total_overweight, 2)
# Obese
# Filter the full dataframe to only rows where BMI Category is Obese and Sleep Disorder is Insomnia.
obese = sleep_df[(sleep_df["BMI Category"] == "Obese") &
                 (sleep_df["Sleep Disorder"] == "Insomnia")]
# Total obese rows
total_obese = len(sleep_df[sleep_df["BMI Category"] == "Obese"])
# Calculate obese insomnia ratio
obese_insomnia_ratio = round(len(obese) / total_obese, 2)
# Create dictionary to store the ratios for each BMI category
bmi_insomnia_ratios = {
    "Normal": normal_insomnia_ratio,
    "Overweight": overweight_insomnia_ratio,
    "Obese": obese_insomnia_ratio
}
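The dictionary holds ratios only for the three BMI labels requested, so any other label present in the file is silently left out. A quick set difference can confirm whether that happens. In this minimal sketch the extra "Normal Weight" row is a hypothetical stand-in, and the dictionary values are the ones shown in the notebook output:

```python
import pandas as pd

# Hypothetical toy data; the "Normal Weight" label is an illustrative assumption
df = pd.DataFrame({
    "BMI Category": ["Normal", "Normal Weight", "Overweight", "Obese"],
})

# Ratios as reported in the notebook output
bmi_insomnia_ratios = {"Normal": 0.04, "Overweight": 0.43, "Obese": 0.4}

# Labels present in the data but absent from the dictionary were never counted
uncovered = sorted(set(df["BMI Category"].unique()) - set(bmi_insomnia_ratios))
print(uncovered)
```

An empty list would indicate that the three ratios account for every BMI label in the data.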