Project Description
Netflix started in 1997 as a DVD rental service and has grown into one of the largest entertainment and media companies in the world. With thousands of movies and series available, the platform's catalog is a good opportunity to practice exploratory data analysis (EDA) while learning about the entertainment industry.
In this project, you work for a production company that focuses on nostalgic styles. Your goal is to research movies released in the 1990s. By analyzing Netflix data, you will perform exploratory data analysis to better understand the movies and shows from that exciting decade.
You have been given a dataset called netflix_data.csv. Below is a table describing the columns in this dataset. After completing your initial analysis, you can also explore the data further on your own.
The data: netflix_data.csv
Column | Description |
---|---|
show_id | The unique ID of the show |
type | The type of show (e.g., Movie, TV Show) |
title | The title or name of the show |
director | The director(s) of the show |
cast | The main actors or cast members |
country | The country where the show was made |
date_added | The date when the show was added to Netflix |
release_year | The year the show was originally released |
duration | The length of the show in minutes |
description | A brief summary or description of the show |
genre | The genre or category of the show |
This dataset allows you to explore many aspects of Netflix’s library, such as popular genres, directors, cast, and the characteristics of movies from the 1990s. You can use this information to identify trends and gather insights to support your production company’s focus on nostalgic content.
import pandas as pd
import matplotlib.pyplot as plt
# Load the dataset, skipping the saved "index" column if present
df = pd.read_csv("netflix_data.csv", usecols=lambda column: column != "index")
df.head()
| | show_id | type | title | director | cast | country | date_added | release_year | duration | description | genre |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s2 | Movie | 7:19 | Jorge Michel Grau | Demián Bichir, Héctor Bonilla, Oscar Serrano, ... | Mexico | December 23, 2016 | 2016 | 93 | After a devastating earthquake hits Mexico Cit... | Dramas |
| 1 | s3 | Movie | 23:59 | Gilbert Chan | Tedd Chan, Stella Chung, Henley Hii, Lawrence ... | Singapore | December 20, 2018 | 2011 | 78 | When an army recruit is found dead, his fellow... | Horror Movies |
| 2 | s4 | Movie | 9 | Shane Acker | Elijah Wood, John C. Reilly, Jennifer Connelly... | United States | November 16, 2017 | 2009 | 80 | In a postapocalyptic world, rag-doll robots hi... | Action |
| 3 | s5 | Movie | 21 | Robert Luketic | Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar... | United States | January 1, 2020 | 2008 | 123 | A brilliant group of students become card-coun... | Dramas |
| 4 | s6 | TV Show | 46 | Serdar Akar | Erdal Beşikçioğlu, Yasemin Allen, Melis Birkan... | Turkey | July 1, 2017 | 2016 | 1 | A genetics professor experiments with a treatm... | International TV |
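In the preview, the TV show row has a duration of 1 while the movies have durations between 78 and 123, which suggests the column may not be measured in minutes for every type. A quick way to check this is to summarize duration per type. The sketch below uses a small made-up frame (`sample` is hypothetical, not the real dataset); on the actual data the same `groupby` could be run on `df`.

```python
import pandas as pd

# Hypothetical rows that mimic the preview; the values are made up.
sample = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "TV Show"],
    "duration": [93, 123, 1, 2],
})

# Summarize the duration range separately for each type of show
duration_by_type = sample.groupby("type")["duration"].agg(["min", "median", "max"])
print(duration_by_type)
```

If the ranges differ this sharply on the real data, that supports restricting the analysis to rows where type is 'Movie' before looking at durations.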
What was the most frequent movie duration in the 1990s?
# Keep only rows of type 'Movie'
df_movies = df[df['type'] == 'Movie']
# Keep only movies released in the 1990s (1990 through 1999)
movies_1990s = df_movies[(df_movies["release_year"] >= 1990) & (df_movies["release_year"] < 2000)]
# Plot the distribution of durations
plt.hist(movies_1990s['duration'])
plt.title('Distribution of movie durations in the 1990s')
plt.xlabel('Duration of movies (minutes)')
plt.ylabel('Number of movies')
plt.show()
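The histogram shows roughly where durations cluster, but with the default bins it does not pin down a single most frequent value. One way to get an exact answer is `Series.mode()`, which returns every value tied for the highest count. The sketch below is a minimal illustration on made-up data; `sample_movies` and `most_frequent_durations` are hypothetical names, and on the real data the function would be called with `movies_1990s`.

```python
import pandas as pd

# Hypothetical stand-in for movies_1990s; the values are made up.
sample_movies = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "duration": [94, 101, 94, 88, 101],
})

def most_frequent_durations(movies):
    """Return every duration tied for the highest count, in ascending order."""
    # Series.mode() returns all modal values, sorted ascending
    return movies["duration"].mode().tolist()

modes = most_frequent_durations(sample_movies)
print(modes)
```

Returning a list rather than a single number keeps ties visible, since more than one duration can share the top count.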
A movie is considered short if it runs less than 90 minutes. Counting the number of short movies from the 1990s
# Keep only movies shorter than 90 minutes
short_movies = movies_1990s[movies_1990s['duration'] < 90]
# Count distinct titles among the short movies
no_of_short_movies = short_movies['title'].nunique()
print(f"The number of short movies on Netflix from the 1990s was {no_of_short_movies}")
The number of short movies on Netflix from the 1990s was 34
Counting the number of short movies by genre
short_movies_genre = short_movies.groupby('genre').agg({'title': 'count'})
short_movies_genre.columns = ['count']
short_movies_genre
genre | count |
---|---|
Action | 7 |
Children | 8 |
Comedies | 8 |
Documentaries | 1 |
Dramas | 2 |
Stand-Up | 8 |
Average duration of short movies by genre
avg_duration_short_movies = short_movies.groupby('genre').agg({'duration': 'mean'}).round(2)
avg_duration_short_movies
genre | duration |
---|---|
Action | 84.14 |
Children | 81.00 |
Comedies | 76.38 |
Documentaries | 49.00 |
Dramas | 80.00 |
Stand-Up | 53.25 |
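The per-genre count and the per-genre average duration can also be computed in a single pass using pandas named aggregation, which puts both figures side by side in one table. This is an alternative sketch rather than the approach used in the cells of this notebook; `sample_short` is a hypothetical stand-in with made-up rows, and on the real data the same call would be applied to `short_movies`.

```python
import pandas as pd

# Hypothetical stand-in for short_movies; the rows are made up.
sample_short = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genre": ["Action", "Action", "Comedies", "Comedies"],
    "duration": [80, 85, 70, 75],
})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function)
genre_summary = (
    sample_short.groupby("genre")
    .agg(count=("title", "count"), avg_duration=("duration", "mean"))
    .round(2)
)
print(genre_summary)
```

Having both columns in one frame makes it easier to spot genres whose average rests on very few titles, such as a genre with a single short movie.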