Project Description¶

Working on behalf of a company that sells motorcycle parts, we'll dig into their data to understand their revenue streams. We will find out how much net revenue they are generating across their product lines, segregating by date and warehouse.

The company operates three warehouses in the area, selling both retail and wholesale. They offer a variety of parts and accept credit cards, cash, and bank transfer as payment methods. However, each payment type incurs a different fee.

The board of directors wants to gain a better understanding of wholesale revenue by product line, and how this varies month-to-month and across warehouses. You have been tasked with calculating net revenue for each product line and grouping results by month and warehouse. The results should be filtered so that only "Wholesale" orders are included.

They have provided you with access to their database, which contains the following table called sales

Sales¶

Column Data type Description
order_number VARCHAR Unique order number.
date DATE Date of the order, from June to August 2021.
warehouse VARCHAR The warehouse that the order was made from— North, Central, or West.
client_type VARCHAR Whether the order was Retail or Wholesale.
product_line VARCHAR Type of product ordered.
quantity INT Number of products ordered.
unit_price FLOAT Price per product (dollars).
total FLOAT Total price of the order (dollars).
payment VARCHAR Payment method—Credit card, Transfer, or Cash.
payment_fee FLOAT Percentage of total charged as a result of the payment method.

Lets Find out how much Wholesale net revenue each product_line generated per month per warehouse in the dataset.¶

In [4]:
# load the dataset
import pandas as pd
df = pd.read_csv('sales.csv', usecols=lambda column: column != "index")
df.head()
Out[4]:
order_number date warehouse client_type product_line quantity unit_price total payment payment_fee
0 N1 2021-06-01T00:00:00.000 North Retail Braking system 9 19.29 173.61 Cash 0.00
1 N2 2021-06-01T00:00:00.000 North Retail Suspension & traction 8 32.93 263.45 Credit card 0.03
2 N3 2021-06-01T00:00:00.000 North Wholesale Frame & body 16 37.84 605.44 Transfer 0.01
3 N4 2021-06-01T00:00:00.000 North Wholesale Suspension & traction 40 37.37 1494.80 Transfer 0.01
4 N5 2021-06-01T00:00:00.000 North Retail Frame & body 6 45.44 272.61 Credit card 0.03
In [3]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   order_number  1000 non-null   object 
 1   date          1000 non-null   object 
 2   warehouse     1000 non-null   object 
 3   client_type   1000 non-null   object 
 4   product_line  1000 non-null   object 
 5   quantity      1000 non-null   int64  
 6   unit_price    1000 non-null   float64
 7   total         1000 non-null   float64
 8   payment       1000 non-null   object 
 9   payment_fee   1000 non-null   float64
dtypes: float64(3), int64(1), object(6)
memory usage: 78.3+ KB
In [5]:
# Convert 'date' column to datetime object
df['date'] = pd.to_datetime(df['date'])
In [17]:
# Extracting month from date, since we want to analyze the revenue per month as well
df['month'] = df['date'].dt.strftime('%b')

# Creating the net_revenue column
df['net_revenue'] = df['total'] - df['payment_fee']

df.head()
Out[17]:
order_number date warehouse client_type product_line quantity unit_price total payment payment_fee month net_revenue
0 N1 2021-06-01 North Retail Braking system 9 19.29 173.61 Cash 0.00 Jun 173.61
1 N2 2021-06-01 North Retail Suspension & traction 8 32.93 263.45 Credit card 0.03 Jun 263.42
2 N3 2021-06-01 North Wholesale Frame & body 16 37.84 605.44 Transfer 0.01 Jun 605.43
3 N4 2021-06-01 North Wholesale Suspension & traction 40 37.37 1494.80 Transfer 0.01 Jun 1494.79
4 N5 2021-06-01 North Retail Frame & body 6 45.44 272.61 Credit card 0.03 Jun 272.58
In [21]:
# Filter only Wholesale client_type
df_wholesale = df[df['client_type'] == 'Wholesale']

# Group by product_line, month, warehouse and sum net_revenue
revenue_by_product_line = df_wholesale.groupby(['product_line', 'month', 'warehouse'], as_index=False).agg({'net_revenue': 'sum'})

# Sort by product_line, month, and net_revenue in descending order
revenue_by_product_line = revenue_by_product_line.sort_values(by=['product_line', 'month', 'net_revenue'], ascending=[True, True, False])

# Display the result
revenue_by_product_line
Out[21]:
product_line month warehouse net_revenue
0 Braking system Aug Central 3039.41
2 Braking system Aug West 2500.67
1 Braking system Aug North 1770.84
3 Braking system Jul Central 3778.65
5 Braking system Jul West 3060.93
4 Braking system Jul North 2594.44
6 Braking system Jun Central 3684.89
7 Braking system Jun North 1487.77
8 Braking system Jun West 1212.75
10 Electrical system Aug North 4721.12
9 Electrical system Aug Central 3126.43
11 Electrical system Aug West 1241.84
12 Electrical system Jul Central 5577.62
13 Electrical system Jul North 1710.13
14 Electrical system Jul West 449.46
15 Electrical system Jun Central 2904.93
16 Electrical system Jun North 2022.50
17 Engine Aug Central 9528.71
18 Engine Aug North 2324.19
19 Engine Jul Central 1827.03
20 Engine Jul North 1007.14
21 Engine Jun Central 6548.85
22 Frame & body Aug Central 8657.99
23 Frame & body Aug North 7898.89
24 Frame & body Aug West 829.69
26 Frame & body Jul North 6154.61
25 Frame & body Jul Central 3135.13
27 Frame & body Jun Central 5111.34
28 Frame & body Jun North 4910.12
29 Frame & body Jun West 2779.74
31 Miscellaneous Aug North 1841.40
30 Miscellaneous Aug Central 1739.76
32 Miscellaneous Aug West 813.43
33 Miscellaneous Jul Central 3118.44
34 Miscellaneous Jul North 2404.65
35 Miscellaneous Jul West 1156.80
38 Miscellaneous Jun West 2280.97
36 Miscellaneous Jun Central 1878.07
37 Miscellaneous Jun North 513.99
39 Suspension & traction Aug Central 5416.70
40 Suspension & traction Aug North 4923.69
41 Suspension & traction Aug West 1080.79
42 Suspension & traction Jul Central 6456.72
43 Suspension & traction Jul North 3714.28
44 Suspension & traction Jul West 2939.32
46 Suspension & traction Jun North 8065.74
45 Suspension & traction Jun Central 3325.00
47 Suspension & traction Jun West 2372.52

Lets visualize the findiings to understand it better¶

In [23]:
import matplotlib.pyplot as plt
import seaborn as sns

# Set the figure size
plt.figure(figsize=(12, 6))

# Create a bar plot
sns.barplot(
    data=revenue_by_product_line,
    x="product_line",
    y="net_revenue",
    hue="warehouse",
    errorbar=None  # Replaces ci=None
)

# Rotate x-axis labels for better readability
plt.xticks(rotation=45)

# Add labels and title
plt.xlabel("Product Line")
plt.ylabel("Net Revenue")
plt.title("Net Revenue by Product Line and Warehouse (Wholesale Only)")

# Show legend
plt.legend(title="Warehouse")

# Display the plot
plt.show()
No description has been provided for this image
In [24]:
df
Out[24]:
order_number date warehouse client_type product_line quantity unit_price total payment payment_fee month net_revenue
0 N1 2021-06-01 North Retail Braking system 9 19.29 173.61 Cash 0.00 Jun 173.61
1 N2 2021-06-01 North Retail Suspension & traction 8 32.93 263.45 Credit card 0.03 Jun 263.42
2 N3 2021-06-01 North Wholesale Frame & body 16 37.84 605.44 Transfer 0.01 Jun 605.43
3 N4 2021-06-01 North Wholesale Suspension & traction 40 37.37 1494.80 Transfer 0.01 Jun 1494.79
4 N5 2021-06-01 North Retail Frame & body 6 45.44 272.61 Credit card 0.03 Jun 272.58
... ... ... ... ... ... ... ... ... ... ... ... ...
995 W176 2021-08-27 West Retail Electrical system 4 21.47 85.89 Credit card 0.03 Aug 85.86
996 W177 2021-08-28 West Wholesale Miscellaneous 32 25.42 813.44 Transfer 0.01 Aug 813.43
997 W178 2021-08-28 West Retail Electrical system 6 28.37 170.21 Credit card 0.03 Aug 170.18
998 W179 2021-08-28 West Wholesale Braking system 32 10.03 320.96 Transfer 0.01 Aug 320.95
999 W180 2021-08-28 West Wholesale Electrical system 12 32.80 393.64 Transfer 0.01 Aug 393.63

1000 rows × 12 columns

Lets compute total revenue, total net revenue, and average revenue per order¶

In [25]:
# Total revenue and net revenue
total_revenue = df["total"].sum()
total_net_revenue = df["net_revenue"].sum()
avg_revenue_per_order = df["net_revenue"].mean()

print(f"Total Revenue: ${total_revenue:,.2f}")
print(f"Total Net Revenue: ${total_net_revenue:,.2f}")
print(f"Average Revenue per Order: ${avg_revenue_per_order:,.2f}")
Total Revenue: $289,113.00
Total Net Revenue: $289,090.98
Average Revenue per Order: $289.09
In [26]:
df['month'].value_counts()
Out[26]:
month
Jul    345
Jun    338
Aug    317
Name: count, dtype: int64

Lets see the Revenue Trends Over Time¶

In [27]:
# Group by month and sum net revenue
monthly_revenue = df.groupby("month")["net_revenue"].sum().reset_index()

# Sort months in order, we have data for three months only
month_order = ["Jun", "Jul", "Aug" ]
monthly_revenue["month"] = pd.Categorical(monthly_revenue["month"], categories=month_order, ordered=True)
monthly_revenue = monthly_revenue.sort_values("month")

# Plot
plt.figure(figsize=(10, 5))
sns.lineplot(x="month", y="net_revenue", data=monthly_revenue, marker="o", linewidth=2)
plt.title("Monthly Revenue Trend")
plt.ylabel("Net Revenue ($)")
plt.xlabel("Month")
plt.grid(True)
plt.show()
C:\Users\newbe\anaconda3\Lib\site-packages\seaborn\_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
C:\Users\newbe\anaconda3\Lib\site-packages\seaborn\_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
No description has been provided for this image
In [29]:
### lets analyze and visualize Payment Method Preferences
In [30]:
# Count of each payment method
payment_counts = df["payment"].value_counts()

# makine a Pie chart
plt.figure(figsize=(7, 7))
plt.pie(payment_counts, labels=payment_counts.index, autopct="%1.1f%%", colors=["#ff9999","#66b3ff","#99ff99"])
plt.title("Payment Method Distribution")
plt.show()
No description has been provided for this image

lets compare retail vs wholesale revenue¶

In [31]:
# Group by client_type
client_revenue = df.groupby("client_type")["net_revenue"].sum().reset_index()

# Bar plot
plt.figure(figsize=(8, 5))
sns.barplot(x="client_type", y="net_revenue", data=client_revenue, palette="coolwarm")
plt.title("Retail vs Wholesale Revenue")
plt.ylabel("Net Revenue ($)")
plt.xlabel("Client Type")
plt.show()
No description has been provided for this image

Analyzing the warehouse performance¶

In [33]:
# Group by warehouse
warehouse_revenue = df.groupby("warehouse")["net_revenue"].sum().reset_index()

# Using Bar plot
plt.figure(figsize=(10, 5))
sns.barplot(x="warehouse", y="net_revenue", data=warehouse_revenue, palette="viridis")
plt.title("Revenue by Warehouse")
plt.ylabel("Net Revenue ($)")
plt.xlabel("Warehouse")
plt.show()
No description has been provided for this image

We can analyze the data even deeper, we can find the most and least profitable product lines, compare order volume & revenue of retail vs. wholesale clients.¶

We can also perform advanced analysis to Forecast Future Sales, we can also check which products are frequently bought together, and build interactive dahboards using PowerBI. Thank you!¶