Project Description¶

Los Angeles, California, is a city that attracts people from all over the world. It offers many opportunities, but not all of them are positive. In this project, we will explore crime data to understand when and where crimes happen most often in Los Angeles. We will also look at the types of crimes that are commonly committed in the city.

The Data¶

The dataset for this project has been kindly provided by the City of Los Angeles. The original data is updated every two months and can be found here.

The dataset we are using is a modified version of the original data, which is publicly available from Los Angeles Open Data.

crimes.csv¶

Column Description
'DR_NO' Division of Records Number: This is the official file number. It includes a 2-digit year, area ID, and 5 digits.
'Date Rptd' The date the crime was reported, in MM/DD/YYYY format.
'DATE OCC' The date the crime actually happened, in MM/DD/YYYY format.
'TIME OCC' The time when the crime occurred, given in 24-hour military time.
'AREA NAME' The name of one of the 21 geographic areas or patrol divisions. Each area is named after a landmark or community it serves. For example, the 77th Street Division covers neighborhoods near South Broadway and 77th Street in South Los Angeles.
'Crm Cd Desc' A description of the type of crime committed.
'Vict Age' The victim’s age in years.
'Vict Sex' The victim’s sex: F for Female, M for Male, and X for Unknown.
'Vict Descent' The victim’s ethnic background. Possible values include:
- A - Other Asian
- B - Black
- C - Chinese
- D - Cambodian
- F - Filipino
- G - Guamanian
- H - Hispanic/Latin/Mexican
- I - American Indian/Alaskan Native
- J - Japanese
- K - Korean
- L - Laotian
- O - Other
- P - Pacific Islander
- S - Samoan
- U - Hawaiian
- V - Vietnamese
- W - White
- X - Unknown
- Z - Asian Indian
'Weapon Desc' Description of the weapon used in the crime, if any.
'Status Desc' The current status of the crime case.
'LOCATION' The street address where the crime happened.

By analyzing this data, we aim to find patterns and trends in crime across Los Angeles. This can help improve understanding of crime hotspots and times, and support efforts to make the city safer.

Import required libraries¶

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [5]:
# load the dataset
crimes = pd.read_csv("crimes.csv", parse_dates=["Date Rptd", "DATE OCC"], dtype={"TIME OCC": str}, usecols=lambda column: column != "index")
crimes.head()
Out[5]:
       DR_NO   Date Rptd    DATE OCC  TIME OCC    AREA NAME        Crm Cd Desc  Vict Age  Vict Sex  Vict Descent  Weapon Desc  Status Desc            LOCATION
0  220314085  2022-07-22  2020-05-12      1110    Southwest  THEFT OF IDENTITY        27         F             B          NaN  Invest Cont  2500 S SYCAMORE AV
1  222013040  2022-08-06  2020-06-04      1620      Olympic  THEFT OF IDENTITY        60         M             H          NaN  Invest Cont  3300 SAN MARINO ST
2  220614831  2022-08-18  2020-08-17      1200    Hollywood  THEFT OF IDENTITY        28         M             H          NaN  Invest Cont      1900 TRANSIENT
3  231207725  2023-02-27  2020-01-27       635  77th Street  THEFT OF IDENTITY        37         M             H          NaN  Invest Cont         6200 4TH AV
4  220213256  2022-07-14  2020-07-14       900      Rampart  THEFT OF IDENTITY        79         M             B          NaN  Invest Cont       1200 W 7TH ST

We will explore the crimes.csv dataset and use our findings to answer the following questions:¶

Which hour has the highest frequency of crimes?¶

In [6]:
# Extract the first two characters of "TIME OCC" (the hour) and convert them to integers
crimes["HOUR OCC"] = crimes["TIME OCC"].str[:2].astype(int)

# Keep only valid hour values (0-23)
crimes = crimes[crimes["HOUR OCC"].between(0, 23)]

# Plot the number of crimes per hour to see which hour is most frequent
plt.figure(figsize=(8, 6))
sns.countplot(data=crimes, x="HOUR OCC")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
[Figure: count plot of the number of crimes per hour of occurrence (x-axis: HOUR OCC)]
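One thing worth checking: the preview of the loaded data shows "TIME OCC" values such as 635 and 900, so the column appears to be stored without leading zeros. If that holds for the whole file, taking the first two characters would read 635 as hour 63 rather than 6, and the 0-23 filter would then drop or misassign early-morning records. The following is a minimal sketch on a small made-up Series (not the real data) of zero-padding to four characters before slicing; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical sample of "TIME OCC" strings, some without leading zeros
time_occ = pd.Series(["1110", "635", "900", "30", "5", "2359"])

# Slicing the raw strings misreads short values ("635" -> 63, "900" -> 90)
naive_hour = time_occ.str[:2].astype(int)

# Left-pad to four characters (HHMM) first, then take the hour
padded_hour = time_occ.str.zfill(4).str[:2].astype(int)

print(naive_hour.tolist())
print(padded_hour.tolist())
```

If the real column turns out to be zero-padded already, `str.zfill(4)` leaves the values unchanged, so the padded version is safe in either case.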
In [3]:
# Midday (hour 12) has the largest volume of crime, as read from the count plot
peak_crime_hour = 12
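The peak hour is entered by hand here after reading the plot. It could also be derived from the data, which avoids misreading a bar. This is a minimal sketch using a small hypothetical frame in place of `crimes`; only the "HOUR OCC" column name is taken from the notebook.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned frame; only "HOUR OCC" matters here
sample = pd.DataFrame({"HOUR OCC": [12, 12, 12, 18, 18, 3]})

# Count records per hour, then take the hour with the highest count
hour_counts = sample["HOUR OCC"].value_counts()
peak_hour = int(hour_counts.idxmax())

print(peak_hour)
```

Applied to the real frame, the same two lines would give a value to compare against the one read from the plot.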

Which area has the largest frequency of night crimes (crimes committed between 10pm and 3:59am)?¶

In [7]:
# Filter for night-time hours: 22 and 23 cover 10pm-11:59pm, 0 is midnight, and 3 covers 3:00am-3:59am, so 4 is excluded
night_crimes = crimes[crimes["HOUR OCC"].isin([22, 23, 0, 1, 2, 3])]
In [12]:
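# Count night crimes per area and sort from most to least; the "HOUR OCC" column now holds the count
night_crimes = night_crimes.groupby("AREA NAME", as_index=False)["HOUR OCC"].count().sort_values("HOUR OCC", ascending=False)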
night_crimes
Out[12]:
AREA NAME HOUR OCC
1 Central 921
6 Hollywood 718
0 77th Street 711
15 Southwest 661
11 Olympic 597
14 Southeast 585
9 Newton 577
12 Pacific 534
8 N Hollywood 513
17 Van Nuys 493
10 Northeast 485
20 Wilshire 476
13 Rampart 469
16 Topanga 458
19 West Valley 436
7 Mission 407
2 Devonshire 390
4 Harbor 389
3 Foothill 378
18 West LA 363
5 Hollenbeck 342
In [14]:
peak_night_crime_location = night_crimes.iloc[0]["AREA NAME"]
# Print the peak night crime location
print(f"The area with the largest volume of night crime is {peak_night_crime_location}")
The area with the largest volume of night crime is Central
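The filter, group, and sort steps above can also be expressed with `value_counts`, which returns the counts already sorted and labelled by area. This is a sketch on made-up rows, assuming only the "AREA NAME" and "HOUR OCC" column names from the notebook; it is an alternative formulation, not the method used above.

```python
import pandas as pd

# Hypothetical rows; the hours 14 and 4 fall outside the night window
sample = pd.DataFrame({
    "AREA NAME": ["Central", "Central", "Hollywood", "Central", "Olympic", "Hollywood"],
    "HOUR OCC": [22, 1, 23, 14, 3, 4],
})

# Night window: 10pm through 3:59am
night_hours = [22, 23, 0, 1, 2, 3]
night = sample[sample["HOUR OCC"].isin(night_hours)]

# Count night crimes per area; idxmax() returns the area with the most
area_counts = night["AREA NAME"].value_counts()
top_area = area_counts.idxmax()

print(area_counts)
print(top_area)
```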

Identify the number of crimes committed against victims of different age groups.¶

In [15]:
# Define the age-bracket edges and their labels
age_bins = [0, 17, 25, 34, 44, 54, 64, np.inf]
age_labels = ["0-17", "18-25", "26-34", "35-44", "45-54", "55-64", "65+"]

# Create a new column 'Age Bracket' based on 'Vict Age'
crimes["Age Bracket"] = pd.cut(crimes["Vict Age"], bins=age_bins, labels=age_labels, right=True)
In [17]:
# Count victims per age bracket (value_counts sorts from most to least frequent)
victim_ages = crimes["Age Bracket"].value_counts()
print(victim_ages)
Age Bracket
26-34    18987
35-44    17042
18-25    11554
45-54    11408
55-64     7996
65+       5966
0-17      1964
Name: count, dtype: int64
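It may help to see how `pd.cut` treats the bin edges used above. With `right=True` each bin is open on the left and closed on the right, so the first bin is (0, 17] and an age of exactly 0 falls outside every bin and becomes NaN. Whether that is desirable depends on how the dataset encodes unknown ages, which is not stated here. This is a small sketch using the same bins and labels on a few hypothetical ages:

```python
import numpy as np
import pandas as pd

# Same edges and labels as in the notebook
age_bins = [0, 17, 25, 34, 44, 54, 64, np.inf]
age_labels = ["0-17", "18-25", "26-34", "35-44", "45-54", "55-64", "65+"]

# Hypothetical ages chosen to sit on or next to the bin edges
ages = pd.Series([0, 17, 18, 25, 26, 64, 65])

# right=True: intervals are (0, 17], (17, 25], ..., (64, inf]
brackets = pd.cut(ages, bins=age_bins, labels=age_labels, right=True)

print(pd.DataFrame({"age": ages, "bracket": brackets}))
```

If age 0 should count toward the "0-17" bracket, passing `include_lowest=True` to `pd.cut` makes the first interval include its left edge.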

We can analyze the dataset further to answer many more questions, for example: which types of crime were committed most often, when and where they occurred, and which victim age groups and sexes were most affected. Thank you.¶