Project Description
Los Angeles, California, is a city that attracts people from all over the world. It offers many opportunities, but not all of them are positive. In this project, we will explore crime data to understand when and where crimes happen most often in Los Angeles. We will also look at the types of crimes that are commonly committed in the city.
The Data
The dataset for this project has been kindly provided by the City of Los Angeles. The original data is updated every two months and is publicly available from Los Angeles Open Data.
The dataset we are using is a modified version of that original data.
crimes.csv
Column | Description |
---|---|
'DR_NO' | Division of Records Number: This is the official file number. It includes a 2-digit year, area ID, and 5 digits. |
'Date Rptd' | The date the crime was reported, in MM/DD/YYYY format. |
'DATE OCC' | The date the crime actually happened, in MM/DD/YYYY format. |
'TIME OCC' | The time when the crime occurred, given in 24-hour military time. |
'AREA NAME' | The name of one of the 21 geographic areas or patrol divisions. Each area is named after a landmark or community it serves. For example, the 77th Street Division covers neighborhoods near South Broadway and 77th Street in South Los Angeles. |
'Crm Cd Desc' | A description of the type of crime committed. |
'Vict Age' | The victim’s age in years. |
'Vict Sex' | The victim’s sex: F for Female, M for Male, and X for Unknown. |
'Vict Descent' | The victim’s ethnic background. Possible values: A - Other Asian, B - Black, C - Chinese, D - Cambodian, F - Filipino, G - Guamanian, H - Hispanic/Latin/Mexican, I - American Indian/Alaskan Native, J - Japanese, K - Korean, L - Laotian, O - Other, P - Pacific Islander, S - Samoan, U - Hawaiian, V - Vietnamese, W - White, X - Unknown, Z - Asian Indian. |
'Weapon Desc' | Description of the weapon used in the crime, if any. |
'Status Desc' | The current status of the crime case. |
'LOCATION' | The street address where the crime happened. |
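The `DR_NO` layout described in the table can be unpacked with plain string slicing. The sketch below assumes the number is always nine digits wide (2-digit year, 2-digit area ID, 5-digit sequence), which the description implies but does not state outright. `split_dr_no` is a hypothetical helper introduced here for illustration and is not part of the dataset tooling.

```python
def split_dr_no(dr_no):
    """Split a DR_NO into (year, area_id, sequence).

    Assumes a 9-digit layout: YY + AA + NNNNN. Shorter values are
    left-padded with zeros, which is an assumption of this sketch.
    """
    digits = str(dr_no).zfill(9)
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"unexpected DR_NO format: {dr_no!r}")
    return int(digits[:2]), int(digits[2:4]), int(digits[4:])


year, area_id, sequence = split_dr_no(220314085)
print(year, area_id, sequence)
```

Under that assumption, 220314085 splits into year 22, area ID 3, and sequence 14085.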
By analyzing this data, we aim to find patterns and trends in crime across Los Angeles. This can help improve understanding of crime hotspots and times, and support efforts to make the city safer.
Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# load the dataset
crimes = pd.read_csv("crimes.csv", parse_dates=["Date Rptd", "DATE OCC"], dtype={"TIME OCC": str}, usecols=lambda column: column != "index")
crimes.head()
| | DR_NO | Date Rptd | DATE OCC | TIME OCC | AREA NAME | Crm Cd Desc | Vict Age | Vict Sex | Vict Descent | Weapon Desc | Status Desc | LOCATION |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 220314085 | 2022-07-22 | 2020-05-12 | 1110 | Southwest | THEFT OF IDENTITY | 27 | F | B | NaN | Invest Cont | 2500 S SYCAMORE AV |
| 1 | 222013040 | 2022-08-06 | 2020-06-04 | 1620 | Olympic | THEFT OF IDENTITY | 60 | M | H | NaN | Invest Cont | 3300 SAN MARINO ST |
| 2 | 220614831 | 2022-08-18 | 2020-08-17 | 1200 | Hollywood | THEFT OF IDENTITY | 28 | M | H | NaN | Invest Cont | 1900 TRANSIENT |
| 3 | 231207725 | 2023-02-27 | 2020-01-27 | 635 | 77th Street | THEFT OF IDENTITY | 37 | M | H | NaN | Invest Cont | 6200 4TH AV |
| 4 | 220213256 | 2022-07-14 | 2020-07-14 | 900 | Rampart | THEFT OF IDENTITY | 79 | M | B | NaN | Invest Cont | 1200 W 7TH ST |
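The three `read_csv` arguments used above each do a separate job, and a toy file makes that easier to see. The sketch below is an assumption-laden illustration: the two rows are made up, and the dates are written in ISO format rather than the MM/DD/YYYY format described for `crimes.csv`, purely to keep parsing unambiguous.

```python
import io

import pandas as pd

# Two made-up rows in roughly the same shape as crimes.csv (illustrative values only).
sample_csv = io.StringIO(
    "index,DR_NO,Date Rptd,DATE OCC,TIME OCC\n"
    "0,1,2022-07-22,2020-05-12,1110\n"
    "1,2,2023-02-27,2020-01-27,0635\n"
)

sample = pd.read_csv(
    sample_csv,
    parse_dates=["Date Rptd", "DATE OCC"],     # parse these columns as datetimes
    dtype={"TIME OCC": str},                   # keep the text as stored in the file
    usecols=lambda column: column != "index",  # skip the saved index column
)

print(sample.dtypes)
print(sample["TIME OCC"].tolist())
```

Reading `TIME OCC` as a string preserves whatever the file contains, including any leading zeros; it does not add padding that the file lacks.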
# Extract the hour from "TIME OCC" and convert it to an integer.
# Zero-pad to four digits first so that a value such as "635" yields hour 6, not 63.
crimes["HOUR OCC"] = crimes["TIME OCC"].str.zfill(4).str[:2].astype(int)
# Keeping only valid hour values (0-23)
crimes = crimes[crimes["HOUR OCC"].between(0, 23)]
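Taking the first two characters as the hour is only reliable when every `TIME OCC` string is four digits wide. The `crimes.head()` output shows values such as 635 and 900, so a toy comparison of slicing with and without zero-padding may be useful. The values below are made up for illustration.

```python
import pandas as pd

# Illustrative values only; "635" and "900" mirror the unpadded entries in crimes.head().
times = pd.Series(["1110", "635", "900", "5"])

# Slice the first two characters as-is.
naive_hour = times.str[:2].astype(int)

# Pad to HHMM first, then slice.
padded_hour = times.str.zfill(4).str[:2].astype(int)

print(naive_hour.tolist())   # [11, 63, 90, 5]
print(padded_hour.tolist())  # [11, 6, 9, 0]
```

Without padding, early-morning times turn into out-of-range hours (which a 0-23 filter silently drops) or into plausible but wrong ones.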
# Use a countplot to see which hour has the highest crime frequency
plt.figure(figsize=(8, 6))
sns.countplot(data=crimes, x="HOUR OCC")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
# Midday has the largest volume of crime
peak_crime_hour = 12
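Reading the peak off the chart and hard-coding it works, but the same value can also be derived from the counts. A minimal sketch on made-up hours, assuming it is acceptable for a tie to be resolved by whichever hour `idxmax` returns first:

```python
import pandas as pd

# Toy hours, not the real dataset.
hours = pd.Series([12, 12, 18, 12, 20, 18, 0])

# Count occurrences of each hour, then take the hour with the largest count.
hour_counts = hours.value_counts()
peak_hour = int(hour_counts.idxmax())

print(peak_hour)
```

Applied to the notebook, the equivalent expression would be `crimes["HOUR OCC"].value_counts().idxmax()`.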
Which area has the largest frequency of night crimes (crimes committed between 10pm and 3:59am)?
# Filter for the night-time hours, 0 = midnight; 3 = crimes between 3am and 3:59am, i.e., don't include 4
night_crimes = crimes[crimes["HOUR OCC"].isin([22,23,0,1,2,3])]
night_crimes = night_crimes.groupby("AREA NAME", as_index=False)["HOUR OCC"].count().sort_values("HOUR OCC", ascending=False)
night_crimes
| | AREA NAME | HOUR OCC |
|---|---|---|
| 1 | Central | 921 |
| 6 | Hollywood | 718 |
| 0 | 77th Street | 711 |
| 15 | Southwest | 661 |
| 11 | Olympic | 597 |
| 14 | Southeast | 585 |
| 9 | Newton | 577 |
| 12 | Pacific | 534 |
| 8 | N Hollywood | 513 |
| 17 | Van Nuys | 493 |
| 10 | Northeast | 485 |
| 20 | Wilshire | 476 |
| 13 | Rampart | 469 |
| 16 | Topanga | 458 |
| 19 | West Valley | 436 |
| 7 | Mission | 407 |
| 2 | Devonshire | 390 |
| 4 | Harbor | 389 |
| 3 | Foothill | 378 |
| 18 | West LA | 363 |
| 5 | Hollenbeck | 342 |
peak_night_crime_location = night_crimes.iloc[0]["AREA NAME"]
# Print the peak night crime location
print(f"The area with the largest volume of night crime is {peak_night_crime_location}")
The area with the largest volume of night crime is Central
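The night window wraps past midnight, so it cannot be expressed as a single `between` call. One alternative to listing the hours by hand is a small mask helper. The sketch below uses a hypothetical function, `is_night`, and a made-up five-row frame with placeholder area names.

```python
import pandas as pd


def is_night(hours, start=22, end=4):
    """Boolean mask for hours in [start, end), wrapping past midnight when start > end."""
    if start <= end:
        return (hours >= start) & (hours < end)
    return (hours >= start) | (hours < end)


# Toy data, not the real dataset.
toy = pd.DataFrame({
    "AREA NAME": ["A", "A", "B", "B", "B"],
    "HOUR OCC": [22, 4, 0, 3, 23],
})

# Keep only night-time rows, then count them per area.
night_counts = toy[is_night(toy["HOUR OCC"])]["AREA NAME"].value_counts()

print(night_counts)
print(night_counts.idxmax())
```

Hour 4 is excluded because the end of the window is exclusive, matching the 10pm to 3:59am definition in the question.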
Identify the number of crimes committed against victims of different age groups.
# Identify the number of crimes committed against victims by age groups
age_bins = [0, 17, 25, 34, 44, 54, 64, np.inf]
age_labels = ["0-17", "18-25", "26-34", "35-44", "45-54", "55-64", "65+"]
# Create a new column 'Age Bracket' based on 'Vict Age'
crimes["Age Bracket"] = pd.cut(crimes["Vict Age"], bins=age_bins, labels=age_labels, right=True)
# Count victims per age bracket (value_counts sorts in descending order)
victim_ages = crimes["Age Bracket"].value_counts()
print(victim_ages)
Age Bracket
26-34    18987
35-44    17042
18-25    11554
45-54    11408
55-64     7996
65+       5966
0-17      1964
Name: count, dtype: int64
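Because `pd.cut` builds right-closed intervals by default, the first bin here is (0, 17], so an age of exactly 0 falls outside every bracket and is left out of `value_counts`. Whether that is desirable depends on what a 0 means in `Vict Age`, which the column description does not say. The sketch below uses made-up ages placed on the bin edges to show the behaviour, along with `include_lowest=True` as one possible way to keep zeros.

```python
import numpy as np
import pandas as pd

age_bins = [0, 17, 25, 34, 44, 54, 64, np.inf]
age_labels = ["0-17", "18-25", "26-34", "35-44", "45-54", "55-64", "65+"]

# Made-up ages chosen to sit on or near the bin edges.
ages = pd.Series([0, 1, 17, 18, 25, 26, 64, 65])

# Default right=True: each bin includes its right edge but not its left one.
brackets = pd.cut(ages, bins=age_bins, labels=age_labels)
print(brackets.tolist())

# include_lowest=True makes the first interval include its left edge, so 0 lands in "0-17".
brackets_with_zero = pd.cut(ages, bins=age_bins, labels=age_labels, include_lowest=True)
print(brackets_with_zero.tolist())
```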