Project Description¶

Los Angeles, California, is a city that attracts people from all over the world. It offers many opportunities, but not all of them are positive. In this project, we will explore crime data to understand when and where crimes happen most often in Los Angeles. We will also look at the types of crimes that are commonly committed in the city.

The Data¶

The dataset for this project has been kindly provided by the City of Los Angeles. The original data is updated every two months and can be found here.

The dataset we are using is a modified version of the original data, which is publicly available from Los Angeles Open Data.

`crimes.csv`¶

Column	Description
`'DR_NO'`	Division of Records Number: This is the official file number. It includes a 2-digit year, area ID, and 5 digits.
`'Date Rptd'`	The date the crime was reported, in MM/DD/YYYY format.
`'DATE OCC'`	The date the crime actually happened, in MM/DD/YYYY format.
`'TIME OCC'`	The time when the crime occurred, given in 24-hour military time.
`'AREA NAME'`	The name of one of the 21 geographic areas or patrol divisions. Each area is named after a landmark or community it serves. For example, the 77th Street Division covers neighborhoods near South Broadway and 77th Street in South Los Angeles.
`'Crm Cd Desc'`	A description of the type of crime committed.
`'Vict Age'`	The victim’s age in years.
`'Vict Sex'`	The victim’s sex: `F` for Female, `M` for Male, and `X` for Unknown.
`'Vict Descent'`	The victim’s ethnic background. Possible values include: - `A` - Other Asian - `B` - Black - `C` - Chinese - `D` - Cambodian - `F` - Filipino - `G` - Guamanian - `H` - Hispanic/Latin/Mexican - `I` - American Indian/Alaskan Native - `J` - Japanese - `K` - Korean - `L` - Laotian - `O` - Other - `P` - Pacific Islander - `S` - Samoan - `U` - Hawaiian - `V` - Vietnamese - `W` - White - `X` - Unknown - `Z` - Asian Indian
`'Weapon Desc'`	Description of the weapon used in the crime, if any.
`'Status Desc'`	The current status of the crime case.
`'LOCATION'`	The street address where the crime happened.

By analyzing this data, we aim to find patterns and trends in crime across Los Angeles. This can help improve understanding of crime hotspots and times, and support efforts to make the city safer.

Import required libraries¶

import pandas as pd import numpy as np import matplotlib.pyplot as plt import seaborn as sns

In [5]:

# load the dataset
crimes = pd.read_csv("crimes.csv", parse_dates=["Date Rptd", "DATE OCC"], dtype={"TIME OCC": str}, usecols=lambda column: column != "index")
crimes.head()

Out[5]:

	DR_NO	Date Rptd	DATE OCC	TIME OCC	AREA NAME	Crm Cd Desc	Vict Age	Vict Sex	Vict Descent	Weapon Desc	Status Desc	LOCATION
0	220314085	2022-07-22	2020-05-12	1110	Southwest	THEFT OF IDENTITY	27	F	B	NaN	Invest Cont	2500 S SYCAMORE AV
1	222013040	2022-08-06	2020-06-04	1620	Olympic	THEFT OF IDENTITY	60	M	H	NaN	Invest Cont	3300 SAN MARINO ST
2	220614831	2022-08-18	2020-08-17	1200	Hollywood	THEFT OF IDENTITY	28	M	H	NaN	Invest Cont	1900 TRANSIENT
3	231207725	2023-02-27	2020-01-27	635	77th Street	THEFT OF IDENTITY	37	M	H	NaN	Invest Cont	6200 4TH AV
4	220213256	2022-07-14	2020-07-14	900	Rampart	THEFT OF IDENTITY	79	M	B	NaN	Invest Cont	1200 W 7TH ST

We will explore the crimes.csv dataset and use our findings to answer the following questions:¶

Which hour has the highest frequency of crimes?¶

In [6]:

# Extract the first two digits from "TIME OCC", representing the hour and convert to integer data type
crimes["HOUR OCC"] = crimes["TIME OCC"].str[:2].astype(int)

# Keeping only valid hour values (0-23)
crimes = crimes[crimes["HOUR OCC"].between(0, 23)]

# using countplot to find the largest frequency
plt.figure(figsize=(8, 6))
sns.countplot(data=crimes, x="HOUR OCC")
plt.xticks(rotation=45)  
plt.tight_layout()
plt.show()

No description has been provided for this image

In [3]:

# Midday has the largest volume of crime
peak_crime_hour = 12

Which area has the largest frequency of night crimes (crimes committed between 10pm and 3:59am)?¶

In [7]:

# Filter for the night-time hours, 0 = midnight; 3 = crimes between 3am and 3:59am, i.e., don't include 4
night_crimes = crimes[crimes["HOUR OCC"].isin([22,23,0,1,2,3])]

In [12]:

night_crimes = night_crimes.groupby("AREA NAME", as_index=False)["HOUR OCC"].count().sort_values("HOUR OCC", ascending=False)
night_crimes

Out[12]:

	AREA NAME	HOUR OCC
1	Central	921
6	Hollywood	718
0	77th Street	711
15	Southwest	661
11	Olympic	597
14	Southeast	585
9	Newton	577
12	Pacific	534
8	N Hollywood	513
17	Van Nuys	493
10	Northeast	485
20	Wilshire	476
13	Rampart	469
16	Topanga	458
19	West Valley	436
7	Mission	407
2	Devonshire	390
4	Harbor	389
3	Foothill	378
18	West LA	363
5	Hollenbeck	342

In [14]:

peak_night_crime_location = night_crimes.iloc[0]["AREA NAME"]
# Print the peak night crime location
print(f"The area with the largest volume of night crime is {peak_night_crime_location}")

The area with the largest volume of night crime is Central

Identify the number of crimes committed against victims of different age groups.¶

In [15]:

# Identify the number of crimes committed against victims by age groups 
age_bins = [0, 17, 25, 34, 44, 54, 64, np.inf]
age_labels = ["0-17", "18-25", "26-34", "35-44", "45-54", "55-64", "65+"]

# Create a new column 'Age Bracket' based on 'Vict Age'
crimes["Age Bracket"] = pd.cut(crimes["Vict Age"], bins=age_bins, labels=age_labels, right=True)

In [17]:

# Find the category with the largest frequency
victim_ages = crimes["Age Bracket"].value_counts()
print(victim_ages)

Age Bracket
26-34    18987
35-44    17042
18-25    11554
45-54    11408
55-64     7996
65+       5966
0-17      1964
Name: count, dtype: int64

We can analyze the dataset further and answer many more questions. For example, what type of crimes were mostly committed, when, where and which age group and gender was victim! Thank you.¶