Project Description
Studying abroad is an exciting experience, but it can also bring challenges. To understand these challenges better, a Japanese international university conducted a study on the mental health of its students.
In this project, we will use data manipulation skills to explore this study’s data. Our goal is to find out which factors have the strongest impact on students’ mental health when they study in a foreign country.
The university surveyed its students in 2018 and published the study in 2019. Ethical and regulatory boards approved the study to ensure proper research standards.
The results showed that international students face a higher risk of mental health issues compared to domestic students. The study also found that social connectedness (how much a student feels part of a social group) and acculturative stress (stress caused by adjusting to a new culture) are important predictors of depression.
Data Description
Field Name | Description |
---|---|
inter_dom | Type of student: international or domestic |
japanese_cate | Level of Japanese language proficiency |
english_cate | Level of English language proficiency |
academic | Current academic level: undergraduate or graduate |
age | Age of the student |
stay | Length of stay in Japan (in years) |
todep | Total depression score based on PHQ-9 test |
tosc | Total social connectedness score from SCS test |
toas | Total acculturative stress score from ASISS test |
This project will help us understand how living and studying abroad affects mental health, and which factors we should pay attention to for better student support.
# import the required library
import pandas as pd
# load the dataset
df = pd.read_csv('students.csv', usecols=lambda column: column != "index")
df.head()
inter_dom | region | gender | academic | age | age_cate | stay | stay_cate | japanese | japanese_cate | ... | friends_bi | parents_bi | relative_bi | professional_bi | phone_bi | doctor_bi | religion_bi | alone_bi | others_bi | internet_bi | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Inter | SEA | Male | Grad | 24.0 | 4.0 | 5.0 | Long | 3.0 | Average | ... | Yes | Yes | No | No | No | No | No | No | No | No |
1 | Inter | SEA | Male | Grad | 28.0 | 5.0 | 1.0 | Short | 4.0 | High | ... | Yes | Yes | No | No | No | No | No | No | No | No |
2 | Inter | SEA | Male | Grad | 25.0 | 4.0 | 6.0 | Long | 4.0 | High | ... | No | No | No | No | No | No | No | No | No | No |
3 | Inter | EA | Female | Grad | 29.0 | 5.0 | 1.0 | Short | 2.0 | Low | ... | Yes | Yes | Yes | Yes | No | No | No | No | No | No |
4 | Inter | EA | Female | Grad | 28.0 | 5.0 | 1.0 | Short | 1.0 | Low | ... | Yes | Yes | No | Yes | No | Yes | Yes | No | No | No |
5 rows × 50 columns
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 286 entries, 0 to 285
Data columns (total 50 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   inter_dom        268 non-null    object
 1   region           268 non-null    object
 2   gender           268 non-null    object
 3   academic         268 non-null    object
 4   age              268 non-null    float64
 5   age_cate         268 non-null    float64
 6   stay             268 non-null    float64
 7   stay_cate        268 non-null    object
 8   japanese         268 non-null    float64
 9   japanese_cate    268 non-null    object
 10  english          268 non-null    float64
 11  english_cate     268 non-null    object
 12  intimate         260 non-null    object
 13  religion         268 non-null    object
 14  suicide          268 non-null    object
 15  dep              270 non-null    object
 16  deptype          271 non-null    object
 17  todep            268 non-null    float64
 18  depsev           273 non-null    object
 19  tosc             268 non-null    float64
 20  apd              268 non-null    float64
 21  ahome            268 non-null    float64
 22  aph              268 non-null    float64
 23  afear            268 non-null    float64
 24  acs              268 non-null    float64
 25  aguilt           268 non-null    float64
 26  amiscell         268 non-null    float64
 27  toas             268 non-null    float64
 28  partner          268 non-null    float64
 29  friends          268 non-null    float64
 30  parents          268 non-null    float64
 31  relative         268 non-null    float64
 32  profess          268 non-null    float64
 33  phone            268 non-null    float64
 34  doctor           268 non-null    float64
 35  reli             268 non-null    float64
 36  alone            268 non-null    float64
 37  others           268 non-null    float64
 38  internet         242 non-null    float64
 39  partner_bi       283 non-null    object
 40  friends_bi       283 non-null    object
 41  parents_bi       272 non-null    object
 42  relative_bi      272 non-null    object
 43  professional_bi  272 non-null    object
 44  phone_bi         272 non-null    object
 45  doctor_bi        272 non-null    object
 46  religion_bi      272 non-null    object
 47  alone_bi         272 non-null    object
 48  others_bi        272 non-null    object
 49  internet_bi      272 non-null    object
dtypes: float64(26), object(24)
memory usage: 111.8+ KB
This dataset can reveal many details, but we will not dig much deeper; we will answer only a few questions.
Let's see how the length of stay impacts the average mental health diagnostic scores of the international students present in the study.
# select the fields most relevant to this analysis
stay_analysis = df[['stay', 'todep', 'tosc', 'toas']]
stay_analysis
stay | todep | tosc | toas | |
---|---|---|---|---|
0 | 5.0 | 0.0 | 34.0 | 91.0 |
1 | 1.0 | 2.0 | 48.0 | 39.0 |
2 | 6.0 | 2.0 | 41.0 | 51.0 |
3 | 1.0 | 3.0 | 37.0 | 75.0 |
4 | 1.0 | 3.0 | 37.0 | 82.0 |
... | ... | ... | ... | ... |
281 | NaN | NaN | NaN | NaN |
282 | NaN | NaN | NaN | NaN |
283 | NaN | NaN | NaN | NaN |
284 | NaN | NaN | NaN | NaN |
285 | NaN | NaN | NaN | NaN |
286 rows × 4 columns
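The selection above keeps every respondent, domestic and international alike. If the analysis is meant to cover international students only, the rows can be restricted before the columns are selected. The following is a minimal sketch on hypothetical toy data: it assumes `inter_dom` marks international students with the label `'Inter'`, as seen in the `df.head()` output, while the `'Dom'` label and all values in `toy` are illustrative stand-ins rather than values confirmed by the dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for students.csv
toy = pd.DataFrame({
    "inter_dom": ["Inter", "Inter", "Dom", None],
    "stay": [5.0, 1.0, 2.0, None],
    "todep": [0.0, 2.0, 7.0, None],
    "tosc": [34.0, 48.0, 41.0, None],
    "toas": [91.0, 39.0, 61.0, None],
})

# Keep only rows labelled as international, then select the fields of interest
inter_only = toy.loc[toy["inter_dom"] == "Inter", ["stay", "todep", "tosc", "toas"]]
print(inter_only)
```

Rows with a missing `inter_dom` value compare as not equal to `'Inter'`, so this filter also drops them.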
# checking the null values in filtered data
stay_analysis.isna().sum()
stay     18
todep    18
tosc     18
toas     18
dtype: int64
# the data was not preprocessed, so drop the observations where 'stay' is NaN, since the analysis is grouped by 'stay'
stay_analysis = stay_analysis.dropna(subset= ['stay'])
stay_analysis = stay_analysis.astype('int')
stay_analysis
stay | todep | tosc | toas | |
---|---|---|---|---|
0 | 5 | 0 | 34 | 91 |
1 | 1 | 2 | 48 | 39 |
2 | 6 | 2 | 41 | 51 |
3 | 1 | 3 | 37 | 75 |
4 | 1 | 3 | 37 | 82 |
... | ... | ... | ... | ... |
268 | 4 | 8 | 27 | 74 |
269 | 3 | 2 | 48 | 50 |
270 | 1 | 9 | 47 | 43 |
271 | 1 | 1 | 43 | 44 |
272 | 2 | 7 | 41 | 61 |
268 rows × 4 columns
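Note that `astype('int')` raises an error if any NaN remains, so the conversion above succeeds only because no missing values are left in the other three columns once the rows with a missing `stay` are dropped. A more defensive variant might look like the sketch below, which uses hypothetical toy data with one extra gap in `todep` to show the difference between the two options.

```python
import pandas as pd

# Hypothetical toy data; the last row has 'stay' but no 'todep'
toy = pd.DataFrame({
    "stay": [5.0, 1.0, None, 2.0],
    "todep": [0.0, 2.0, None, None],
})

# Option 1: drop rows with a gap in any column, then cast to plain int
complete = toy.dropna(how="any").astype("int")

# Option 2: keep partially filled rows and cast to pandas' nullable integer dtype
nullable = toy.dropna(subset=["stay"]).astype("Int64")

print(complete)
print(nullable)
```

Option 1 loses the partially filled row; option 2 keeps it and stores the gap as `<NA>`, which `mean` skips by default.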
stay_analysis = stay_analysis.groupby('stay').agg(
    no_of_students=('stay', 'count'),
    mean_depression=('todep', 'mean'),
    mean_social_connectedness=('tosc', 'mean'),
    mean_acculturative_stress=('toas', 'mean')
).round(2).sort_index(ascending=False)
stay_analysis
no_of_students | mean_depression | mean_social_connectedness | mean_acculturative_stress | |
---|---|---|---|---|
stay | ||||
10 | 1 | 13.00 | 32.00 | 50.00 |
8 | 1 | 10.00 | 44.00 | 65.00 |
7 | 1 | 4.00 | 48.00 | 45.00 |
6 | 3 | 6.00 | 38.00 | 58.67 |
5 | 3 | 7.67 | 34.00 | 89.00 |
4 | 23 | 7.96 | 35.00 | 78.74 |
3 | 69 | 8.87 | 37.78 | 71.35 |
2 | 52 | 8.58 | 37.08 | 74.87 |
1 | 115 | 7.70 | 37.94 | 71.03 |
# visualize the findings to understand better
import matplotlib.pyplot as plt
stay_analysis[['mean_depression', 'mean_social_connectedness', 'mean_acculturative_stress']].plot(
    kind='bar', figsize=(12,6), width=0.8
)
# Labels & Title
plt.xlabel('Length of stay in Japan (years)')
plt.ylabel('Mean score')
plt.title('Mean Depression, Social Connectedness & Acculturative Stress by Length of Stay')
plt.legend(title='Metrics')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', alpha=0.7)
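In the grouped table, stays of five years or more are each represented by only one to three students, so those bars reflect individual scores rather than stable averages. One possible way to make the comparison steadier is to pool the long stays into a single bin before aggregating. The sketch below assumes hypothetical toy data, and the bin edges and labels (`'1'`, `'2'`, `'3'`, `'4+'`) are illustrative choices, not part of the original study.

```python
import pandas as pd

# Hypothetical toy data standing in for the cleaned stay/todep columns
toy = pd.DataFrame({
    "stay": [1, 1, 2, 3, 4, 5, 8, 10],
    "todep": [7, 9, 8, 9, 8, 7, 10, 13],
})

# Intervals are right-closed: (0, 1], (1, 2], (2, 3], (3, inf)
bins = [0, 1, 2, 3, float("inf")]
labels = ["1", "2", "3", "4+"]
toy["stay_group"] = pd.cut(toy["stay"], bins=bins, labels=labels)

# Aggregate per pooled group instead of per individual year
summary = toy.groupby("stay_group", observed=True).agg(
    no_of_students=("todep", "count"),
    mean_depression=("todep", "mean"),
).round(2)
print(summary)
```

Keeping `no_of_students` next to each mean makes it easy to see which groups are large enough to compare.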