Right now I have a dataset of 1206 participants who have each endorsed a certain number of traumatic experiences and a number of symptoms associated with the trauma.
This is part of my dataframe (full dataframe is 1206 rows long):
SubjectID | PTSD_Symptom_Sum | PTSD_Trauma_Sum |
---|---|---|
1223 | 3 | 5 |
1224 | 4 | 2 |
1225 | 2 | 6 |
1226 | 0 | 3 |
I have two issues that I am trying to figure out:
I used this code to create the scatterplot:
plt.scatter(PTSD['PTSD_Symptom_SUM'], PTSD['PTSD_Trauma_SUM'])
plt.title('Trauma Sum vs. Symptoms')
plt.xlabel('Symptoms')
plt.ylabel('Trauma Sum')
count_sum= PTSD['PTSD_SUM'].value_counts()
count_symptom_sum= PTSD['PTSD_symptom_SUM'].value_counts()
print(count_sum)
print(count_symptom_sum)
Which produces this output:
0 379
1 371
2 248
3 130
4 47
5 17
6 11
8 2
7 1
Name: PTSD_SUM, dtype: int64
0 437
1 418
2 247
3 74
4 23
5 4
6 3
Name: PTSD_symptom_SUM, dtype: int64
Is it possible to alter the code to count the number of people endorsing each pair of items (symptom number and trauma number)? If not, are there any functions that would allow me to do this?
You could create a new dataset with the counts of each pair 'PTSD_SUM', 'PTSD_Symptom_SUM'
with:
counts = PTSD.groupby(by=['PTSD_symptom_SUM', 'PTSD_SUM']).size().to_frame('size').reset_index()
and then use Seaborn like this:
import seaborn as sns
sns.scatterplot(data=counts, x="PTSD_symptom_SUM", y="PTSD_SUM", hue="size", size="size")
To obtain something like this: