Tags: python, list, count, scatter, items

Python: How to find the number of items at each point on a scatterplot and produce a list?


Right now I have a dataset of 1206 participants who have each endorsed a certain number of traumatic experiences and a number of symptoms associated with the trauma.

This is part of my dataframe (full dataframe is 1206 rows long):

SubjectID PTSD_Symptom_Sum PTSD_Trauma_Sum
1223 3 5
1224 4 2
1225 2 6
1226 0 3

I have two issues that I am trying to figure out:

  1. I was able to create a scatter plot, but I can't tell from the plot how many participants sit on each data point. Is there an easy way to see the number of subjects at each point?

I used this code to create the scatterplot:

import matplotlib.pyplot as plt

# Symptoms on the x-axis, trauma count on the y-axis
plt.scatter(PTSD['PTSD_Symptom_SUM'], PTSD['PTSD_Trauma_SUM'])
plt.title('Trauma Sum vs. Symptoms')
plt.xlabel('Symptoms')
plt.ylabel('Trauma Sum')
plt.show()

[Figure: Scatterplot of Trauma Sum by number of symptoms]

  2. I haven't been able to produce a list of the number of people endorsing each pair of values (symptom number and trauma number). I can run this code to count the number of people in each category separately:
count_sum= PTSD['PTSD_SUM'].value_counts()
count_symptom_sum= PTSD['PTSD_symptom_SUM'].value_counts()

print(count_sum)
print(count_symptom_sum)

Which produces this output:

0    379
1    371
2    248
3    130
4     47
5     17
6     11
8      2
7      1
Name: PTSD_SUM, dtype: int64
0    437
1    418
2    247
3     74
4     23
5      4
6      3
Name: PTSD_symptom_SUM, dtype: int64

Is it possible to alter the code to count the number of people endorsing each pair of items (symptom number and trauma number)? If not, are there any functions that would allow me to do this?


Solution

  • You could create a new dataframe with the count of each ('PTSD_symptom_SUM', 'PTSD_SUM') pair with:

    counts = PTSD.groupby(by=['PTSD_symptom_SUM', 'PTSD_SUM']).size().to_frame('size').reset_index()
    

    and then use Seaborn like this:

    import seaborn as sns
    sns.scatterplot(data=counts, x="PTSD_symptom_SUM", y="PTSD_SUM", hue="size", size="size")
    

    This gives a scatterplot in which the color and size of each marker show how many participants fall on that point.
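
For anyone who wants to try the approach end to end, here is a minimal sketch under stated assumptions. The six-row dataframe is made-up toy data standing in for the real 1206-row one, the count column name n is arbitrary, and the pd.crosstab table and the matplotlib annotations are optional additions that the answer above does not use. The column names follow the ones in the question's value_counts output.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data standing in for the real dataframe (values are invented for illustration).
PTSD = pd.DataFrame({
    "SubjectID": [1, 2, 3, 4, 5, 6],
    "PTSD_symptom_SUM": [3, 3, 0, 2, 3, 0],
    "PTSD_SUM": [5, 5, 3, 6, 5, 2],
})

# One row per (symptom count, trauma count) pair, with the number of subjects in "n".
counts = (
    PTSD.groupby(["PTSD_symptom_SUM", "PTSD_SUM"])
    .size()
    .reset_index(name="n")
)
print(counts)

# The same counts as a symptoms-by-trauma table; pairs that never occur show as 0.
table = pd.crosstab(PTSD["PTSD_symptom_SUM"], PTSD["PTSD_SUM"])
print(table)

# Scatter plot with marker area scaled by the count and the count written beside each point.
fig, ax = plt.subplots()
ax.scatter(counts["PTSD_symptom_SUM"], counts["PTSD_SUM"], s=counts["n"] * 40)
for row in counts.itertuples(index=False):
    ax.annotate(str(row.n), (row.PTSD_symptom_SUM, row.PTSD_SUM),
                textcoords="offset points", xytext=(6, 6))
ax.set_xlabel("Symptoms")
ax.set_ylabel("Trauma Sum")
ax.set_title("Trauma Sum vs. Symptoms")
# Call plt.show() or fig.savefig(...) to view the figure.
```

The counts dataframe is the "list" asked for in the second part of the question, and the crosstab is a convenient way to read the same numbers as a grid. The scaling factor of 40 for the marker area is only a starting point and will likely need tuning for counts in the hundreds.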