I've got a large amount of astronomical data that I need to plot in a scatterplot. I've binned the data according to distance, and I want to plot 4 scatterplots, side by side.
For the purposes of asking this question, I've constructed an MWE (with different data, obviously) based on what I've got so far:
import pandas as pd
import matplotlib.pyplot as plt
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky', 'Jim', 'Lee', 'Rob', 'Dave',
                 'Jane', 'Bronwyn', 'Karen', 'Liz', 'Claire', 'Chris', 'Jan', 'Ruby'],
        'Age': [28, 34, 29, 42, 14, 16, 75, 68,
                27, 3, 2, 19, 17, 32, 71, 45],
        'Weight': [60, 75, 73, 82, 54, 55, 98, 82, 45, 9, 8, 47, 54, 62, 67, 67]}
stages = ['Toddler', 'Teen', 'Young Adult', 'Adult']
ages = [0,4,20,40,100]
df = pd.DataFrame(data)
df['binned'] = pd.cut(df['Age'], bins=ages, labels=stages)
fig=plt.figure()
fig.subplots_adjust(hspace=0)
fig.subplots_adjust(wspace=0)
gridsize = 1,4
ax1 = plt.subplot2grid(gridsize, (0,0))
ax1.scatter(df['Name'], df['Weight'], alpha = 0.5)
ax1.set_ylabel('Weight, kg', fontsize=20)
ax1.set_xlabel('Name', fontsize=20)
ax2 = plt.subplot2grid(gridsize, (0,1), sharey=ax1, sharex = ax1)
plt.setp(ax2.get_yticklabels(), visible=False)
ax2.scatter(df['Name'], df['Weight'], alpha = 0.5)
ax2.set_xlabel('Name', fontsize=20)
ax3 = plt.subplot2grid(gridsize, (0,2), sharey=ax1, sharex = ax1)
plt.setp(ax3.get_yticklabels(), visible=False)
ax3.scatter(df['Name'], df['Weight'], alpha = 0.5)
ax3.set_xlabel('Name', fontsize=20)
ax4 = plt.subplot2grid(gridsize, (0,3), sharey=ax1, sharex = ax1)
plt.setp(ax4.get_yticklabels(), visible=False)
ax4.scatter(df['Name'], df['Weight'], alpha = 0.5)
ax4.set_xlabel('Name', fontsize=20)
This plots four graphs as expected, but how do I get each graph to plot only the data from one of the bins? In other words, how do I plot just one bin per graph?
I'm not worried about the scrunching up of the names on the x axis, those are just for this MWE. They'll be numbers in my actual plots.
Just for clarification, my actual data is binned like
sources['z bins']=pd.cut(sources['z'], [0,1,2,3, max(z)],
labels = ['z < 1', '1 < z < 2', '2 < z < 3', 'z > 3'])
What if you grouped the dataframe by the binned column and then plotted each group?
For example:
fig=plt.figure()
fig.subplots_adjust(hspace=0)
fig.subplots_adjust(wspace=0)
gridsize = 1,4
for i, (name, frame) in enumerate(df.groupby('binned')):
    ax = plt.subplot2grid(gridsize, (0,i))
    ax.scatter(frame['Name'], frame['Weight'], alpha = 0.5)
    ax.set_xlabel(name, fontsize=20)
I realize you will likely want to clean up the labels a bit, but this at least puts each bin on its own axes object.
Iterating over a groupby object yields the name of each group together with the dataframe for that group. Here I am using enumerate to get a running index, which selects the grid column for each new axes object.
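To make the iteration itself easier to see, here is a minimal sketch without any plotting. The four-row dataframe is made up for illustration, and passing observed=False is my own addition (it keeps empty bins in the iteration, so every bin would still get a column):

```python
import pandas as pd

# Small made-up frame, binned the same way as in the question.
df = pd.DataFrame({'Name': ['Karen', 'Jim', 'Tom', 'Rob'],
                   'Age': [2, 14, 28, 75],
                   'Weight': [8, 54, 60, 98]})
stages = ['Toddler', 'Teen', 'Young Adult', 'Adult']
df['binned'] = pd.cut(df['Age'], bins=[0, 4, 20, 40, 100], labels=stages)

# Each iteration yields (group label, sub-DataFrame); enumerate adds a
# running index that can serve as the subplot column.
seen = []
for i, (name, frame) in enumerate(df.groupby('binned', observed=False)):
    seen.append((i, name, list(frame['Name'])))
    print(i, name, list(frame['Name']))
```

The groups come out in the order of the bin labels, so the index i lines up with the left-to-right order of the panels.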
Alternatively, if you do not want to use a for loop, you can access each group with the get_group method of a groupby object.
grouped = df.groupby('binned')
ax1 = plt.subplot2grid(gridsize, (0,0))
ax1.scatter(grouped.get_group('Toddler')['Name'],
            grouped.get_group('Toddler')['Weight'],
            alpha = 0.5)
ax1.set_ylabel('Weight, kg', fontsize=20)
ax1.set_xlabel('Name', fontsize=20)
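If you would rather not build each axes by hand, one possible variation (not what the code above does) is to let plt.subplots create the whole row with a shared y axis and then pair the axes with the groups. This is only a sketch: the data is made up, Age stands in for the numeric x values mentioned in the question, and the Agg backend line is there just so the snippet runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data with a numeric x column, binned as in the question.
df = pd.DataFrame({'Age': [2, 3, 14, 17, 28, 34, 68, 75],
                   'Weight': [8, 9, 54, 54, 60, 75, 82, 98]})
stages = ['Toddler', 'Teen', 'Young Adult', 'Adult']
df['binned'] = pd.cut(df['Age'], bins=[0, 4, 20, 40, 100], labels=stages)

# One row, one column per bin. sharey=True gives a common y axis and
# leaves y tick labels only on the first panel.
fig, axes = plt.subplots(1, len(stages), sharey=True)
fig.subplots_adjust(wspace=0)

# Pair each axes with one (label, sub-DataFrame) group.
for ax, (name, frame) in zip(axes, df.groupby('binned', observed=False)):
    ax.scatter(frame['Age'], frame['Weight'], alpha=0.5)
    ax.set_xlabel(name)
axes[0].set_ylabel('Weight, kg')
```

With sharey=True there is no need for the plt.setp(..., visible=False) calls from the question.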