Tags: python, matplotlib, scatter-plot, markers

Changing marker shapes depending on a third string variable in matplotlib


I have a pandas DataFrame called df_comunidades. Its head() looks something like this:

df_comunidades.head()

  Tormenta  Comunidad  TIEPI  Gustmax
0      ANA  ANDALUCIA  0.050    130.2
1      ANA     ARAGON  0.250     90.5
2    BRUNO  ANDALUCIA  0.012    114.0
3    BRUNO  CATALUNYA  0.023     78.2
4   KARINE     ARAGON  3.500     80.2
5      ANA   BALEARES  2.000     97.2

Every "Comunidad" has a different color in my scatter plot, but I also want every "Tormenta" to have a different marker shape. I tried many approaches, one of them similar to the method I used for colors. I also tried a loop, for i in range(len(markers)):, where all the markers are saved in a list, markers=['o','v','<','>','1','8','s','*','x','d']. My current working code is:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# initialize list of lists
data = [['ANA', 'ANDALUCIA', 0.05, 130.2], ['ANA', 'ARAGON', 0.25, 90.5], ['BRUNO', 'ANDALUCIA', 0.012, 114], ['BRUNO', 'CATALUNYA', 0.023, 78.2],['KARINE', 'ARAGON', 3.5, 80.2], ['ANA', 'BALEARES', 2, 97.2]]
 
# Create the pandas DataFrame
df_comunidades = pd.DataFrame(data, columns = ['Tormenta', 'Comunidad', 'TIEPI', 'Gustmax'])

#I define every color for every "Comunidad"
colors = {'ANDALUCIA' : 'g',
          'CATALUNYA' : 'y',
          'BALEARES' : 'r', 
          'ARAGON' : 'c'}
c = [colors[comunid] for comunid in df_comunidades['Comunidad']]

plt.scatter(df_comunidades['TIEPI'], df_comunidades['Gustmax'], alpha=0.5, c=c)

ax = plt.subplot(1, 1, 1)
#code to title the axes and the plot: 
ax.set_xlim([0,3])
ax.set_xlabel("TIEPI")
ax.set_ylabel("Max Gusts in community")
plt.title("Relation between max gusts and TIEPI in autonomous communities")
plt.savefig('max_tiepi-gusts_comunid.png',dpi=300)

This is what I got. It looks as if the square is drawn on top of the rest, but every point in the scatter is supposed to be a "Tormenta", with the colour indicating the "Comunidad". [image]

With the whole dataset, the plot would look like this: [image]

Edited after comments to make the question clearer.


Solution

  • [image]

    I think you're just lamenting that scatter() takes a sequence of colors, yet just a single marker. So we will need to loop over the N points. (Or we could use .groupby() if we wanted to make just T calls for T tormentas.)

    There seems to be a discrepancy between the "Min Gusts" label and the "Gustmax" column.

    There are many markers to choose from. You might try 'v', 's', 'p' to go through a progression of 3-, 4-, 5-sided marks.

    I made these changes to produce the enclosed chart.

    --- a/tmp/so_69076918_orig.py
    +++ b/tmp/so_69076918.py
    @@ -14,14 +14,22 @@ colors = {'ANDALUCIA' : 'g',
               'CATALUNYA' : 'y',
               'BALEARES' : 'r', 
               'ARAGON' : 'c'}
    +markers = {'ANA': 'v',
    +           'BRUNO': 'x',
    +           'KARINE': 'd'}
     c = [colors[comunid] for comunid in df_comunidades['Comunidad']]
     
    -plt.scatter(df_comunidades['TIEPI'], df_comunidades['Gustmax'], alpha=0.5, c=c)
    -
     ax = plt.subplot(1, 1, 1)
     #code to title the axes and the plot: 
    -ax.set_xlim([0,3])
    +ax.set_xlim([-.1, 4])
     ax.set_xlabel("TIEPI")
    +ax.set_ylim([0, 140])
     ax.set_ylabel("Min Gusts in community")
     plt.title("Relation between min gusts and TIEPI in autonomous communities")
    +
    +for row in df_comunidades.itertuples():
    +    plt.scatter([row.TIEPI], [row.Gustmax], alpha=0.5,
    +                c=colors[row.Comunidad],
    +                marker=markers[row.Tormenta])
    +
     plt.savefig('min_tiepi-gusts_comunid.png',dpi=300)
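The .groupby() alternative mentioned above could look roughly like the sketch below: one scatter() call per tormenta rather than one per point. It reuses the data, colors and markers from the question and the diff. The use of plt.subplots(), the label= argument and the legend are additions made for this illustration and are not part of the original answer.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same sample data as in the question
data = [['ANA', 'ANDALUCIA', 0.05, 130.2], ['ANA', 'ARAGON', 0.25, 90.5],
        ['BRUNO', 'ANDALUCIA', 0.012, 114], ['BRUNO', 'CATALUNYA', 0.023, 78.2],
        ['KARINE', 'ARAGON', 3.5, 80.2], ['ANA', 'BALEARES', 2, 97.2]]
df_comunidades = pd.DataFrame(
    data, columns=['Tormenta', 'Comunidad', 'TIEPI', 'Gustmax'])

# One color per "Comunidad", one marker per "Tormenta"
colors = {'ANDALUCIA': 'g', 'CATALUNYA': 'y', 'BALEARES': 'r', 'ARAGON': 'c'}
markers = {'ANA': 'v', 'BRUNO': 'x', 'KARINE': 'd'}

fig, ax = plt.subplots()

# scatter() accepts a sequence of colors but only a single marker,
# so each group (one tormenta) gets its own call.
for tormenta, group in df_comunidades.groupby('Tormenta'):
    point_colors = [colors[comunid] for comunid in group['Comunidad']]
    ax.scatter(group['TIEPI'], group['Gustmax'], alpha=0.5,
               c=point_colors, marker=markers[tormenta], label=tormenta)

ax.set_xlim([-.1, 4])
ax.set_ylim([0, 140])
ax.set_xlabel("TIEPI")
ax.set_ylabel("Max Gusts in community")
# Legend entries show the marker of each tormenta (assumption: a legend is wanted)
ax.legend(title="Tormenta")
```

With the sample data this makes three scatter() calls for six points. A plt.savefig(...) call like the one in the question can follow unchanged.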