python pandas cdf

pandas plot CDF for multi-class column


I am working with the Python empiricaldist package to plot a CDF of the speed distribution for each travel mode (a multi-class column).

data.head()
+---+---------+----------+----------+-------+--------------+------------+
|   | trip_id | distance | duration | speed | acceleration | travelmode |
+---+---------+----------+----------+-------+--------------+------------+
| 0 |  303637 | 5.92     | 0.51     | 3.20  | 0.00173      | metro      |
| 1 |  303638 | 3.54     | 0.22     | 4.44  | 0.00557      | bus        |
| 2 |  303642 | 4.96     | 0.20     | 6.84  | 0.00944      | car        |
| 3 |  303662 | 6.53     | 0.97     | 1.86  | 0.00053      | foot       |
| 4 |  303663 | 40.23    | 0.94     | 11.85 | 0.00349      | car        |
+---+---------+----------+----------+-------+--------------+------------+

Now I want to plot the CDF of the speed column for each mode in travelmode. So far I have:

import matplotlib.pyplot as plt
from empiricaldist import Cdf

def decorate_cdf(title, x, y):
    """Label the axes and set the plot title.

    title: string, plot title
    x: string, x-axis label
    y: string, y-axis label
    """
    plt.xlabel(x)
    plt.ylabel(y)
    plt.title(title)

for name, group in data.groupby('travelmode'):
    Cdf.from_seq(group.speed).plot()

title, x, y = 'Speed by mode', 'speed (km/h)', 'CDF'
decorate_cdf(title, x, y)


How do I add a legend to the plot so I can tell which curve belongs to which mode?


Solution

  • Use matplotlib's pyplot.legend function, passing the group names in the same order the curves were plotted:

    plt.legend(data.groupby('travelmode').groups.keys())
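An alternative that does not depend on the order of the labels is to attach a label to each curve as it is drawn and then call legend() with no arguments. The following is a minimal sketch, not the original poster's code: it computes the empirical CDF by hand with NumPy rather than with empiricaldist, so it only assumes pandas, NumPy, and matplotlib are installed. The toy frame is hypothetical; the first five speeds are taken from the question and the rest are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data shaped like the question's frame.
data = pd.DataFrame({
    "speed": [3.20, 4.44, 6.84, 1.86, 11.85, 5.10, 2.40, 9.30],
    "travelmode": ["metro", "bus", "car", "foot", "car", "bus", "foot", "metro"],
})

fig, ax = plt.subplots()
for name, group in data.groupby("travelmode"):
    qs = np.sort(group["speed"].to_numpy())    # sorted speeds for this mode
    ps = np.arange(1, len(qs) + 1) / len(qs)   # cumulative probability at each speed
    ax.step(qs, ps, where="post", label=name)  # the label stays attached to this line

ax.set_xlabel("speed (km/h)")
ax.set_ylabel("CDF")
ax.set_title("Speed by mode")
legend = ax.legend(title="travelmode")         # picks up the labels set above

# Collect the legend entries so they can be inspected.
legend_labels = [text.get_text() for text in legend.get_texts()]
```

Because each line carries its own label, the legend stays correct even if the loop order changes or some groups are skipped. With empiricaldist, passing label=name to Cdf.plot() should have the same effect, assuming plot() forwards its keyword options to matplotlib.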