
matplotlib: How to use marker size / color as an extra dimension in plots?


I am plotting a time series where x is a series of datetime.datetime objects and y is a series of doubles.

I'd like to map the marker size to a third series z (and possibly also map marker color to a fourth series w), which in most cases could be accomplished with:

scatter(x, y, s=z, c=w)

except scatter() does not permit x being a series of datetime.datetime objects.

plot(x, y, marker='o', linestyle='None')

on the other hand, works with x being datetime.datetime (with proper tick labels), but marker size and color can only be set for all points at once, so there is no way to map them to extra series.

Seeing that scatter and plot each can do half of what I need, is there a way to do both?

UPDATE: following @tcaswell's question, I realized that scatter raised a KeyError deep inside default_units() in matplotlib/dates.py, on the line:

x = x[0]

and sure enough, my x and y are both Series taken from a pandas DataFrame that has no label 0 in its index. I then tried two things (both feel somewhat hacky):

First, I tried changing the DataFrame index to 0..len(x), which led to a different error inside matplotlib/axes/_axes.py at:

offsets  = np.dstack((x,y))

dstack doesn't play nicely with pandas Series, so I then tried converting x and y to numpy.array:

scatter(numpy.array(x), numpy.array(y), s=numpy.array(z))

This almost worked, except that scatter seemed to have trouble auto-scaling the x axis and collapsed everything into a straight line, so I had to set xlim explicitly to see the plot.

All of this is to say that scatter can do the job, albeit in a somewhat convoluted way. I had always thought matplotlib could take any array-like input, but apparently that's not quite true when the data are not plain numbers and need some internal conversion.
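For reference, the array-conversion workaround described above might look roughly like this. It is only a sketch: the DataFrame, its column names (t, y, z) and its index starting at 10 are made up for illustration, and recent matplotlib releases may accept the Series directly, so the explicit conversion and xlim call may no longer be necessary.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: the index deliberately starts at 10, so label 0 does not exist.
start = dt.datetime(2014, 6, 1)
df = pd.DataFrame(
    {
        "t": [start + dt.timedelta(hours=6 * i) for i in range(8)],
        "y": np.linspace(1.0, 2.0, 8),
        "z": np.linspace(20.0, 200.0, 8),
    },
    index=range(10, 18),
)

# Convert each Series to a plain NumPy array before handing it to scatter().
x = df["t"].to_numpy()  # datetime64 array
y = df["y"].to_numpy()
z = df["z"].to_numpy()  # marker areas, in points**2

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=z)

# Set the x limits explicitly, as the question had to do, with a little padding.
pad = np.timedelta64(3, "h")
ax.set_xlim(x.min() - pad, x.max() + pad)
fig.canvas.draw()
```

Working on arrays sidesteps positional lookups such as x[0], which on a Series are resolved against the index labels rather than positions.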

UPDATE2 I also tried to follow @user3666197's suggestion (thanks for the editing tips btw). If I understood correctly, I first converted x into a series of 'matplotlib style days':

mx = mPlotDATEs.date2num(list(x))

which then allows me to directly call:

scatter(mx, y, s=z)

Then, to label the axis properly, I call:

gca().xaxis.set_major_formatter( DateFormatter('%Y-%m-%d %H:%M'))

(call show() to update the axis labels if in interactive mode)

It worked quite nicely and feels to me like a more 'proper' way of doing things, so I'm going to accept that as the best answer.
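Put together, the steps from UPDATE2 might look like the following self-contained sketch. The data are invented for illustration, the alias mdates stands in for mPlotDATEs, and the fourth series w (mapped to color) is added here only to show that c= works the same way as s=.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib import dates as mdates

# Hypothetical sample data: ten points, 90 minutes apart.
start = dt.datetime(2014, 6, 1, 9, 0)
x = [start + dt.timedelta(minutes=90 * i) for i in range(10)]
y = [0.5 * i for i in range(10)]
z = [20 + 15 * i for i in range(10)]  # third series -> marker area (points**2)
w = [i % 3 for i in range(10)]        # fourth series -> marker color

# Convert datetimes to matplotlib date numbers (floats counting days).
mx = mdates.date2num(x)

fig, ax = plt.subplots()
sc = ax.scatter(mx, y, s=z, c=w)

# Tell the axis to render those floats as timestamps.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d %H:%M"))
fig.autofmt_xdate()  # tilt the labels so they do not overlap
fig.canvas.draw()
```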


Solution

  • Is there a way to do both? Yes.

    However, let's work by example:


    step A: converting a datetime to a float compatible with matplotlib's date convention
    step B: adding 3D | 4D | 5D capabilities ( encoding additional dimensions of information in { color | size | alpha } )


    As usual, the devil is in the details.

    matplotlib date numbers are almost, but not exactly, what you might expect:

    #  mPlotDATEs.date2num.__doc__
    #                  
    #     *d* is either a class `datetime` instance or a sequence of datetimes.
    #
    #     Return value is a floating point number (or sequence of floats)
    #     which gives the number of days (fraction part represents hours,
    #     minutes, seconds) since 0001-01-01 00:00:00 UTC, *plus* *one*.
    #     The addition of one here is a historical artifact.  Also, note
    #     that the Gregorian calendar is assumed; this is not universal
    #     practice.  For details, see the module docstring.
    

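    So it is highly recommended to reuse matplotlib's own conversion tools rather than computing the offset by hand (the docstring above is from an older release; since matplotlib 3.3 the default epoch is 1970-01-01):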

    from matplotlib import dates as mPlotDATEs   # helper functions num2date()
    #                                            #              and date2num()
    #                                            #              to convert to/from.
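A minimal round trip through these two helpers, assuming nothing beyond matplotlib itself (the sample timestamps are arbitrary). The absolute value returned by date2num() depends on the epoch of the installed matplotlib version, so the sketch only relies on differences between date numbers, which are in days either way.

```python
import datetime as dt

from matplotlib import dates as mPlotDATEs

# Two arbitrary timestamps, 36 hours apart, both in UTC.
t0 = dt.datetime(2014, 6, 1, 12, 0, tzinfo=dt.timezone.utc)
t1 = t0 + dt.timedelta(hours=36)

# datetime -> float days since matplotlib's epoch
n0, n1 = mPlotDATEs.date2num([t0, t1])
span_days = n1 - n0  # elapsed time expressed in days

# float -> timezone-aware datetime (UTC unless another tz is requested)
back = mPlotDATEs.num2date(n0)
print(span_days, back.isoformat())
```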
    

    Managing axis-labels & formatting & scale (min/max) is a separate issue

    Nevertheless, matplotlib gives you tools for this part too:

    import matplotlib.pyplot as plt
    from matplotlib.dates   import  ( DateFormatter,
                                      AutoDateLocator,
                                      HourLocator,
                                      MinuteLocator,
                                      )   # epoch2num is no longer available in recent matplotlib releases
    from matplotlib.ticker  import  ScalarFormatter, FuncFormatter
    

    and you may, for example, do:

        # aPlotAX is an existing Axes; x_min / x_MAX are matplotlib date numbers
        aPlotAX.set_xlim( x_min, x_MAX )               # x-axis limits

        # commented-out alternative, apparently for an x axis holding epoch seconds:
        #plt.gca().xaxis.set_major_locator(      matplotlib.ticker.FixedLocator(  secs ) )
        #plt.gca().xaxis.set_major_formatter(    matplotlib.ticker.FuncFormatter( lambda x, _: time.strftime( "%d-%m-%Y %H:%M:%S", time.localtime( x ) ) ) )

        aPlotAX.xaxis.set_major_locator(   AutoDateLocator() )
        aPlotAX.xaxis.set_major_formatter( DateFormatter( '%Y-%m-%d %H:%M' ) )  # x-axis tick format

        # 90-deg x-tick labels
        plt.setp( aPlotAX.get_xticklabels(),  rotation            = 90,
                                              horizontalalignment = 'right'
                                              )
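The fragment above assumes that aPlotAX, x_min and x_MAX already exist. A self-contained variant could look like the sketch below; the sample data and the two-hour padding around the limits are made up for illustration.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.dates import AutoDateLocator, DateFormatter, date2num

# Hypothetical sample data: twelve points, four hours apart.
times = [dt.datetime(2014, 6, 1) + dt.timedelta(hours=4 * i) for i in range(12)]
values = [i * i for i in range(12)]

fig, aPlotAX = plt.subplots()
aPlotAX.scatter(date2num(times), values, s=40)

# Limits are given in the same units as the plotted x values (date numbers).
x_min, x_MAX = date2num([times[0] - dt.timedelta(hours=2),
                         times[-1] + dt.timedelta(hours=2)])
aPlotAX.set_xlim(x_min, x_MAX)

# Let matplotlib pick sensible tick positions and print them as timestamps.
aPlotAX.xaxis.set_major_locator(AutoDateLocator())
aPlotAX.xaxis.set_major_formatter(DateFormatter("%Y-%m-%d %H:%M"))

# Rotate the tick labels by 90 degrees.
plt.setp(aPlotAX.get_xticklabels(), rotation=90, horizontalalignment="right")

fig.canvas.draw()  # forces the tick labels to be generated
labels = [t.get_text() for t in aPlotAX.get_xticklabels()]
```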
    

    Adding { 3D | 4D | 5D } transcoding

    To illustrate the approach, look at this example, where additional dimensions of information are encoded in { color | size | alpha }. Whereas { size | alpha } are properties of the scatter points themselves, for color matplotlib includes additional tools: a set of colormaps scaled for various domain-specific uses or adapted to human colour perception. A nice explanation of color scales and normalisation is presented here.

    [Figure: 4D scatter-plot example (image not recovered)]

    You may have noticed that this 4D example still has a constant alpha ( left unused as the 5th degree of freedom of a true 5D visualisation ).
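As a rough sketch of how the fifth dimension could be added, the example below builds one RGBA color per point: the RGB part comes from a colormap applied to w, and the alpha channel is overwritten with a fifth series. All data are randomly generated, the names w and a are hypothetical, and the choice of the viridis colormap with a linear Normalize is just one option among many.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm, colors, dates as mdates

rng = np.random.default_rng(0)
n = 50
start = dt.datetime(2014, 6, 1)

x = mdates.date2num([start + dt.timedelta(hours=i) for i in range(n)])
y = rng.normal(size=n).cumsum()
z = rng.uniform(10, 300, size=n)   # 3rd dimension -> marker area (points**2)
w = rng.uniform(-1, 1, size=n)     # 4th dimension -> color
a = rng.uniform(0.2, 1.0, size=n)  # 5th dimension -> opacity

# Map w to RGBA through a normaliser and a colormap, then replace the alpha channel.
norm = colors.Normalize(vmin=w.min(), vmax=w.max())
cmap = plt.get_cmap("viridis")
rgba = cmap(norm(w))  # array of shape (n, 4)
rgba[:, 3] = a

fig, ax = plt.subplots()
sc = ax.scatter(x, y, s=z, c=rgba)

# The colorbar is built from the same norm/cmap pair, since c= received explicit colors.
fig.colorbar(cm.ScalarMappable(norm=norm, cmap=cmap), ax=ax, label="w")

ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d %H:%M"))
plt.setp(ax.get_xticklabels(), rotation=30, horizontalalignment="right")
fig.canvas.draw()
```

Passing explicit RGBA rows keeps size, color and opacity independent of one another, at the cost of having to construct the colorbar by hand.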