
Python matplotlib scientific axis formatting



I'm plotting a chart using matplotlib and I'm having trouble with the formatting of the axes. I can't figure out how to force it to use the same scientific format all the time: in the example below, I want e4 on both axes (instead of e4 on the y axis and e2 on the x axis). I would also like to always show two decimal places. Any ideas? The documentation on this is not very extensive.

Creating a random DataFrame:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(100000)
# randn() without a size returns one scalar; pass x.size to get per-point noise
y = x * 100 + np.random.randn(x.size) * 100

Calculating the linear regression:

df = pd.DataFrame({'x': x, 'y': y})
# pandas.stats.api.ols was removed in pandas 0.20; np.polyfit fits the same line
slope, intercept = np.polyfit(df['x'], df['y'], 1)
df['yhat'] = df['x'] * slope + intercept

Plotting:

plt.scatter(df['x'], df['y'])
plt.plot(df['x'], df['yhat'], color='red')
plt.title('Scatter graph with linear regression')
plt.xlabel('X')
plt.ylabel('Y')
plt.ticklabel_format(style='sci', scilimits=(0, 0))
plt.ylim(0)
plt.xlim(0)
plt.show()

Please find the output here


Solution

  • As far as I can tell, matplotlib does not offer exactly this option out of the box. The documentation is indeed sparse (the Ticker API is the place to go). The Formatter classes are responsible for formatting the tick values. Of the ones offered, only ScalarFormatter (the default formatter) offers this kind of scientific formatting; however, at least in the version I was using, it does not allow the exponent or the number of significant digits to be fixed. One alternative would be to use either FixedFormatter or FuncFormatter, which essentially let you choose the tick labels freely (the former can be selected indirectly using plt.gca().set_xticklabels). However, neither of them lets you set the so-called offset string, which is the string displayed at the end of the axis. It is customarily used for a value offset, but ScalarFormatter also uses it for the scientific multiplier.

    Thus, my best solution is a custom formatter derived from ScalarFormatter that, instead of autodetecting the order of magnitude and the format string, simply uses the values fixed by the user:

    from matplotlib import rcParams
    import matplotlib.ticker
    
    if 'axes.formatter.useoffset' in rcParams:
        # None triggers use of the rcParams value
        useoffsetdefault = None
    else:
        # None would raise an exception
        useoffsetdefault = True
    
    class FixedScalarFormatter(matplotlib.ticker.ScalarFormatter):
        """ScalarFormatter with a user-supplied format string and exponent."""
    
        def __init__(self, format, orderOfMagnitude=0, useOffset=useoffsetdefault, useMathText=None, useLocale=None):
            super(FixedScalarFormatter, self).__init__(useOffset=useOffset, useMathText=useMathText, useLocale=useLocale)
            self.base_format = format  # e.g. '%.2f' for two decimals
            self.orderOfMagnitude = orderOfMagnitude  # e.g. 4 for a fixed 1e4 multiplier
    
        def _set_orderOfMagnitude(self, *args):
            """ Set orderOfMagnitude to best describe the specified data range.
    
            Does nothing except prevent the parent class from overwriting
            the fixed orderOfMagnitude.
            """
            pass
    
        # Newer matplotlib versions call this hook _set_order_of_magnitude()
        # without arguments, so register the same no-op under that name too.
        _set_order_of_magnitude = _set_orderOfMagnitude
    
        def _set_format(self, *args):
            """ Calculates the most appropriate format string for the range (vmin, vmax).
    
            We're actually just using a fixed format string. *args accepts
            (vmin, vmax) on older versions and nothing on newer ones.
            """
            self.format = self.base_format
            if self._usetex:
                self.format = '$%s$' % self.format
            elif self._useMathText:
                # raw string, because '\m' is not a valid escape sequence
                self.format = r'$\mathdefault{%s}$' % self.format
    

    Note that the default value of ScalarFormatter's constructor parameter useOffset changed at some point; the code above tries to guess which one is right for your version.

    Attach an instance of this class to one or both axes of your plot as follows:

    plt.gca().xaxis.set_major_formatter(FixedScalarFormatter('%.2f',4))
    plt.gca().yaxis.set_major_formatter(FixedScalarFormatter('%.2f',4))
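If subclassing a private API feels too fragile, the FuncFormatter route mentioned above can get close. Since it cannot show the multiplier in the offset string, one workaround is to divide each tick by the chosen power of ten yourself and state the multiplier in the axis label instead. This is only a sketch: EXPONENT, fixed_sci and the label wording are my own choices, not anything matplotlib provides.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

EXPONENT = 4  # assumption: show every tick as a multiple of 10**4


def fixed_sci(value, pos):
    """Return the tick label for `value`: value / 10**EXPONENT with two decimals."""
    return "%.2f" % (value / 10 ** EXPONENT)


fig, ax = plt.subplots()
ax.plot([0, 10000, 20000], [0, 15000, 30000])
# One formatter instance per axis, since a formatter is bound to its axis.
ax.xaxis.set_major_formatter(FuncFormatter(fixed_sci))
ax.yaxis.set_major_formatter(FuncFormatter(fixed_sci))
# No offset text is drawn for these labels, so put the multiplier in the labels.
ax.set_xlabel(r"X ($\times 10^{4}$)")
ax.set_ylabel(r"Y ($\times 10^{4}$)")
```

The downside is that the multiplier lives in the label text, so it has to be kept in sync with EXPONENT by hand.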
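Newer matplotlib releases make part of this easier. If I read the ScalarFormatter.set_powerlimits documentation correctly, passing equal nonzero limits (m, m) pins the exponent to 10^m, which older versions did not support. The number of decimals still has no public setting, so the sketch below keeps a small override of the private _set_format hook. TwoDecimalScalarFormatter and fixed_sci_formatter are hypothetical names, and relying on a private method may break in future releases.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter


class TwoDecimalScalarFormatter(ScalarFormatter):
    """Hypothetical helper: a ScalarFormatter whose tick format is always '%.2f'."""

    def _set_format(self, *args):
        # Private hook (may change between releases); *args absorbs the
        # (vmin, vmax) that some older versions pass.
        self.format = "%.2f"
        if self._useMathText or getattr(self, "_usetex", False):
            self.format = r"$\mathdefault{%s}$" % self.format


def fixed_sci_formatter(exponent):
    """Hypothetical helper: pin the multiplier to 10**exponent, no value offset."""
    fmt = TwoDecimalScalarFormatter(useOffset=False)
    # Equal, nonzero power limits fix the order of magnitude (newer matplotlib).
    fmt.set_powerlimits((exponent, exponent))
    return fmt


fig, ax = plt.subplots()
ax.scatter([10000, 20000, 30000], [5000, 10000, 15000])
ax.xaxis.set_major_formatter(fixed_sci_formatter(4))
ax.yaxis.set_major_formatter(fixed_sci_formatter(4))
fig.canvas.draw()  # tick labels and the offset text are computed at draw time
```

With the default formatter, the question's own call should be able to pin the exponent the same way, e.g. plt.ticklabel_format(style='sci', scilimits=(4, 4)); only the fixed number of decimals then needs the subclass.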