Search code examples
pythonpandas

How to save a pandas DataFrame table as a png


I constructed a pandas dataframe of results. This data frame acts as a table. There are MultiIndexed columns and each row represents a name, ie index=['name1','name2',...] when creating the DataFrame. I would like to display this table and save it as a png (or any graphic format really). At the moment, the closest I can get is converting it to html, but I would like a png. It looks like similar questions have been asked such as How to save the Pandas dataframe/series data as a figure?

However, the marked solution converts the dataframe into a line plot (not a table) and the other solution relies on PySide which I would like to stay away simply because I cannot pip install it on linux. I would like this code to be easily portable. I really was expecting table creation to png to be easy with python. All help is appreciated.


Solution

  • Pandas allows you to plot tables using matplotlib (details here). Usually this plots the table directly onto a plot (with axes and everything) which is not what you want. However, these can be removed first:

    import matplotlib.pyplot as plt
    import pandas as pd
    from pandas.table.plotting import table # EDIT: see deprecation warnings below
    
    ax = plt.subplot(111, frame_on=False) # no visible frame
    ax.xaxis.set_visible(False)  # hide the x axis
    ax.yaxis.set_visible(False)  # hide the y axis
    
    table(ax, df)  # where df is your data frame
    
    plt.savefig('mytable.png')
    

    The output might not be the prettiest but you can find additional arguments for the table() function here.


    EDIT:

    Here is a (admittedly quite hacky) way of simulating multi-indexes when plotting using the method above. If you have a multi-index data frame called df that looks like:

    first  second
    bar    one       1.991802
           two       0.403415
    baz    one      -1.024986
           two      -0.522366
    foo    one       0.350297
           two      -0.444106
    qux    one      -0.472536
           two       0.999393
    dtype: float64
    

    First reset the indexes so they become normal columns

    df = df.reset_index() 
    df
        first second       0
    0   bar    one  1.991802
    1   bar    two  0.403415
    2   baz    one -1.024986
    3   baz    two -0.522366
    4   foo    one  0.350297
    5   foo    two -0.444106
    6   qux    one -0.472536
    7   qux    two  0.999393
    

    Remove all duplicates from the higher order multi-index columns by setting them to an empty string (in my example I only have duplicate indexes in "first"):

    df.ix[df.duplicated('first') , 'first'] = '' # see deprecation warnings below
    df
      first second         0
    0   bar    one  1.991802
    1          two  0.403415
    2   baz    one -1.024986
    3          two -0.522366
    4   foo    one  0.350297
    5          two -0.444106
    6   qux    one -0.472536
    7          two  0.999393
    

    Change the column names over your "indexes" to the empty string

    new_cols = df.columns.values
    new_cols[:2] = '',''  # since my index columns are the two left-most on the table
    df.columns = new_cols 
    

    Now call the table function but set all the row labels in the table to the empty string (this makes sure the actual indexes of your plot are not displayed):

    table(ax, df, rowLabels=['']*df.shape[0], loc='center')
    

    et voila:

    enter image description here

    Your not-so-pretty but totally functional multi-indexed table.

    EDIT: DEPRECATION WARNINGS

    As pointed out in the comments, the import statement for table:

    from pandas.tools.plotting import table
    

    is now deprecated in newer versions of pandas in favour of:

    from pandas.plotting import table 
    

    EDIT: DEPRECATION WARNINGS 2

    The ix indexer has now been fully deprecated so we should use the loc indexer instead. Replace:

    df.ix[df.duplicated('first') , 'first'] = ''
    

    with

    df.loc[df.duplicated('first') , 'first'] = ''