Pretty print and substitute numpy arrays


Imagine I am pretty-printing a JSON-decoded object that has the structure:

{Stack
   Layer
     Material
       <array>
}

These names correspond to a hierarchy of structures in a simulation. At the end of the hierarchy there's a numpy array object. These arrays tend to be really large, so I'd rather not print them explicitly, but instead print a kind of summary. So instead of:

{Stack
   Layer
     Material
       array([1,2,3,4,5......])
}

It would look like:

{Stack
   Layer
     Material
       array: dtype, shape
}

I.e., pretty print would not print the full array, but just summarize its information by printing the shape and datatype. Is such customization available in pprint?


Solution

  • Ok, I'm a molecular biologist and not a professional programmer, so bear with me.
    In my very naive opinion, which you shouldn't give too much weight, one option is to make your own version of pprint that is aware of numpy's ndarray objects and prints them the way you want.

    What I did, and it worked for me, was to open the pprint module (under the Lib directory) and create a modified copy, like this:

    (I pasted the working, modified code on pastebin; you can find it here)

    First, in the import section, have it try to import numpy's ndarray, by adding:

    try:
        from numpy import ndarray
        np_arrays = True
    except ImportError:
        np_arrays = False
    

    Then, in the definition of the _format function, right after this:

    # This is already there
    if self._depth and level > self._depth:
        write(rep)
        return
    

    (so at line 154 in my copy, after the imports) you should add:

    # Format numpy.ndarray object representations
    if np_arrays and issubclass(typ, ndarray):
        write('array(dtype:' + str(object.dtype))
        write('; shape: ' + str(object.shape) + ')')
        return
    

    (then the function continues with r = getattr(typ, "__repr__", None)...)

    Now save this script in the same Lib directory where pprint is, under a new name, e.g. mypprint.py, and then try:

    # The import brings in the pprint function itself, so call it directly
    from mypprint import pprint
    pprint(object_with_np_arrays)
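
If you would rather not keep a modified copy of a standard-library module, a possible alternative is to substitute the arrays before printing and leave pprint untouched. The following is only a minimal sketch of that idea, not part of the answer's pastebin code: the names ArraySummary and summarize_arrays are hypothetical, and it assumes the data is built from plain dicts, lists and tuples, as JSON-decoded data usually is. It produces the same summary text as the modified _format above.

```python
import pprint

# Same guarded import as in the answer: numpy is optional.
try:
    from numpy import ndarray
    np_arrays = True
except ImportError:
    np_arrays = False


class ArraySummary:
    """Hypothetical placeholder whose repr is a one-line array summary."""

    def __init__(self, arr):
        # Same text that the modified _format writes for an ndarray.
        self.text = ('array(dtype:' + str(arr.dtype)
                     + '; shape: ' + str(arr.shape) + ')')

    def __repr__(self):
        return self.text


def summarize_arrays(obj):
    """Return a copy of obj with every ndarray replaced by an ArraySummary.

    Only dicts, lists and tuples are walked (an assumption that fits
    JSON-decoded data); anything else is returned unchanged.
    """
    if np_arrays and isinstance(obj, ndarray):
        return ArraySummary(obj)
    if isinstance(obj, dict):
        return {key: summarize_arrays(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [summarize_arrays(item) for item in obj]
    if isinstance(obj, tuple):
        return tuple(summarize_arrays(item) for item in obj)
    return obj


if __name__ == '__main__' and np_arrays:
    import numpy as np

    # Made-up example data mirroring the Stack/Layer/Material hierarchy.
    data = {'Stack': {'Layer': {'Material': np.zeros((1000, 3))}}}
    pprint.pprint(summarize_arrays(data))
```

The trade-off is that this builds a summarized copy of the containers, and the original data is never modified. Containers of other types (for example custom classes holding arrays) are passed through as they are, so their arrays would still print in full.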