Tags: python, formatting, numpy, code-golf

Decimal alignment formatting in Python


This should be easy.

Here's my array (rather, a method of generating representative test arrays):

>>> import numpy
>>> ri = numpy.random.randint
>>> ri2 = lambda x: ''.join(ri(0,9,x).astype('S'))
>>> a = numpy.array([float(ri2(x)+ '.' + ri2(y)) for x,y in ri(1,10,(10,2))])
>>> a
array([  7.99914000e+01,   2.08000000e+01,   3.94000000e+02,
         4.66100000e+03,   5.00000000e+00,   1.72575100e+03,
         3.91500000e+02,   1.90610000e+04,   1.16247000e+04,
         3.53920000e+02])

I want a list of strings where '\n'.join(list_o_strings) would print:

   79.9914
   20.8
  394.0
 4661.0
    5.0
 1725.751
  391.5
19061.0
11624.7
  353.92

I want to space pad to the left and the right (but no more than necessary).

I want a zero after the decimal if that is all that is after the decimal.

I do not want scientific notation.

..and I do not want to lose any significant digits. (in 353.98000000000002 the 2 is not significant)

Yeah, it's nice to want..

Python 2.5's %g, %fx.x, etc. are either befuddling me or can't do it. I have not tried import decimal yet. I can't see that NumPy does it either (although array.__str__ and array.__repr__ are decimal-aligned, they sometimes return scientific notation).

Oh, and speed counts. I'm dealing with big arrays here.

My current solution approaches are:

  1. to str(a) and parse off NumPy's brackets
  2. to str(e) each element in the array and split('.') then pad and reconstruct
  3. to a.astype('S'+str(i)) where i is the max(len(str(a))), then pad
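
Approach 2 above (split each element on the decimal point, pad, and reconstruct) can be sketched as follows. This is a minimal sketch in Python 3 syntax, not a tested solution for large arrays: the name `align_decimals` is hypothetical, and it assumes every value is finite and has a `repr` in plain decimal form (roughly 1e-4 up to 1e16 in magnitude), since outside that range `repr` switches to scientific notation and the split would fail. In Python 3, `repr` of a float gives the shortest string that round-trips, so 353.98 comes out as `353.98` rather than `353.98000000000002`.

```python
def align_decimals(values):
    """Return strings padded so that the decimal points line up.

    Hypothetical helper: assumes each value's repr contains a '.'
    and no exponent (finite, roughly 1e-4 <= |v| < 1e16).
    """
    # Split each number into its integer and fractional digit strings.
    parts = [repr(float(v)).split('.') for v in values]
    # The widest piece on each side decides how much padding is needed.
    left = max(len(p[0]) for p in parts)
    right = max(len(p[1]) for p in parts)
    # Right-justify the integer part, left-justify the fractional part.
    return [p[0].rjust(left) + '.' + p[1].ljust(right) for p in parts]


sample = [79.9914, 20.8, 394.0, 4661.0, 5.0,
          1725.751, 391.5, 19061.0, 11624.7, 353.92]
print('\n'.join(align_decimals(sample)))
```

A pure-Python loop like this is probably not the fastest option for big arrays; it is meant to pin down the desired output, not to settle the speed question.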

It seems like there should be some off-the-shelf solution out there... (but not required)

The top suggestion fails when the dtype is float64:

>>> a
array([  5.50056103e+02,   6.77383566e+03,   6.01001513e+05,
         3.55425142e+08,   7.07254875e+05,   8.83174744e+02,
         8.22320510e+01,   4.25076609e+08,   6.28662635e+07,
         1.56503068e+02])
>>> import re
>>> ut0 = re.compile(r'(\d)0+$')
>>> thelist = [ut0.sub(r'\1', "%12f" % x) for x in a]
>>> print '\n'.join(thelist)
  550.056103
 6773.835663
601001.513
355425141.8471
707254.875038
  883.174744
   82.232051
425076608.7676
62866263.55
  156.503068
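
One likely contributor to the ragged output above (a reading of the example, not a confirmed diagnosis) is that the 12 in `%12f` is only a minimum field width: a value that needs more than 12 characters is printed in full, which pushes its decimal point to the right of the others. A minimal check in Python 3 syntax, using two values taken from the output above:

```python
# '%12f' pads to at least 12 characters but never truncates.
short = "%12f" % 82.232051
long_ = "%12f" % 355425141.8471

print(repr(short), len(short))   # fits within the 12-character field
print(repr(long_), len(long_))   # wider than 12, so alignment is lost
```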

Solution

  • Sorry, but after thorough investigation I can't find any way to perform the task you require without a minimum of post-processing (to strip off the trailing zeros you don't want to see); something like:

    import re
    ut0 = re.compile(r'(\d)0+$')
    
    thelist = [ut0.sub(r'\1', "%12f" % x) for x in a]
    
    print '\n'.join(thelist)
    

    is speedy and concise, but breaks your constraint of being "off-the-shelf": it is, instead, a modular combination of general formatting (which almost does what you want but leaves trailing zeros you want to hide) and an RE to remove the undesired trailing zeros. Practically, I think it does exactly what you require, but your conditions as stated are, I believe, over-constrained.

    Edit: original question was edited to specify more significant digits, require no extra leading space beyond what's required for the largest number, and provide a new example (where my previous suggestion, above, doesn't match the desired output). The work of removing leading whitespace that's common to a bunch of strings is best performed with textwrap.dedent -- but that works on a single string (with newlines) while the required output is a list of strings. No problem, we'll just put the lines together, dedent them, and split them up again:

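    import re
    import textwrap
    
    # Drop a run of trailing zeros but keep the digit in front of it
    # (so "2.000000" becomes "2.0").
    ut0 = re.compile(r'(\d)0+$')
    
    a = [  5.50056103e+02,   6.77383566e+03,   6.01001513e+05,
             3.55425142e+08,   7.07254875e+05,   8.83174744e+02,
             8.22320510e+01,   4.25076609e+08,   6.28662635e+07,
             1.56503068e+02]
    
    # Format in a field wide enough for every value, join the lines,
    # strip the leading whitespace they share, and split them up again.
    thelist = textwrap.dedent(
            '\n'.join(ut0.sub(r'\1', "%20f" % x) for x in a)).splitlines()
    
    print '\n'.join(thelist)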
    

    emits:

          550.056103
         6773.83566
       601001.513
    355425142.0
       707254.875
          883.174744
           82.232051
    425076609.0
     62866263.5
          156.503068
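
The question also asks for padding on the right, which the output above does not include. One way to add it, sketched here in Python 3 syntax, is to left-justify the dedented lines to the length of the longest one. The function name `format_aligned` and the `width` parameter are hypothetical; the sketch assumes `width` is large enough that every formatted value keeps at least some leading space before dedenting, and it inherits the six-decimal limit of `%f`.

```python
import re
import textwrap

# Drop a run of trailing zeros but keep the digit in front of it.
TRAILING_ZEROS = re.compile(r'(\d)0+$')


def format_aligned(values, width=20):
    """Format floats with aligned decimal points, padded on both sides.

    Hypothetical helper: assumes no formatted value exceeds `width`.
    """
    # Right-justify each value in a common field, then strip trailing zeros.
    lines = [TRAILING_ZEROS.sub(r'\1', '%*f' % (width, v)) for v in values]
    # Remove the leading whitespace shared by all lines.
    dedented = textwrap.dedent('\n'.join(lines)).splitlines()
    # Pad on the right so every string has the same length.
    longest = max(len(s) for s in dedented)
    return [s.ljust(longest) for s in dedented]


for line in format_aligned([550.056103, 6773.83566, 601001.513]):
    print(repr(line))
```

Because the right padding is applied after `textwrap.dedent`, the trailing spaces do not interfere with the search for common leading whitespace.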