Tags: python, numpy, python-zip

ascii_graph works on primitive literals but not on the (same?) data built with numpy.arange and zip


I am trialing the ascii_graph package for Python. If I assemble the histogram data using numpy.arange and zip, the plotting fails. If I assemble the data from primitive literals, it succeeds. Can anyone please explain what the difference is?

import numpy as np
BinMid = np.arange(20) + 1 # Bin mid-points
BinEdge = np.arange(21) + 0.5  # Bin edges, used only to generate the
                               # histogram counts (not shown in this sample)
nDist = np.array( # Bin counts
   [  7083,    73485,   659204,  3511238, 10859771, 22162510,
      34511661, 45891902, 55651178, 59153091, 56242073,
      48598282, 37947325, 27541907, 19356046, 13630601,
      8810979, 4262462, 1227506,   216751], dtype=np.int64 )

# Histogram data
histData = list( zip( BinMid.astype(str) , nDist ) )
#   [('1', 7083),
#    ('2', 73485),
#    ('3', 659204),
#    ('4', 3511238),
#    ('5', 10859771),
#    ('6', 22162510),
#    ('7', 34511661),
#    ('8', 45891902),
#    ('9', 55651178),
#    ('10', 59153091),
#    ('11', 56242073),
#    ('12', 48598282),
#    ('13', 37947325),
#    ('14', 27541907),
#    ('15', 19356046),
#    ('16', 13630601),
#    ('17', 8810979),
#    ('18', 4262462),
#    ('19', 1227506),
#    ('20', 216751)]

# Create ASCII histograph plotter
from ascii_graph import Pyasciigraph
graph = Pyasciigraph()

# FAILS: Plot using the zip expression, either inline or assigned to histData
#------------------------------------------------------
for line in graph.graph( "Test" ,
      list( zip( BinMid.astype(str) , nDist ) ) ):
   print(line)
for line in graph.graph( "Test" , histData ): print(line)
#  Traceback (most recent call last):
#    Cell In[139], line 1
#      for line in graph.graph( "Test" , histData ): print(line)
#    File ~\AppData\Local\anaconda3\envs\py39\lib\site-packages\ascii_graph\__init__.py:399 in graph
#      san_data = self._sanitize_data(data)
#    File ~\AppData\Local\anaconda3\envs\py39\lib\site-packages\ascii_graph\__init__.py:378 in _sanitize_data
#      (self._sanitize_string(item[0]),
#    File ~\AppData\Local\anaconda3\envs\py39\lib\site-packages\ascii_graph\__init__.py:351 in _sanitize_string
#      return info
#  UnboundLocalError: local variable 'info' referenced before assignment

# SUCCEEDS: Assign primitive literals to histData
#-----------------------------------------------
histData = [ ('1', 7083),
             ('2', 73485),
             ('3', 659204),
             ('4', 3511238),
             ('5', 10859771),
             ('6', 22162510),
             ('7', 34511661),
             ('8', 45891902),
             ('9', 55651178),
             ('10', 59153091),
             ('11', 56242073),
             ('12', 48598282),
             ('13', 37947325),
             ('14', 27541907),
             ('15', 19356046),
             ('16', 13630601),
             ('17', 8810979),
             ('18', 4262462),
             ('19', 1227506),
             ('20', 216751) ]
for line in graph.graph( "Test" , histData ): print(line)
# Test
# ###############################################################################
#                                                                        7083  1
#                                                                       73485  2
#                                                                      659204  3
# ███                                                                 3511238  4
# ███████████                                                        10859771  5
# ████████████████████████                                           22162510  6
# █████████████████████████████████████                              34511661  7
# ██████████████████████████████████████████████████                 45891902  8
# █████████████████████████████████████████████████████████████      55651178  9
# █████████████████████████████████████████████████████████████████  59153091  10
# █████████████████████████████████████████████████████████████      56242073  11
# █████████████████████████████████████████████████████              48598282  12
# █████████████████████████████████████████                          37947325  13
# ██████████████████████████████                                     27541907  14
# █████████████████████                                              19356046  15
# ██████████████                                                     13630601  16
# █████████                                                           8810979  17
# ████                                                                4262462  18
# █                                                                   1227506  19
#                                                                      216751  20

Afternote

Based on Nick ODell's response, the following works:

import numpy as np
BinMidStr = [ str(i+1) for i in range(20) ] # Bin mid-point labels
nDist = np.array( # Bin counts
   [  7083,    73485,   659204,  3511238, 10859771, 22162510,
      34511661, 45891902, 55651178, 59153091, 56242073,
      48598282, 37947325, 27541907, 19356046, 13630601,
      8810979, 4262462, 1227506,   216751], dtype=np.int64 )

# Histogram data
histData = list( zip( BinMidStr , nDist ) )

# Create ASCII histograph plotter
from ascii_graph import Pyasciigraph
graph = Pyasciigraph()

# Plot code pattern #1
for line in graph.graph( "Test" ,
                          list( zip( BinMidStr , nDist ) ) ):
   print(line)

# Plot code pattern #2
for line in graph.graph( "Test" , histData ): print(line)

# Plot code pattern #3 for when labels are in integer form
BinMid = [ i+1 for i in range(20) ] # Bin mid-points
BinMidStr = [ str(i) for i in BinMid ]
for line in graph.graph( "Test" ,
                          list( zip( BinMidStr , nDist ) ) ):
   print(line)

If you work a lot in NumPy and have your bin labels in the form of NumPy integers, be aware that the following almost looks like it creates native (non-NumPy) Python string labels, but it actually creates a single string representing how the entire array would be displayed:

# Plot code pattern #4 (nonfunctional) for when labels are in
# NumPy integer form
BinMid = np.arange(20) + 1  # Bin mid-points
BinMidStr = np.array_str( BinMid )
   # '[ 1  2  3  4  5  6  7  8  9 10 11
   #    12 13 14 15 16 17 18 19 20]'
BinMidStr = np.array_str( BinMid.astype('str') )
   # "['1' '2' '3' '4' '5' '6' '7' '8' '9' '10' '11' '12'
   # '13' '14' '15' '16'\n '17' '18' '19' '20']"
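For comparison, here is a minimal sketch of two conversions that do yield a list of separate native str labels from a NumPy integer array: a comprehension calling the built-in str() on each element, and .astype(str).tolist(). Both use only standard NumPy methods; the variable names labels_a and labels_b are illustrative.

```python
import numpy as np

BinMid = np.arange(20) + 1  # NumPy integer bin mid-points

# Option A: call the built-in str() on each NumPy integer
labels_a = [str(x) for x in BinMid]

# Option B: convert to a NumPy string array, then to a Python list of str
labels_b = BinMid.astype(str).tolist()

print(labels_a[:3], type(labels_a[0]))
print(labels_b[:3], type(labels_b[0]))
```

Either list can be zipped with the counts exactly like the hand-written BinMidStr in the patterns above.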

I find it odd that Pyasciigraph.graph() accepts NumPy integer values for the bar sizes, but not NumPy strings for the bar labels. Another thing that puzzles me is the lack of a documented signature (a "function prototype") for the Pyasciigraph.graph() method. While I still consider myself new to Python, most packages I've used provide Python-style documentation with function signatures and explanations of the input and output arguments.
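One plausible explanation, consistent with the traceback ending in UnboundLocalError inside _sanitize_string, is that the label sanitizer dispatches on the exact type (type(x) is str) rather than on isinstance. numpy.str_ is a subclass of str, so it passes an isinstance check but fails an exact-type check. The sketch below is a hypothetical, simplified sanitizer (sanitize_label is an illustrative name, not ascii_graph's actual code) that reproduces the same failure mode.

```python
import numpy as np


def sanitize_label(value):
    """Hypothetical sanitizer that dispatches on the exact type of value."""
    if type(value) is str:
        info = value
    elif type(value) in (int, float):
        info = str(value)
    # No else branch: any other type leaves `info` unassigned
    return info


label = np.arange(1, 2).astype(str)[0]  # a numpy.str_, not a plain str

print(isinstance(label, str))  # True: numpy.str_ subclasses str
print(type(label) is str)      # False: the exact type is numpy.str_

try:
    sanitize_label(label)
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)

print(sanitize_label('1'))  # a plain str literal goes through the first branch
```

Under this assumption, numeric values would be handled by a separate code path, which would explain why NumPy integers for the bar sizes are not a problem.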

I wish there were standard, streamlined ways to convert between arrays of native Python and NumPy data types. Going from NumPy to native Python seems trickier, probably because fewer people need to go in that direction.

Afternote: Based on this Q&A, it seems that MyNParray.tolist() is the standard, streamlined idiom for converting a NumPy array of NumPy data types into a native Python list of Python data types. It is even better than [ Element.item() for Element in MyNParray ], which doesn't work on a NumPy array of NumPy strings.
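The .tolist() idiom mentioned above can be checked directly: it converts each element to the closest native Python type, so NumPy integers become int and NumPy strings become str. The array names in this sketch are illustrative.

```python
import numpy as np

counts = np.array([7083, 73485, 659204], dtype=np.int64)
labels = np.array(['1', '2', '3'])  # string dtype; elements are numpy.str_

native_counts = counts.tolist()
native_labels = labels.tolist()

# tolist() returns plain Python lists of native scalars
print([type(c).__name__ for c in native_counts])
print([type(s).__name__ for s in native_labels])

# Zipping the converted lists gives (str, int) pairs like the literal version
histData = list(zip(native_labels, native_counts))
print(histData)
```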


Solution

  • It looks like the type of those strings ends up being numpy.str_ rather than str.

    >>> histData = list( zip( BinMid.astype(str) , nDist ) )
    >>> print(type(histData[0][0]))
    <class 'numpy.str_'>
    

    In comparison to the literal definition:

    >>> histData2 = [('1', 7083), ('2', 73485)]
    >>> print(type(histData2[0][0]))
    <class 'str'>
    

    I would suggest using an approach that gives you str objects.

    histData = list(zip(map(str, BinMid), nDist))
    

    With this, I am able to make the graph from the NumPy array.
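A quick way to confirm the suggested fix is to inspect the element types before calling graph(). In this sketch the labels come out as plain str, while the counts remain NumPy integers, which the question reports ascii_graph already accepts. The int(n) conversion in the last step is an optional extra, not part of the answer, for anyone who wants fully native (str, int) pairs; histDataNative is an illustrative name.

```python
import numpy as np

BinMid = np.arange(20) + 1
nDist = np.array([7083, 73485, 659204, 3511238, 10859771, 22162510,
                  34511661, 45891902, 55651178, 59153091, 56242073,
                  48598282, 37947325, 27541907, 19356046, 13630601,
                  8810979, 4262462, 1227506, 216751], dtype=np.int64)

# The answer's suggestion: str() on each NumPy integer yields a plain str
histData = list(zip(map(str, BinMid), nDist))
print(type(histData[0][0]), type(histData[0][1]))

# Optional: also convert the counts so every pair is (str, int)
histDataNative = [(str(b), int(n)) for b, n in zip(BinMid, nDist)]
print(histDataNative[:3])
```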