I'm using NumPy's array2string for writing an ASCII file. It outperforms Python string formatting in a loop or with map:
aa = np.array2string(array.flatten(), precision=precision, separator=' ', max_line_width=(precision + 4) * ncolumns, prefix=' ', floatmode='fixed')
aa = ' ' + aa[1:-1] + '\n'
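For reference, this is roughly how the string ends up in a file (the array and the file name here are just placeholders):

import numpy as np

array = np.random.random((1000, 3))   # placeholder data
precision = 16
ncolumns = 6

aa = np.array2string(array.flatten(), precision=precision, separator=' ',
                     max_line_width=(precision + 4) * ncolumns, prefix=' ',
                     floatmode='fixed')
aa = ' ' + aa[1:-1] + '\n'            # drop the surrounding brackets

with open('data.txt', 'w') as f:      # example output file
    f.write(aa)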
I noticed strange results when the number of elements is less than a few thousand. A comparison using map and join behaves as I expect performance-wise (slower as the array gets large, and quicker for small arrays because of the overhead of the NumPy function).
What is the cause of the spike in numpy.array2string? It's slower for a (100, 3) array than for a (500000, 3) array. NumPy is the best option for the size of my data (>1000), but the spike seems weird. Full code:
import numpy as np
import perfplot
precision = 16
ncolumns = 6
# numpy method
def numpystring(array, precision, ncolumns):
    indent = ' '
    aa = np.array2string(array.flatten(), precision=precision, separator=' ',
                         max_line_width=(precision + 6) * ncolumns,
                         prefix=' ', floatmode='fixed')
    return indent + aa[1:-1] + '\n'
# native python string creation
def nativepython_string(array, precision, ncolumns):
    fmt = '{' + f":.{precision}f" + '}'
    data_str = ''
    # calculate number of full rows
    if array.size <= ncolumns:
        nrows = 1
    else:
        nrows = int(array.size / ncolumns)
    # write full rows
    for row in range(nrows):
        shift = row * ncolumns
        data_str += ' ' + ' '.join(
            map(lambda x: fmt.format(x), array.flatten()[0 + shift:ncolumns + shift])) + '\n'
    # write any remaining data in last non-full row
    if array.size > ncolumns and array.size % ncolumns != 0:
        data_str += ' ' + ' '.join(
            map(lambda x: fmt.format(x), array.flatten()[ncolumns + shift::])) + '\n'
    return data_str
# Benchmark methods
out = perfplot.bench(
    setup=lambda n: np.random.random([n, 3]),  # setup random n x 3 array
    kernels=[
        lambda a: nativepython_string(a, precision, ncolumns),
        lambda a: numpystring(a, precision, ncolumns)
    ],
    equality_check=None,
    labels=["Native", "NumPy"],
    n_range=[2**k for k in range(16)],
    xlabel="Number of vectors [Nr.]",
    title="String Conversion Performance"
)
out.show(
    time_unit="us",  # set to one of ("auto", "s", "ms", "us", or "ns") to force plot units
)
out.save("perf.png", transparent=True, bbox_inches="tight")
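To see the spike without the benchmarking harness, a rough check that times only the array2string call looks like this (a sketch; absolute timings will depend on the machine):

import timeit
import numpy as np

precision = 16
ncolumns = 6

for n in (100, 500000):
    a = np.random.random((n, 3)).flatten()
    t = timeit.timeit(
        lambda: np.array2string(a, precision=precision, separator=' ',
                                max_line_width=(precision + 6) * ncolumns,
                                prefix=' ', floatmode='fixed'),
        number=10)
    print(f"{n:>7} vectors: {t / 10 * 1e3:.2f} ms per call")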
A sample of using savetxt with a small 2d array:
In [87]: np.savetxt('test.txt', np.arange(24).reshape(3,8), fmt='%5d')
In [88]: cat test.txt
    0     1     2     3     4     5     6     7
    8     9    10    11    12    13    14    15
   16    17    18    19    20    21    22    23
In [90]: np.savetxt('test.txt', np.arange(24).reshape(3,8), fmt='%5d', newline=' ')
In [91]: cat test.txt
    0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22    23
It constructs a fmt string based on the fmt parameter and the number of columns:
In [95]: fmt=' '.join(['%5d']*8)
In [96]: fmt
Out[96]: '%5d %5d %5d %5d %5d %5d %5d %5d'
and then writes this line to the file:
In [97]: fmt%tuple(np.arange(8))
Out[97]: '    0     1     2     3     4     5     6     7'
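Putting those two steps together, writing a 2D array row by row boils down to something like the following simplified sketch (write_rows is just an illustrative helper; the real savetxt additionally handles format detection, delimiters, headers, and file-like objects):

import numpy as np

def write_rows(fname, X, fmt='%5d', delimiter=' ', newline='\n'):
    # build one row format from the per-value format and the column count
    row_fmt = delimiter.join([fmt] * X.shape[1])
    with open(fname, 'w') as f:
        for row in X:
            f.write(row_fmt % tuple(row) + newline)

write_rows('test.txt', np.arange(24).reshape(3, 8))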