python arrays numpy multidimensional-array numpy-ndarray

Python: empty 2D NumPy array and appending values

Problem

I want to create an empty 2D NumPy array of undefined size and append values to it.

My attempt:

import numpy as np
import random

unknown = random.randint(2, 666)

# attempts at creating an "empty" starting array
#arr = np.array([np.array([])])
#arr = np.empty((unknown, 0), int)

for ch in range(unknown):
    some_input = random.randint(1, 666)
    # attempts at appending some_input (none of these work)
    #arr[ch] = np.append((arr[ch], some_input))
    #arr[ch] = np.concatenate((arr[ch], some_input))
    #arr = np.append((arr, some_input), axis=ch)
    #arr = np.concatenate((arr, some_input), axis=ch)

None of them work.

You might suggest building separate per-axis arrays and combining them into a single NumPy array, but I cannot do that.

The values must be appended one by one.

If unknown becomes large and there are several for loops enclosing this one, the computer's memory cannot handle it.

Solution

First, try a straightforward list-append approach to building the array. This builds the array only once, at the end:

In [82]: res=[]
    ...: for i in range(10):
    ...:     res.append(np.arange(20))
    ...: arr = np.array(res)

In [83]: arr.shape
Out[83]: (10, 20)

In [84]: %%timeit
    ...: res=[]
    ...: for i in range(10):
    ...:     res.append(np.arange(20))
    ...: arr = np.array(res)
29.7 µs ± 73.4 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
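Outside IPython, the same list-append pattern can be applied to the question's setup. This is a minimal sketch, assuming every row has the same fixed length; width = 20 and the names rows and row are illustrative choices, not from the question. Values are appended to plain Python lists one at a time, and NumPy converts the nested list once at the end:

```python
import random

import numpy as np

width = 20                         # assumed fixed row length (illustrative)
unknown = random.randint(2, 666)   # number of rows, not known in advance

rows = []
for ch in range(unknown):
    row = []
    for _ in range(width):
        # each value arrives one at a time and goes into a Python list
        row.append(random.randint(1, 666))
    rows.append(row)

# a single conversion at the end builds the 2D array
arr = np.array(rows)
print(arr.shape)  # (unknown, 20)
```

This relies on every row having the same length; ragged rows would not form a regular 2D array.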

Now try a NumPy approach that imitates the list: start with an "empty" array and copy it into a new one with each new row. Note that I have to match the dimensions carefully; there's less room for sloppiness with this approach:

In [85]: res = np.zeros((0,20),int)
    ...: for i in range(10):
    ...:     res = np.concatenate((res, np.arange(20)[None,:]), axis=0)
    ...:     

In [86]: res.shape
Out[86]: (10, 20)

In [87]: %%timeit
    ...: res = np.zeros((0,20),int)
    ...: for i in range(10):
    ...:     res = np.concatenate((res, np.arange(20)[None,:]), axis=0)
119 µs ± 398 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

It's about 4x slower (119 µs vs 29.7 µs). Each concatenate (or np.append) call allocates a new array and copies all of the existing data into it. Lists are designed to grow; arrays are not.

Each concatenate joins an (n,20) array and a (1,20) array to make a new (n+1,20) array, so building N rows this way writes N*(N+1)/2 rows in total.
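The copying can be made visible with a small sketch of the loop above. The counter copied is a hypothetical name added for illustration: it tallies the elements written into each freshly allocated result, and np.shares_memory confirms that the result never reuses the old buffer:

```python
import numpy as np

width = 20
res = np.zeros((0, width), int)
copied = 0  # total elements written into newly allocated arrays

for i in range(10):
    new = np.concatenate((res, np.arange(width)[None, :]), axis=0)
    # concatenate always returns a new array; the old buffer is not reused
    assert not np.shares_memory(new, res)
    copied += new.size  # every element of the (n+1, 20) result is written
    res = new

print(res.shape)  # (10, 20)
print(copied)     # 20 * (1 + 2 + ... + 10) = 1100
```

For 10 rows, 1100 elements are written to produce a 200-element result, and the ratio gets worse as the row count grows.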

Imitating list append with arrays is harder to get right, and slower. Don't do it.
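If rows really must be appended one at a time and the final count isn't known, the list approach is the simplest fix. Another option, not part of the original answer, is to borrow the over-allocation idea that Python lists rely on: keep spare rows in a preallocated buffer and grow it geometrically, so the total copying stays linear in the number of rows. RowBuffer below is a hypothetical helper written for illustration, not a NumPy API, and the initial capacity of 16 is an arbitrary choice:

```python
import numpy as np


class RowBuffer:
    """Hypothetical helper (not a NumPy API): a 2D array that doubles its
    row capacity when full, so most appends just write into spare room."""

    def __init__(self, width, dtype=int, capacity=16):
        self._data = np.empty((max(1, capacity), width), dtype=dtype)
        self._n = 0  # number of rows filled so far

    def append(self, row):
        if self._n == self._data.shape[0]:
            # Out of room: allocate twice as many rows and copy the old ones once.
            bigger = np.empty((2 * self._data.shape[0], self._data.shape[1]),
                              dtype=self._data.dtype)
            bigger[:self._n] = self._data
            self._data = bigger
        self._data[self._n] = row
        self._n += 1

    def to_array(self):
        # Trimmed copy holding only the rows that were actually appended.
        return self._data[:self._n].copy()


buf = RowBuffer(width=20)
for i in range(100):
    buf.append(np.arange(20))
arr = buf.to_array()
print(arr.shape)  # (100, 20)
```

With doubling, each row is copied only a constant number of times on average, instead of once per later append as with repeated concatenate.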

An alternative is to start with a (10,20) array and assign each row. Better yet, use a mix of whole-array methods to build the array without any Python-level iteration.
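Both ideas can be sketched for the toy data used here. np.tile is only one example of a whole-array method that happens to fit rows that are all np.arange(20); for real data, the right whole-array function depends on how the values are computed:

```python
import numpy as np

n_rows, width = 10, 20

# Option 1: preallocate the full (10, 20) array, then assign row by row
arr = np.empty((n_rows, width), dtype=int)
for i in range(n_rows):
    arr[i] = np.arange(width)  # writes into the existing buffer, no reallocation

# Option 2: no Python-level loop; np.tile repeats the row n_rows times
arr2 = np.tile(np.arange(width), (n_rows, 1))

print(arr.shape, np.array_equal(arr, arr2))  # (10, 20) True
```

Preallocation requires knowing the row count up front. np.empty leaves the contents uninitialized, so every row must be assigned before use.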

A variation on the list approach uses a list comprehension to make a list of arrays, and a single concatenate call to join them:

In [88]: np.concatenate([np.arange(20)[None,:] for _ in range(10)], axis=0).shape
Out[88]: (10, 20)

In [89]: timeit np.concatenate([np.arange(20)[None,:] for _ in range(10)], axis=0).shape
43.7 µs ± 180 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
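The [None,:] reshaping can be skipped by using functions that add the row axis themselves. This sketch checks that np.stack and np.vstack produce the same (10, 20) result as the concatenate call above. No timings were measured for these variants:

```python
import numpy as np

rows = [np.arange(20) for _ in range(10)]

# np.concatenate needs each row reshaped to (1, 20) first
a = np.concatenate([r[None, :] for r in rows], axis=0)
# np.stack joins the 1-D rows along a new leading axis
b = np.stack(rows)
# np.vstack promotes 1-D inputs to (1, 20) rows before joining
c = np.vstack(rows)

print(a.shape, b.shape, c.shape)  # (10, 20) (10, 20) (10, 20)
```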