Search code examples
python-3.xnumpynumpy-ndarrayarray-broadcastingnumpy-slicing

Generalising to arbitrarily sized arrays in numpy


I currently have an input grid and function which i'd like to pass onto every element of the grid (dependent on the number of dimensions of the original grid). Issue is, I currently have several different implementations of passing the function depending on the dimension and I'd like to find a better way to implement it. Below are some examples:

import numpy as np

# 1D

grid1 = np.array(
    [ 1, 2, 3]
)

# Function
foo1 = lambda x: x**2

print(grid1.shape)
print(len(grid1.shape))

f_grid1 = np.zeros_like(grid1)

for i in range(grid1.shape[0]):
    f_grid1[i] = foo1(grid1[i])

print(f_grid1)





# 2D

grid2 = np.array(
    [
        [[1, 0.1], [1, 0.2], [1, 0.3]],
        [[2, 0.1], [2, 0.2], [2, 0.3]],
        [[3, 0.1], [3, 0.2], [3, 0.3]],
        [[4, 0.1], [4, 0.2], [4, 0.3]]
    ]
)

# Function
foo2 = lambda x, y: x**2 - y


print(grid2.shape)
print(len(grid2.shape))


bins = grid2.shape[0] # bins will always the same between dimensions.

f_grid2 = np.empty(grid2.shape[:-1], dtype=np.float64)

for i in range(grid2.shape[0]):
    for j in range(grid2.shape[1]):
        f_grid2[i, j] = foo2(grid2[i, j, 0], grid2[i, j, 1])

print(f_grid2)





# 3D

grid3 = np.array(
    [
        [ [[1, 0.1, 0.0001], [1, 0.2, 0.0001], [1, 0.3, 0.0001]],
          [[2, 0.1, 0.0001], [2, 0.2, 0.0001], [2, 0.3, 0.0001]],
          [[3, 0.1, 0.0001], [3, 0.2, 0.0001], [3, 0.3, 0.0001]] ],

        [ [[1, 0.1, 0.0002], [1, 0.2, 0.0002], [1, 0.3, 0.0002]],
          [[2, 0.1, 0.0002], [2, 0.2, 0.0002], [2, 0.3, 0.0002]],
          [[3, 0.1, 0.0002], [3, 0.2, 0.0002], [3, 0.3, 0.0002]] ],

        [ [[1, 0.1, 0.0003], [1, 0.2, 0.0003], [1, 0.3, 0.0003]],
          [[2, 0.1, 0.0003], [2, 0.2, 0.0003], [2, 0.3, 0.0003]],
          [[3, 0.1, 0.0003], [3, 0.2, 0.0003], [3, 0.3, 0.0003]] ],
    ]
)

# Function
foo3 = lambda x, y, z: x**2 - y + z

bins = grid3.shape[0] # bins will always the same between dimensions.

print(grid3.shape)
print(len(grid3.shape))

f_grid3 = np.empty(grid3.shape[:-1], dtype=np.float64)

for i in range(grid3.shape[0]):
    for j in range(grid3.shape[1]):
        for k in range(grid3.shape[2]):
            f_grid3[i, j, k] = foo3(grid3[i, j, k, 0], grid3[i, j, k, 1], grid3[i, j, k, 2])

print(f_grid3)

How do I extend this to arbitrary dimensions? Output of the above would be:

(3,)
1
[1 4 9]
(4, 3, 2)
3
[[ 0.9  0.8  0.7]
 [ 3.9  3.8  3.7]
 [ 8.9  8.8  8.7]
 [15.9 15.8 15.7]]
(3, 3, 3, 3)
4
[[[0.9001 0.8001 0.7001]
  [3.9001 3.8001 3.7001]
  [8.9001 8.8001 8.7001]]

 [[0.9002 0.8002 0.7002]
  [3.9002 3.8002 3.7002]
  [8.9002 8.8002 8.7002]]

 [[0.9003 0.8003 0.7003]
  [3.9003 3.8003 3.7003]
  [8.9003 8.8003 8.7003]]]

Currently, I think the final function would look something like:

def apply_function(grid, func):
    shape = grid.shape
    if len(shape) == 1:
        grid = grid[np.newaxis].T

    out_grid = np.empty(grid.shape[:-1], dtype=np.float64)

    # calculate the corresponding out_grid here

Solution

  • As I said in comments to the similar recent question (on which, I remember that @hpaulj also commented indeed), the real question is: do you really need this? You have to be aware that, specific to a dimension or not, any method to apply a python fonction on all elements of an array will be a very very inefficient method. This is negating the whole numpy point. The whole numpy point is how NOT to do that (rule of thumb: if you have a for loop, you are probably doing it wrong)

    In your examples,

    f_grid=foo1(grid)
    f_grid=foo2(grid[:,:,0], grid[:,:,1])
    f_grid=foo3(grid[:,:,:,0], grid[:,:,:,1], grid[:,:,:,2])
    

    are the way to apply on whole vectors, without for loops (at least without explicit ones. Obviously they are replaced by implicit one inside numpy's code. But those are in C, so way faster).

    Also note that you seem to align the number of dimension of your arrays (that is the number k of dimensions of the N₁×N₂×...×Nₖ data on which you want to apply foo) and the number of variables of that foo function. There might be an applicative reason for that (we wouldn't know, that is your application). But no math or numpy reason.

    For example, a 1D grid could contains x,y,z values, like this:

    foo=lambda x,y,z: x**2+y-z. 
    

    That can be apply along a 1D array. Either with

    for i in range(len(grid)): 
        f_grid[i] =foo(grid[i][0], grid[i][1], grid[i][2]) 
    

    using your method. Or with

    f_grid=foo(grid[:,0], grid[:,1], grid[:,2]) 
    

    using mine (well, it's not mine: everybody is doing the same). So, it is not really to the number of dimension of the grid that you want to adapt your code, but to the number of variables of the foo function (aka, the size of the last axis of grid, which, other way to say the same thing, is not forced to be the number of other dimensions)

    Again, if the function is vectorized (and all your examples are; and most mathematical functions that just return a scalar from a bunch of independent scalars are), then, adapting to the number of dimensions of your grid is not a problem at all; you've nothing to do: that is already taken care of by the vectorization. Vectorized function don't care whether they are applied on a scalar, a 1D-array, a 2D-array, etc. They just repeat the jost as many time needed on all data along all axes.

    For example, a 1 parameter function foo, can be applind on all data of a 3D grid:

    foo=lambda x: x**2 
    f_grid=foo(grid) 
    

    would work whatever the dimension of grid

    So, let's forget once for all the number of dimensions of your target grid. To focus of the number of variables of your function, that is also, in more than 1D-grid, the size of the last axis. Well, let's not forget it at once, because nevertheless that 2nd problem (the fact that the K variables are along the last axis), causes a bit of the 1st problem (the code is not exactly the same depending on the number N of dimensions — again no reason to assume N=K: we need N : in the code)

    But @hpaulj gave the way to get rid of those N :. Replace my initial code for your 3 examples by

    f_grid=foo1(grid)
    f_grid=foo2(grid[...,0], grid[...,1])
    f_grid=foo3(grid[...,0], grid[...,1], grid[...,2])
    

    You see, now the number N of dimensions has no impact. Only the size K of the last axis, aka the number of parameters has. Well, except for the 1D case, which is different. We would have prefered to call it

    f_grid=foo1(grid[...,0])
    

    to stick to the same pattern. That is easily solved: make sure that in your 1D case, grid is a 2D-array (with size 1 as the last axis), exactly like in your 3D case, grid is a 4D-array (with size 3 as the last axis), or generally speaking, that in the case of a function of K parameters, generating a grid of N dimensions, that grid is a (N+1)D array with last axis of size K (that is of size N, in all your examples where K=N).

    Either because you ensure so. Or because we add a code to take care of that particular case

    grid=grid if len(grid.shape)>1 else grid[:,None]
    f_grid=foo1(grid[...,0])
    f_grid=foo2(grid[...,0], grid[...,1])
    f_grid=foo3(grid[...,0], grid[...,1], grid[...,2])
    

    The first line of this code add a phony, size-1, axis when grid is 1D, so that, as with all other examples, grid has 1 extra dimensions holding the K variables.

    Now, you can see that the common patter is starting to be obvious.

    We can generate the call for all dimensions that way

    grid=grid if len(grid.shape)>1 else grid[:,None]
    f_grid=foo(*(grid[...,i] for i in range(grid.shape[-1])))
    

    That is independent to both the number K=grid.shape[-1] of arguments, and to the number of dimensions.

    And, I know, I said "no for loop". But that one doesn't really matter: it iterates only along the number of arguments of foo. Unless you indent to have a foo function that have zillions of arguments (in which case you have a problem anyway), that for loop in negligible. It is not iterating the data. Just the dimensions of the data. It is just, 1, 2 or 3 iterations.