
awkward1: how to set an array dimension as variable?


The aim is to save some arrays as Parquet files. Using a Python debugger, I can reach the point in my code where they are ready to be saved. Inside my complicated code, they look like this:

ipdb> ak.__version__
'1.2.2'
ipdb> array1
<Array [... 0., 1.]], [[50.], [4., 47.]]]] type='1 * 3 * var * var * float64'>
ipdb> array2
<Array [[False, True, True]] type='1 * 3 * bool'>

If I try to save them, it fails with this error:

ipdb> group = ak.zip({'a': array1, 'b': array2}, depth_limit=1)
ipdb> ak.to_parquet(group, 'test.parquet')
*** ValueError: could not broadcast input array from shape (3) into shape (1)

So I started experimenting in the terminal to recreate and debug the problem, but I couldn't reproduce it. Here is what happens:

In [1]: import awkward as ak
In [2]: ak.__version__
'1.2.2'
In [3]: cat = ak.from_iter([[True, False, True]])
In [4]: dog = ak.from_iter([[[], [[50.0], [0.2, 0.1, 0., 0., 0.1]], [[50.0], [21., 0.1, 0.]]]])
In [5]: pets = ak.zip({'dog':dog, 'cat':cat}, depth_limit=1)
In [6]: ak.to_parquet(pets, "test.parquet")
In [7]: # no problems
In [8]: cat
<Array [[True, False, True]] type='1 * var * bool'>

Notice that the type has changed from 1 * 3 * bool to 1 * var * bool. That seems to be the only difference, but I can't work out how to control it.


Having isolated the issue, I found it wasn't what I thought it was. The problem occurs when np.newaxis is used to add a new axis to a boolean array, which is then saved.


import awkward as ak
import numpy as np

# An integer array with a new outer axis saves without error
dog = ak.from_iter([1, 2, 3])[np.newaxis]
pets = {"dog": dog}
zipped = ak.zip(pets, depth_limit=1)
ak.to_parquet(zipped, "test.parquet")

# The same steps with a boolean array fail (awkward 1.2.2)
dog = ak.from_iter([True, False, True])[np.newaxis]
pets = {"dog": dog}
zipped = ak.zip(pets, depth_limit=1)
ak.to_parquet(zipped, "test.parquet")
# ValueError: could not broadcast input array from shape (3) into shape (1)

I should really know better than to post a question without isolating the problem first. Apologies for wasting your time.


Solution

  • In ak.zip, depth_limit=1 means that the arrays are not deeply matched ("zipped") together: the only constraint is that len(array1) == len(array2). Is this not satisfied?

    In your pets example, len(cat) == 1 and len(dog) == 1. Since you're asking for depth_limit=1, it doesn't matter whether len(cat[0]) == len(dog[0]), though in this case it does hold (both are 3). Thus, it would also be possible to zip these at depth_limit=2, even though that's not what you're asking for.

    Since the error message says that the mismatched lengths of array1 and array2 are 3 and 1, that should be easy to inspect in the debugger:

    array1[0]
    array1[1]
    array1[2]   # should be the last one
    
    array2[0]   # should be the only one
    

    I hope this sheds some light on your problem!


    Looking more closely, I see that you've already shown the lengths of array1 and array2: both are 1. There should be no trouble zipping them at depth_limit=1.

    You can make your pets example have exactly the right types by calling ak.to_regular on that axis:

    >>> cat = ak.to_regular(ak.from_iter([[True, False, True]]), axis=1)
    >>> dog = ak.to_regular(ak.from_iter([[[], [[50.0], [0.2, 0.1, 0., 0., 0.1]], [[50.0], [21., 0.1, 0.]]]]), axis=1)
    >>> cat
    <Array [[True, False, True]] type='1 * 3 * bool'>
    >>> dog
    <Array [... 0, 0.1]], [[50], [21, 0.1, 0]]]] type='1 * 3 * var * var * float64'>
    

    So the types are exactly 1 * 3 * bool and 1 * 3 * var * var * float64. Zipping works:

    >>> pets = ak.zip({'dog':dog, 'cat':cat}, depth_limit=1)
    >>> pets
    <Array [... 0]]], cat: [True, False, True]}] type='1 * {"dog": var * var * var *...'>
    >>> pets.type
    1 * {"dog": var * var * var * float64, "cat": var * bool}
    

    Maybe the array1 and array2 you think you're working with are not what you're really working with?