Tags: python, pyarrow, apache-arrow

What is the use of PyArrow Tensor class?


The Arrow documentation describes a class named Tensor that is created from NumPy ndarrays. However, the documentation is pretty sparse, and after playing with it a bit I haven't found a use case for it. For example, you can't construct a table with it:

import pyarrow as pa
import numpy as np

x = np.random.normal(0, 1.5, size=(4, 3, 2))
T = pa.Tensor.from_numpy(x, dim_names=["x", "y", "z"])

# Raises an error: a Tensor is not accepted as a table column
pa.table([pa.array([0, 1, 2, 3]), T], names=["f1", "f2"])

Nor is there a corresponding type for schemas or structs. So my question is: what is it there for? Can someone provide a simple example that uses it?

Here's a related question from over 5 years ago, but it asks about Parquet. I am interested in persisting these tensors, but first I need to understand how to use them, and as of today I don't.


Solution

  • AFAIK, the PyArrow Tensor class is only used for IPC (serialization), where a tensor is one of the message types defined in the IPC specification: https://arrow.apache.org/docs/dev/format/Other.html

    To use tensors in a PyArrow Table, you would need an extension type. We are currently working on that; here is the umbrella issue:

    https://github.com/apache/arrow/issues/33924

    You can see how it will be used in the example in the PyArrow implementation PR: https://github.com/apache/arrow/pull/33948/files