Mapping numpy/scipy function over tensorflow.data.Dataset


I'm trying to extract certain features from a 1D signal that is windowed with tf.keras.preprocessing.timeseries_dataset_from_array, which returns a tf.data.Dataset object, ds (see code below). Ideally, I'd like to map the feature functions (which use numpy and scipy) over the dataset with its built-in map method.

However, when I try to do this naively:

import numpy as np
import scipy.integrate
import tensorflow as tf

def feat1_func(x, sf, axis):
    x = np.asarray(x)
    # Integrate |x| along `axis` with the trapezoidal rule; the sample spacing is 1 / sf.
    feat1_value = np.apply_along_axis(
        lambda vals: scipy.integrate.trapezoid(np.abs(vals), dx=1 / sf), axis=axis, arr=x
    )
    return feat1_value

features = ['feat1']
feature_map = {'feat1': feat1_func}

x = np.random.rand(100, 5)
y = np.random.randint(low=0, high=2, size=100)

sequence_length = 10
sequence_stride = 3

ds = tf.keras.preprocessing.timeseries_dataset_from_array(
        data=x,
        targets=y,
        sequence_length=sequence_length,
        sequence_stride=sequence_stride,
        batch_size=None,
        shuffle=False,
    )

feat_lambda = lambda x, y: (np.array([feature_map[ft](x, sf=1000, axis=0) for ft in features]), y)
ds = ds.map(feat_lambda)

I obtain the following error message:

NotImplementedError: Cannot convert a symbolic tf.Tensor (args_0:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported.

What is the easiest workaround for this issue? Is it possible to convert the symbolic tensors to eager tensors when the mapping takes place?


Solution

  • The solution is to change the line

    feat_lambda = lambda x, y: (np.array([feature_map[ft](x, sf=1000, axis=0) for ft in features]), y)
    

    to

    feat_lambda = lambda x, y: ([tf.numpy_function(feature_map[ft], [x, 1000, 0], tf.float64) for ft in features], y)
    

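    tf.numpy_function wraps a Python function that works on NumPy arrays. When the dataset is iterated, the tensors are converted to NumPy arrays with concrete values (rather than symbolic tensors) before the function is called, and the result is converted back to a tensor of the declared dtype (tf.float64 here).
    ds = ds.map(feat_lambda) also ran without errors with tf.py_function, but I got errors further down the line when I looped over the dataset. map only traces the function, so the wrapped Python code does not actually run until the dataset is iterated. Note that tf.py_function passes eager tensors, not NumPy arrays, to the wrapped function, so sf and axis arrive as tensors; that may be the cause of the errors: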

    # This didn't work
    feat_lambda = lambda x, y: ([tf.py_function(feature_map[ft], [x, 1000, 0], tf.float64) for ft in features], y)
    ds2 = ds.map(feat_lambda)
    for elem in ds2:
        print(elem)  # here I got an error with tf.py_function, not with tf.numpy_function
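For reference, here is a minimal end-to-end sketch of the tf.numpy_function approach described above. It is not the exact code from the question: the names extract_features, SF and N_CHANNELS are made up for illustration, and the feature function uses the axis argument of scipy.integrate.trapezoid instead of np.apply_along_axis, which should give the same values. Tensors returned by tf.numpy_function carry no static shape, so the sketch sets it by hand, on the assumption that each window has 5 channels as in the question.

```python
import numpy as np
import tensorflow as tf
from scipy.integrate import trapezoid

SF = 1000        # sampling frequency from the question
N_CHANNELS = 5   # illustrative constant: number of columns in x

def feat1_func(x, sf, axis):
    # Inside tf.numpy_function, x, sf and axis all arrive as NumPy values.
    # trapezoid integrates along `axis` directly, so apply_along_axis is not needed.
    result = trapezoid(np.abs(x), dx=1.0 / sf, axis=int(axis))
    # The returned dtype must match the Tout passed to tf.numpy_function.
    return np.asarray(result, dtype=np.float64)

features = ["feat1"]
feature_map = {"feat1": feat1_func}

rng = np.random.default_rng(0)
x = rng.random((100, N_CHANNELS))
y = rng.integers(low=0, high=2, size=100)

ds = tf.keras.preprocessing.timeseries_dataset_from_array(
    data=x,
    targets=y,
    sequence_length=10,
    sequence_stride=3,
    batch_size=None,
    shuffle=False,
)

def extract_features(window, label):
    feats = []
    for ft in features:
        feat = tf.numpy_function(feature_map[ft], [window, SF, 0], tf.float64)
        # tf.numpy_function output has an unknown static shape; restore it here.
        feat.set_shape([N_CHANNELS])
        feats.append(feat)
    return tf.stack(feats), label

ds_feat = ds.map(extract_features)

# The wrapped NumPy code only runs here, when the dataset is iterated.
first_feats, first_label = next(iter(ds_feat))
print(first_feats.shape, first_label.numpy())
```

Setting the shape is optional for plain iteration, but downstream steps such as batching into a Keras model generally need a known static shape. Also keep in mind that tf.numpy_function runs ordinary Python, so the mapped function is not serialized with the graph and is subject to the usual Python threading limits.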