
How to transform a DataFrame into a TensorFlow dataset for a GRU model? ValueError: (Unsupported numpy type: NPY_DATETIME)


I am trying to create a GRU model, but I have run into a problem with the timestamps.

Here is my input example:

import numpy as np
import pandas as pd
import tensorflow as tf

Date = ['2021-08-06', '2021-08-07', '2021-08-08', '2021-08-09', '2021-08-10']
Date = pd.to_datetime(Date)
Close_SP = [4436.52, 4436.52, 4436.52, 4432.35, 4436.75]
Close_DJ = [333.96, 333.96, 333.96, 332.12, 328.85]
Close_Nasdaq = [14835.8, 14835.8, 14835.8, 14860.2, 14788.1]

X = pd.DataFrame({'Close_SP': Close_SP, 'Close_DJ': Close_DJ, 'Close_Nasdaq': Close_Nasdaq}, index = Date)

X.head()

            Close_SP  Close_DJ  Close_Nasdaq
2021-08-06   4436.52    333.96       14835.8
2021-08-07   4436.52    333.96       14835.8
2021-08-08   4436.52    333.96       14835.8
2021-08-09   4432.35    332.12       14860.2
2021-08-10   4436.75    328.85       14788.1

The input shape of the GRU model is (batch size, timestamp, features), so I plan to extract the dates and the features first and then zip them.

x1 = tf.convert_to_tensor(X.index)
x2 = tf.convert_to_tensor(X)

input = tf.data.Dataset.zip((x1, x2))

However, I get a ValueError: Failed to convert a NumPy array to a Tensor (Unsupported numpy type: NPY_DATETIME)

So, how do I fix the problem? Is there a more efficient way to reach my goal?


Solution

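  • I think you just need to convert the datetime index to integer timestamps, since TensorFlow has no datetime dtype. Casting to np.int64 gives each date as an integer count since the Unix epoch (nanoseconds for a datetime64[ns] index).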

    x1 = tf.convert_to_tensor(X.index.values.astype(np.int64))
    

    I also ran into another error on this line:

    input = tf.data.Dataset.zip((x1, x2))
    

    TypeError: The argument to Dataset.zip() must be a (nested) structure of Dataset objects.

    To get past that, I converted both tensors to Datasets.

    d1 = tf.data.Dataset.from_tensors(x1)
    d2 = tf.data.Dataset.from_tensors(x2)
    
    input = tf.data.Dataset.zip((d1, d2))
    

    Because from_tensors wraps each whole tensor as a single element, this results in a one-element dataset: <ZipDataset shapes: ((5,), (5, 3)), types: (tf.int64, tf.float64)>.
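
If the goal is one dataset element per row, or batches shaped (batch size, timesteps, features) as Keras recurrent layers expect, the steps above can be extended roughly as follows. This is a minimal sketch, assuming TensorFlow 2.x; the window length `timesteps = 3` and `batch_size = 2` are illustrative values that do not come from the question, and the cast to float32 is a choice made here, not part of the original answer.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

dates = pd.to_datetime(['2021-08-06', '2021-08-07', '2021-08-08',
                        '2021-08-09', '2021-08-10'])
X = pd.DataFrame({'Close_SP': [4436.52, 4436.52, 4436.52, 4432.35, 4436.75],
                  'Close_DJ': [333.96, 333.96, 333.96, 332.12, 328.85],
                  'Close_Nasdaq': [14835.8, 14835.8, 14835.8, 14860.2, 14788.1]},
                 index=dates)

# Datetimes as int64 nanoseconds since the Unix epoch. The unit is forced
# to ns so the integers do not depend on the resolution of the index.
timestamps = X.index.values.astype('datetime64[ns]').astype(np.int64)
features = X.to_numpy(dtype=np.float32)

# The integers can be turned back into dates when needed.
restored = pd.to_datetime(timestamps)

# from_tensor_slices yields one (timestamp, feature row) pair per DataFrame
# row, whereas from_tensors wraps the whole array as a single element.
per_row = tf.data.Dataset.from_tensor_slices((timestamps, features))

# Sliding windows of `timesteps` consecutive rows, then batches of windows.
timesteps = 3    # hypothetical window length
batch_size = 2   # hypothetical batch size
windows = (
    tf.data.Dataset.from_tensor_slices(features)
    .window(timesteps, shift=1, drop_remainder=True)
    .flat_map(lambda w: w.batch(timesteps, drop_remainder=True))
)
batched = windows.batch(batch_size)
```

Each element of `batched` has shape (batch size, timesteps, features). In this sketch the dates only determine the row order and are not fed to the model; `per_row` keeps them alongside the features in case they are needed later.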