Tags: tensorflow, time-series, artificial-intelligence, prediction, recurrent-neural-network

ValueError: Cannot feed value of shape (6165, 5) for Tensor 'Placeholder_1:0', which has shape '(?, 1)'


WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From C:/Users/SONSANGWOO/Desktop/Euroaquae/The_third_semester_at_BCN/ANN/Exercise/TimeSeriespy_RNN.py:74: BasicLSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
WARNING:tensorflow:From C:/Users/SONSANGWOO/Desktop/Euroaquae/The_third_semester_at_BCN/ANN/Exercise/TimeSeriespy_RNN.py:75: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `keras.layers.RNN(cell)`, which is equivalent to this API
WARNING:tensorflow:From C:\Users\SONSANGWOO\Anaconda3\lib\site-packages\tensorflow\python\ops\tensor_array_ops.py:162: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Traceback (most recent call last):

  File "<ipython-input-1-7716630f4e29>", line 1, in <module>
    runfile('C:/Users/SONSANGWOO/Desktop/Euroaquae/The_third_semester_at_BCN/ANN/Exercise/TimeSeriespy_RNN.py', wdir='C:/Users/SONSANGWOO/Desktop/Euroaquae/The_third_semester_at_BCN/ANN/Exercise')

  File "C:\Users\SONSANGWOO\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 704, in runfile
    execfile(filename, namespace)

  File "C:\Users\SONSANGWOO\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/SONSANGWOO/Desktop/Euroaquae/The_third_semester_at_BCN/ANN/Exercise/TimeSeriespy_RNN.py", line 97, in <module>
    X: trainX, Y: trainY})

  File "C:\Users\SONSANGWOO\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
    run_metadata_ptr)

  File "C:\Users\SONSANGWOO\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1128, in _run
    str(subfeed_t.get_shape())))

ValueError: Cannot feed value of shape (6165, 5) for Tensor 'Placeholder_1:0', which has shape '(?, 1)'

I am getting the error above. I have checked the dimensions of each variable and they all look fine to me, so could you let me know what is wrong and how to fix it?

What I would like to do is weather prediction. The input shape should be (xxxx, 5), where xxxx is the number of rows in the input data and 5 is the number of input features (mean temperature and so on).

The output shape must be (yyyy, 1), since its single column holds the predicted precipitation.

Strangely, when the program reads the file, dataY ends up with shape (hhhh, 5) when it is supposed to be (yyyy, 1).

I assume this is what causes the error.
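
The shape mismatch can be reproduced in isolation. The sketch below uses dummy values but the same indexing as build_dataset in the script further down: indexing a 2-D array with a single row index returns the whole five-column row, so stacking those rows gives shape (N, 5) instead of (N, 1).

import numpy as np

# Dummy array with 5 columns, standing in for the weather data (values are made up)
time_series = np.arange(50, dtype=float).reshape(10, 5)
seq_length = 6

dataY = []
for i in range(len(time_series) - seq_length):
    dataY.append(time_series[i + seq_length])  # whole row -> shape (5,)

print(np.array(dataY).shape)  # (4, 5): five columns, not one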

A link to the input file is below:

Input file


How do I solve this problem? Any help would be appreciated.


import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib
from matplotlib import pyplot as plt

tf.reset_default_graph()
tf.set_random_seed(777)  # reproducibility





def MinMaxScaler(data):

    numerator = data - np.min(data, 0)
    denominator = np.max(data, 0) - np.min(data, 0)
    # noise term prevents the zero division
    return numerator / (denominator + 1e-7)


# train Parameters
seq_length = 6
data_dim = 5
hidden_dim = 10
output_dim = 1
learning_rate = 0.01
iterations = 500




# Five weather features per row; precipitation is the last column
#df = pd.read_csv("precipitation_post.csv", quotechar='"', decimal=".")
#df = df.interpolate(method ='linear', limit_direction ='forward')
#xy = df.reindex(index=df.index[::-1])
xy = np.loadtxt('df.txt', dtype='double', delimiter=' ', skiprows=1)
#xy = xy[::-1]  

# train/test split
train_size = int(len(xy) * 0.7)
train_set = xy[0:train_size]
test_set = xy[train_size - seq_length:] # Index from [train_size - seq_length] to utilize past sequence

# Scale each
train_set = MinMaxScaler(train_set)
test_set = MinMaxScaler(test_set)
x = xy
y = xy[:, [-1]] # precipitation (last column) as label

# build datasets
def build_dataset(time_series, seq_length):
    dataX = []
    dataY = []
    for i in range(0, len(time_series) - seq_length):
        _x = time_series[i:i + seq_length]
        _y = time_series[i + seq_length]
        print(_x, "->", _y)
        dataX.append(_x)
        dataY.append(_y)
    return np.array(dataX), np.array(dataY)

trainX, trainY = build_dataset(train_set, seq_length)
testX, testY = build_dataset(test_set, seq_length)

# input place holders
X = tf.placeholder(tf.float32, shape=[None, seq_length, data_dim])
Y = tf.placeholder(tf.float32, shape=[None, 1])

# build a LSTM network
cell = tf.contrib.rnn.BasicLSTMCell(
    num_units=hidden_dim, state_is_tuple=True, activation=tf.tanh)
outputs, _states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
Y_pred = tf.contrib.layers.fully_connected(
    outputs[:, -1], output_dim, activation_fn=None)  # We use the last cell's output

# cost/loss
loss = tf.reduce_sum(tf.square(Y_pred - Y))  # sum of the squares
# optimizer
optimizer = tf.train.AdamOptimizer(learning_rate)
train = optimizer.minimize(loss)

# RMSE
targets = tf.placeholder(tf.float32, [None, 1])
predictions = tf.placeholder(tf.float32, [None, 1])
rmse = tf.sqrt(tf.reduce_mean(tf.square(targets - predictions)))

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)

    # Training step
    for i in range(iterations):
        _, step_loss = sess.run([train, loss], feed_dict={
                                X: trainX, Y: trainY})
        print("[step: {}] loss: {}".format(i, step_loss))

    # Test step
    test_predict = sess.run(Y_pred, feed_dict={X: testX})
    rmse_val = sess.run(rmse, feed_dict={
                    targets: testY, predictions: test_predict})
    print("RMSE: {}".format(rmse_val))

# Plot predictions
plt.plot(testY)
plt.plot(test_predict)
plt.xlabel("Time Period")
plt.ylabel("Precipitation")
plt.show()


Solution

  • Given the information you've provided, here's a solution. As you have probably realized, the problem is in the build_dataset function: each label it builds is a full five-column row instead of a single precipitation value, which is why dataY comes out with shape (N, 5). You need to change the function to the following.

    def build_dataset(data, seq_length):
      dataX = []
      for i in range(seq_length):
        # Column i of every window: rows i .. i + (len(data) - seq_length) - 1
        dataX.append(data[i:data.shape[0] - (seq_length - i)].reshape(-1, 1, 5))
      dataX = np.concatenate(dataX, axis=1)
      # Label: the precipitation (last column) of the row right after each window
      dataY = data[seq_length:data.shape[0], 4].reshape(-1, 1)
      return dataX, dataY
    

    This function returns the data in the following manner. Say you have the following rows:

    22.90 20.20 31.00 93.00 0.00
    22.90 21.20 26.00 91.00 0.00
    22.40 20.20 27.40 89.00 0.00
    22.40 15.40 29.00 90.00 0.00
    21.30 14.40 26.00 82.00 0.00
    21.50 20.20 23.00 96.00 0.00
    22.10 17.20 23.60 97.00 20.70
    

    It gives X as:

    22.90 20.20 31.00 93.00 0.00
    22.90 21.20 26.00 91.00 0.00
    22.40 20.20 27.40 89.00 0.00
    22.40 15.40 29.00 90.00 0.00
    21.30 14.40 26.00 82.00 0.00
    21.50 20.20 23.00 96.00 0.00
    

    And the output Y as 20.70, which is the precipitation (last column) of the row immediately following the six-row window.
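
    As a quick sanity check, here is a standalone sketch that reproduces this single window and label; the seven sample rows are hard-coded and the rewritten function is repeated so the snippet runs on its own:

    import numpy as np

    def build_dataset(data, seq_length):
      dataX = []
      for i in range(seq_length):
        dataX.append(data[i:data.shape[0] - (seq_length - i)].reshape(-1, 1, 5))
      dataX = np.concatenate(dataX, axis=1)
      dataY = data[seq_length:data.shape[0], 4].reshape(-1, 1)
      return dataX, dataY

    # The seven sample rows shown above
    sample = np.array([
      [22.90, 20.20, 31.00, 93.00, 0.00],
      [22.90, 21.20, 26.00, 91.00, 0.00],
      [22.40, 20.20, 27.40, 89.00, 0.00],
      [22.40, 15.40, 29.00, 90.00, 0.00],
      [21.30, 14.40, 26.00, 82.00, 0.00],
      [21.50, 20.20, 23.00, 96.00, 0.00],
      [22.10, 17.20, 23.60, 97.00, 20.70],
    ])

    X, Y = build_dataset(sample, seq_length=6)
    print(X.shape)  # (1, 6, 5): one window of six rows, five features each
    print(Y.shape)  # (1, 1)
    print(Y)        # [[20.7]]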

    For the full training set, it results in the following shapes, which now match the X and Y placeholders:

    Input: (6165, 6, 5)
    Output: (6165, 1)
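
    Alternatively, if you prefer to keep your original loop-based build_dataset, a one-line change gives the same (N, 1) labels: select only the last column (the precipitation) when building _y. A sketch of that variant, meant as a drop-in replacement inside your script (numpy is already imported there):

    def build_dataset(time_series, seq_length):
        dataX = []
        dataY = []
        for i in range(0, len(time_series) - seq_length):
            _x = time_series[i:i + seq_length]
            # Keep only the last column (precipitation), so each label has shape (1,)
            _y = time_series[i + seq_length, [-1]]
            dataX.append(_x)
            dataY.append(_y)
        return np.array(dataX), np.array(dataY)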