tensorflow regression python-3.7 tensorflow-estimator feature-engineering

Why Tensorflow error: `failed to convert object of type <class 'dict'> to Tensor` happens and How can I solve it?

I am doing a task on traffic analysis and I am stymied with some error in my code. My data rows are like this:

The values are like this:

1, 6, 19, 23, 53.32, 45.23

Which means in some specific street during 1st quarter of 19 o'clock on Friday, density of traffic is measured 23 and current speed is 53.32. the predicted speed would be 45.23.

The task is to predict the speed for another half an hour by predictors given above.

I am using this code to build a TensorFlow DNNRegressor for data:

import pandas as pd
data = pd.read_csv('dataset.csv')
X = data.iloc[:,:5].values
y = data.iloc[:, 5].values
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=0)

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(X_train)

X_train = pd.DataFrame(data=scaler.transform(X_train),columns =    ['quarter','DOW','hour','density','speed'])
X_test = pd.DataFrame(data=scaler.transform(X_test),columns = ['quarter','DOW','hour','density','speed'])

y_train = pd.DataFrame(data=y_train,columns = ['label'])
y_test = pd.DataFrame(data=y_test,columns = ['label'])

import tensorflow as tf

speed = tf.feature_column.numeric_column('speed')
hour = tf.feature_column.numeric_column('hour')
density = tf.feature_column.numeric_column('density')
quarter= tf.feature_column.numeric_column('quarter')
DOW = tf.feature_column.numeric_column('DOW')

feat_cols = [h_percentage, DOW, hour, density, speed]
input_func = tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train ,batch_size=10,num_epochs=1000,shuffle=False)

model = tf.estimator.DNNRegressor(hidden_units=[5,5,5],feature_columns=feat_cols)
model.train(input_fn=input_func,steps=25000)
predict_input_func = tf.estimator.inputs.pandas_input_fn(
  x=X_test,
  batch_size=10,
  num_epochs=1,
  shuffle=False)

pred_gen = model.predict(predict_input_func)

predictions = list(pred_gen)
final_preds = []
for pred in predictions:
    final_preds.append(pred['predictions'])

from sklearn.metrics import mean_squared_error

mean_squared_error(y_test,final_preds)**0.5

when I run this code, It throws an error with this ending:

TypeError: Failed to convert object of type <class 'dict'> to Tensor. Contents: {'label': <tf.Tensor 'fifo_queue_DequeueUpTo:6' shape=(?,) dtype=float64>}. Consider casting elements to a supported type. First of all what is the concept of error? I couldn't find source for reason of error to deal with it. And how can I modify code for solution?

secondly does it improve the model performance to use tensorflow categorical_column_with_identity instead of numeric_columns for DOW which indicates days of week?

I also want to know if it's useful to merge quarter and hour as a single column like day time (quarter is minutes in an hour which is going to be normalized between 0 and 1)?

Solution

First of all what is the concept of error? I couldn't find source for reason of error to deal with it. And how can I modify code for solution?

Let me first talk about the solution to the problem. You need to change parameter y in pandas_input_fn as follows.

input_func = tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train['label'],batch_size=10,num_epochs=1000,shuffle=False)

It seems that the parameters y in pandas_input_fn doesn't support dataframe type when you run to model.train(). pandas_input_fn parses every sample y to a form similar to {columnname: value} in this case, but model.train() can't recognize it. So you need to pass series type.

secondly does it improve the model performance to use tensorflow categorical_column_with_identity instead of numeric_columns for DOW which indicates days of week?

This involves when we should choose categorical or choose numeric for feature engineering. A very simple rule is to choose numeric if there is a significant difference between big and small in the internal comparison of your feature. If the feature does not have bigger or smaller significance, you should choose categorical. So I tend to choose categorical_column_with_identity for feature DOW.

I also want to know if it's useful to merge quarter and hour as a single column like day time (quarter is minutes in an hour which is going to be normalized between 0 and 1)?

Cross features may bring some benefits such as latitude and longitude features. I recommend you to use tf.feature_column.crossed_column(link) here. It returns a column for performing crosses of categorical features. You can also continue to retain features quarter and hour in model at the same time, .