I am doing a task on traffic analysis and I am stymied with some error in my code. My data rows are like this:
qurter | DOW (Day of week)| Hour | density | speed | label (predicted speed for another half an hour)
The values are like this:
1, 6, 19, 23, 53.32, 45.23
Which means in some specific street during 1st
quarter of 19
o'clock on Friday
, density of traffic is measured 23
and current speed is 53.32
. the predicted speed would be 45.23
.
The task is to predict the speed for another half an hour by predictors given above.
I am using this code to build a TensorFlow DNNRegressor
for data:
import pandas as pd
data = pd.read_csv('dataset.csv')
X = data.iloc[:,:5].values
y = data.iloc[:, 5].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=0)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train = pd.DataFrame(data=scaler.transform(X_train),columns = ['quarter','DOW','hour','density','speed'])
X_test = pd.DataFrame(data=scaler.transform(X_test),columns = ['quarter','DOW','hour','density','speed'])
y_train = pd.DataFrame(data=y_train,columns = ['label'])
y_test = pd.DataFrame(data=y_test,columns = ['label'])
import tensorflow as tf
speed = tf.feature_column.numeric_column('speed')
hour = tf.feature_column.numeric_column('hour')
density = tf.feature_column.numeric_column('density')
quarter= tf.feature_column.numeric_column('quarter')
DOW = tf.feature_column.numeric_column('DOW')
feat_cols = [h_percentage, DOW, hour, density, speed]
input_func = tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train ,batch_size=10,num_epochs=1000,shuffle=False)
model = tf.estimator.DNNRegressor(hidden_units=[5,5,5],feature_columns=feat_cols)
model.train(input_fn=input_func,steps=25000)
predict_input_func = tf.estimator.inputs.pandas_input_fn(
x=X_test,
batch_size=10,
num_epochs=1,
shuffle=False)
pred_gen = model.predict(predict_input_func)
predictions = list(pred_gen)
final_preds = []
for pred in predictions:
final_preds.append(pred['predictions'])
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test,final_preds)**0.5
when I run this code, It throws an error with this ending:
TypeError: Failed to convert object of type <class 'dict'> to Tensor. Contents: {'label': <tf.Tensor 'fifo_queue_DequeueUpTo:6' shape=(?,) dtype=float64>}. Consider casting elements to a supported type.
First of all what is the concept of error? I couldn't find source for reason of error to deal with it. And how can I modify code for solution?
secondly does it improve the model performance to use tensorflow categorical_column_with_identity
instead of numeric_columns
for DOW which indicates days of week?
I also want to know if it's useful to merge quarter
and hour
as a single column like day time
(quarter
is minutes in an hour which is going to be normalized between 0 and 1)?
First of all what is the concept of error? I couldn't find source for reason of error to deal with it. And how can I modify code for solution?
Let me first talk about the solution to the problem. You need to change parameter y
in pandas_input_fn
as follows.
input_func = tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train['label'],batch_size=10,num_epochs=1000,shuffle=False)
It seems that the parameters y
in pandas_input_fn
doesn't support dataframe
type when you run to model.train()
. pandas_input_fn
parses every sample y
to a form similar to {columnname: value}
in this case, but model.train()
can't recognize it. So you need to pass series
type.
secondly does it improve the model performance to use tensorflow categorical_column_with_identity instead of numeric_columns for DOW which indicates days of week?
This involves when we should choose categorical
or choose numeric
for feature engineering. A very simple rule is to choose numeric
if there is a significant difference between big and small in the internal comparison of your feature. If the feature does not have bigger or smaller significance, you should choose categorical
. So I tend to choose categorical_column_with_identity
for feature DOW
.
I also want to know if it's useful to merge quarter and hour as a single column like day time (quarter is minutes in an hour which is going to be normalized between 0 and 1)?
Cross features may bring some benefits such as latitude and longitude features. I recommend you to use tf.feature_column.crossed_column
(link) here. It returns a column for performing crosses of categorical features. You can also continue to retain features quarter
and hour
in model at the same time, .