How to combine tf.data.Dataset and tf.estimator.DNNRegressor properly


I am currently learning to use TensorFlow and am having trouble getting started. I would like to use the newest APIs, namely Estimator and Dataset, but when I run the code below I get an error.

On the TensorFlow documentation page https://www.tensorflow.org/api_docs/python/tf/estimator/DNNRegressor I found this description of input_fn: "The function should construct and return one of the following: * A tf.data.Dataset object: Outputs of Dataset object must be a tuple (features, labels) with same constraints as below."

I thought my code would provide that, but there seems to be a problem and I am out of ideas.

import tensorflow as tf
def input_evaluation_set():
    data = [0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1]
    labels = []
    for d in data:
        labels.append(1)
    return tf.data.Dataset.from_tensor_slices((tf.constant(data), tf.constant(labels)))

point = tf.feature_column.numeric_column('points')
estimator = tf.estimator.DNNRegressor(feature_columns = [point],hidden_units = [100,100,100])

estimator.train(input_fn = input_evaluation_set)

I expected this to train a deep neural network with 3 hidden layers of 100 neurons each to approximate the constant function 1. Instead I get the error "ValueError: features should be a dictionary of `Tensor`s. Given type: <class 'tensorflow.python.framework.ops.Tensor'>".


Solution

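  • Two things need to change. The features must be a dictionary keyed by the feature column name ('points' here), which is what the ValueError is complaining about. You also need to call .batch on your dataset, because the estimator expects minibatches rather than single scalar elements.

    The following works on my machine: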

    import numpy as np
    import tensorflow as tf

    def basic_dataset(num_points):
        # Features must be a dict keyed by the feature column name ('points').
        data = np.linspace(0, 1, num_points, dtype=np.float32)
        features = {'points': data}
        # The target is the constant function 1, one label per data point.
        labels = np.ones(num_points, dtype=np.float32)
        return tf.data.Dataset.from_tensor_slices((features, labels))

    def input_train_set():
        dataset = basic_dataset(11)
        # Repeat the data 100 times, shuffle, and emit batches of size 1.
        return dataset.repeat(100).shuffle(1000).batch(1)

    point = tf.feature_column.numeric_column('points')
    estimator = tf.estimator.DNNRegressor(
        feature_columns=[point],
        hidden_units=[100, 100, 100],
        label_dimension=1)

    estimator.train(input_fn=input_train_set)
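
To make the expected element structure concrete, here is a minimal plain-Python sketch of what the estimator receives from the input function above. It does not use TensorFlow: the helpers `slices` and `batch` are hypothetical stand-ins that only mimic the nesting produced by `from_tensor_slices` and `.batch`, with lists in place of tensors.

```python
def slices(features, labels):
    """Yield one (features, label) pair per example, similar in spirit to from_tensor_slices."""
    for i in range(len(labels)):
        # Each element keeps the dict structure: feature name -> single value.
        yield {name: column[i] for name, column in features.items()}, labels[i]


def batch(elements, batch_size):
    """Group consecutive elements so every value gains a leading batch axis, similar to .batch()."""
    feats, labs = {}, []
    for f, label in elements:
        for name, value in f.items():
            feats.setdefault(name, []).append(value)
        labs.append(label)
        if len(labs) == batch_size:
            yield feats, labs
            feats, labs = {}, []
    if labs:
        # Emit the final, possibly smaller, batch.
        yield feats, labs


points = [i / 10 for i in range(11)]
labels = [1.0] * len(points)

batches = list(batch(slices({"points": points}, labels), batch_size=1))
first_features, first_labels = batches[0]
print(first_features, first_labels)  # {'points': [0.0]} [1.0]
print(len(batches))                  # 11
```

Each element is a (features, labels) tuple whose first entry is a dictionary keyed by 'points', and every value carries a leading batch axis. The code in the question instead yields a bare scalar tensor as the features, so it has neither the dictionary nor the batch axis.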