Tags: python, tensorflow, tensorflow-datasets

How to create a tf.data.Dataset for linear regression and train a model


Can I train a linear regression model with a tf.data.Dataset? If I run the following code:

import tensorflow as tf
import numpy as np

x = np.linspace(1, 10, num=10**2)
y = 54*x + 33

ds = tf.data.Dataset.from_tensor_slices(list(zip(x, y)))

model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(1, input_shape = [1,]),
        tf.keras.layers.Dense(10, activation="sigmoid"),
        tf.keras.layers.Dense(1)
    ])

model.compile(loss="mean_absolute_error", optimizer="adam")
model.fit(ds, epochs=5)

I get the error:

ValueError: Target data is missing. Your model was compiled with loss=mean_absolute_error, and therefore expects target data to be provided in `fit()`.

Is it possible to train like that?


Solution

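  • Yes. You need to change two things:

    1. Create the dataset from a tuple, from_tensor_slices((x, y)), so that each element is a (feature, target) pair. Passing list(zip(x, y)) builds a single tensor of shape (100, 2), so fit() receives inputs but no targets.
    2. Batch the dataset before training, for example: ds = ds.batch(32)
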
    import tensorflow as tf
    import numpy as np
    
    x = np.linspace(1, 10, num=10**2)
    y = 54*x + 33
    
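    # Pass a tuple so that each element is a (feature, target) pair
    ds = tf.data.Dataset.from_tensor_slices((x, y))
    # Group the elements into batches of 32 before calling fit()
    ds = ds.batch(32)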
    model = tf.keras.models.Sequential([
            tf.keras.layers.Dense(1, input_shape = [1,]),
            tf.keras.layers.Dense(10, activation="sigmoid"),
            tf.keras.layers.Dense(1)
        ])
    
    model.compile(loss="mean_absolute_error", optimizer="adam")
    model.fit(ds, epochs=5)
    

    Output:

    Epoch 1/5
    4/4 [==============================] - 0s 5ms/step - loss: 329.4714
    Epoch 2/5
    4/4 [==============================] - 0s 8ms/step - loss: 329.4355
    Epoch 3/5
    4/4 [==============================] - 0s 11ms/step - loss: 329.3994
    Epoch 4/5
    4/4 [==============================] - 0s 6ms/step - loss: 329.3628
    Epoch 5/5
    4/4 [==============================] - 0s 9ms/step - loss: 329.3259
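
To see why the tuple form matters, it can help to inspect each dataset's element_spec. The following is a minimal sketch that reuses the same x and y as above; the names ds_single and ds_pairs are introduced here only for illustration.

```python
import numpy as np
import tensorflow as tf

x = np.linspace(1, 10, num=10**2)
y = 54 * x + 33

# list(zip(x, y)) is converted to ONE tensor of shape (100, 2),
# so every element is a length-2 vector with no separate target.
ds_single = tf.data.Dataset.from_tensor_slices(list(zip(x, y)))

# A tuple (x, y) keeps the two arrays apart,
# so every element is a (feature, target) pair.
ds_pairs = tf.data.Dataset.from_tensor_slices((x, y))

print(ds_single.element_spec)  # a single TensorSpec
print(ds_pairs.element_spec)   # a tuple of two TensorSpecs

# 100 samples in batches of 32 give 4 batches, the "4/4" in the log above.
n_batches = int(ds_pairs.batch(32).cardinality().numpy())
print(n_batches)
```

Keras treats a dataset element that is a 2-tuple as (inputs, targets), which is why only the second form gives fit() something to compute the loss against.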
    

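    Update: How do you create and train a model for linear regression? You don't need a large, complex network; Dense(1) layers with activation='linear' (which is also the default when no activation is given) are enough. The example below stacks two such layers, which together still form a linear map.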

    import tensorflow as tf
    import numpy as np
    
    x = np.random.rand(10000)
    y = 54*x + 33
    
    ds = tf.data.Dataset.from_tensor_slices((x,y))
    ds = ds.batch(64)
    model = tf.keras.models.Sequential([
            tf.keras.layers.Dense(1, input_shape = [1,]),
            tf.keras.layers.Dense(1, activation='linear')
        ])
    
    model.compile(loss="mean_absolute_error", optimizer="adam")
    model.fit(ds, epochs=50)
    

    Output:

    Epoch 1/50
    157/157 [==============================] - 1s 2ms/step - loss: 60.0440
    Epoch 2/50
    157/157 [==============================] - 0s 2ms/step - loss: 59.6723
    Epoch 3/50
    157/157 [==============================] - 0s 2ms/step - loss: 59.1068
    ...
    Epoch 48/50
    157/157 [==============================] - 0s 2ms/step - loss: 0.1588
    Epoch 49/50
    157/157 [==============================] - 0s 2ms/step - loss: 0.0053
    Epoch 50/50
    157/157 [==============================] - 0s 3ms/step - loss: 0.0039
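
As a rough check that a linear model recovers the line y = 54x + 33, the learned kernel and bias can be read back with get_weights(). The sketch below is not the code from the answer above: it assumes a single Dense(1) layer, reshapes x and y into (n, 1) float32 columns, and uses Adam with learning_rate=0.1 for 20 epochs, all chosen here for illustration only.

```python
import numpy as np
import tensorflow as tf

# Same synthetic data as above, stored as float32 columns of shape (n, 1).
x = np.random.rand(10000).astype("float32").reshape(-1, 1)
y = 54 * x + 33

ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(64)

# Assumption: one Dense(1) unit with no activation, i.e. y_hat = w * x + b.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])

# Assumption: a larger learning rate than Adam's default,
# picked so that fewer epochs are needed.
model.compile(
    loss="mean_absolute_error",
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),
)
history = model.fit(ds, epochs=20, verbose=0)

# The Dense layer holds a (1, 1) kernel (slope) and a (1,) bias (intercept).
kernel, bias = model.layers[-1].get_weights()
print("slope:", kernel[0, 0], "intercept:", bias[0])
print("first epoch loss:", history.history["loss"][0])
print("last epoch loss:", history.history["loss"][-1])
```

If training went well, the printed slope and intercept should land close to 54 and 33; the exact values vary from run to run.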