Tags: tensorflow, deep-learning, neural-network, conv-neural-network, regression

Training a CNN to predict a parameter of a function


I am trying to build a CNN with TensorFlow to solve the following kind of regression task. There is some unknown function, and we are interested in a parameter of that function. As features, we have a vector of x values and a vector y containing the function values at the corresponding x values. The label is the function parameter. The CNN takes the x and y values and predicts the parameter value. However, I am unable to achieve sufficient accuracy with my CNN model.

To be more concrete, consider the following simple example. All functions would be simple linear functions y = kx and the task would be to predict the slope k. To acquire data, we may use the following Python code:

import random

import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam

N_data = 50000  # number of data points

X = []  # features (x and y values)
y = []  # labels (slopes)

for _ in range(N_data):
    # Randomly choose the x values (50 points by default):
    x_min = 200*random.random() - 100
    xs = np.linspace(x_min, x_min + 10)

    # Randomly choose the slope:
    k = 200*random.random() - 100

    # Calculate the function values:
    ys = k*xs

    # Store the data:
    X.append(xs.tolist() + ys.tolist())
    y.append(k)

X = np.array(X)
y = np.array(y)

tf.random.set_seed(41)

# Split the data into training, validation and test sets:
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.2, random_state=41)

# Split the temporary set into validation and test sets:
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=41)

Here's my CNN model:

model = models.Sequential()

# Data normalization layer
model.add(layers.InputLayer(input_shape=(X[0].shape[0], 1)))
model.add(layers.BatchNormalization())

# Convolutional block 1:
model.add(layers.Conv1D(32, 3, activation='relu'))
model.add(layers.AveragePooling1D(2))

# Convolutional block 2:
model.add(layers.Conv1D(64, 3, activation='relu'))
model.add(layers.AveragePooling1D(2))

# Convolutional block 3:
model.add(layers.Conv1D(128, 3, activation='relu'))
model.add(layers.MaxPooling1D(2))

model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1))

And the following code fits the model:

model.compile(optimizer=Adam(learning_rate=0.01),
              loss='mean_squared_error',
              metrics=['mae'])

history = model.fit(X_train, y_train, epochs=50, batch_size=16,
                    validation_data=(X_val, y_val))

The training and validation mean absolute errors are shown in the following figure:

[figure: training and validation MAE per epoch]

I am unsure how to interpret the fact that the validation errors are consistently smaller than the training errors. On the test set (10% of the total data, i.e., 5,000 samples), the mean absolute error is 3.37, which is quite high for such a simple problem and such a large dataset. What could I do to improve the model? I am unsure whether the problem is the number of data points, the CNN architecture, or how the input data is formatted. Any suggestions would be appreciated.


Solution

  • Consider the inputs to be [xi, yi] and the output [ki], where xi*ki = yi.

    For a standard neuron, the output is an activation function applied to a weighted combination of the inputs and a bias:

    A(zj - (xi*wxj + yi*wyj))

    The result of this activation cannot be k itself: k is the ratio yi/xi, and no fixed weighted combination of xi and yi reproduces that ratio for all inputs.

    You can, however, categorize k. The idea is to bin k finely enough that each output acts as a threshold indicator. Think of just positive x, y and k:

    sigmoid(yi - 10*xi)
    

    This output crosses 0.5 exactly when yi > 10*xi, i.e., when k > 10.

    With this idea you can build enough outputs to categorize k into a range of values.
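
    For example, a bank of such sigmoid outputs acts as a set of threshold detectors on k. A minimal sketch of the idea (the threshold values are hypothetical, chosen only to illustrate the binning):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = 2.0                       # a positive x value
    thresholds = [1, 5, 10, 50]   # hypothetical bin edges for k

    for k in [3.0, 20.0, 80.0]:
        y = k * x
        # Output j "fires" (> 0.5) exactly when k exceeds thresholds[j]:
        fired = [sigmoid(y - c * x) > 0.5 for c in thresholds]
        print(k, fired)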

    If we use two points, we don't have to do division.

    (x0, y0, x0 + 1, y1)
    

    Now the slope is simply input[3] - input[1], i.e., y1 - y0, because the two x values are exactly 1 apart. The same shortcut applies in your example, since you always use the same linspace (so the x spacing is constant). I suspect your pooling/convolutions have somehow eliminated this training route.
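
    A quick numeric check of this observation, using the same linspace as in the question (the slope 7.5 is an arbitrary example value):

    import numpy as np

    xs = np.linspace(0.0, 10.0)   # 50 points by default, constant spacing 10/49
    ys = 7.5 * xs                 # true slope k = 7.5
    dx = xs[1] - xs[0]
    # The difference of consecutive y values is k*dx, i.e. proportional to k
    # by a fixed constant that a dense layer can absorb into its weights:
    print((ys[1] - ys[0]) / dx)   # ~7.5, up to floating-point rounding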

    I made the input:

    X.append(np.array((xs[i], xs[i+1], ys[i], ys[i+1])))  # i is any fixed index, e.g. i = 0
    

    And I changed the model:

    model = models.Sequential()

    model.add(layers.InputLayer(input_shape=(X[0].shape[0], 1)))
    nn = 32      # units per hidden layer
    nl = 4       # number of hidden layers
    act = 'relu'
    model.add(layers.Flatten())
    for i in range(nl):
        model.add(layers.Dense(nn, activation=act))
    model.add(layers.Dense(1, activation='linear'))
    

    With that model and data, the network learns to predict k. It's really just calculating the difference between the two y values.
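
    For completeness, here is a minimal end-to-end sketch of this approach, assuming the data-generation loop from the question (the fixed index i = 0 and the split sizes are illustrative choices):

    import random
    import numpy as np
    from sklearn.model_selection import train_test_split
    from tensorflow.keras import layers, models

    X, y = [], []
    for _ in range(50000):
        x_min = 200*random.random() - 100
        xs = np.linspace(x_min, x_min + 10)
        k = 200*random.random() - 100
        ys = k*xs
        i = 0  # two consecutive sample points are enough
        X.append(np.array((xs[i], xs[i+1], ys[i], ys[i+1])))
        y.append(k)
    X, y = np.array(X), np.array(y)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=41)

    model = models.Sequential([
        layers.InputLayer(input_shape=(4,)),
        layers.Dense(32, activation='relu'),
        layers.Dense(32, activation='relu'),
        layers.Dense(32, activation='relu'),
        layers.Dense(32, activation='relu'),
        layers.Dense(1, activation='linear'),
    ])
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    model.fit(X_train, y_train, epochs=50, batch_size=16,
              validation_split=0.1)
    print(model.evaluate(X_test, y_test))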