I am trying to build a CNN with TensorFlow to solve the following kind of regression task: there is some unknown function, and we want to estimate one of its parameters. As features, we have a vector of x values and a vector y containing the function values at the corresponding x values; the corresponding label is the parameter value. The CNN takes the x and y values and predicts the parameter. However, I cannot get sufficient accuracy out of my CNN model.
To make this concrete, consider the following simple example. All functions are simple linear functions y = kx, and the task is to predict the slope k. The data can be generated with the following Python code:
import random
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

N_data = 50000 # number of data points
X = [] # features (x and y values)
y = [] # labels (slopes)
for _ in range(N_data):
    # Randomly choose the x values (np.linspace returns 50 points by default):
    x_min = 200*random.random() - 100
    xs = np.linspace(x_min, x_min + 10)
    # Randomly choose the slope:
    k = 200*random.random() - 100
    # Calculate the function values:
    ys = k*xs
    # Store the data:
    X.append(xs.tolist() + ys.tolist())
    y.append(k)
X = np.array(X)
y = np.array(y)
tf.random.set_seed(41)
# Split the data into training (80%) and temporary (20%) sets:
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.2, random_state=41)
# Split the temporary set into validation and test sets (10% each of the total):
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=41)
Here's my CNN model:
from tensorflow.keras import layers, models

model = models.Sequential()
# Data normalization layer:
model.add(layers.InputLayer(input_shape=(X[0].shape[0], 1)))
model.add(layers.BatchNormalization())
# Convolutional block 1:
model.add(layers.Conv1D(32, 3, activation='relu'))
model.add(layers.AveragePooling1D(2))
# Convolutional block 2:
model.add(layers.Conv1D(64, 3, activation='relu'))
model.add(layers.AveragePooling1D(2))
# Convolutional block 3:
model.add(layers.Conv1D(128, 3, activation='relu'))
model.add(layers.MaxPooling1D(2))
# Regression head:
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1))
And the following code fits the model:
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.01),
              loss='mean_squared_error',
              metrics=['mae'])
history = model.fit(X_train, y_train, epochs=50, batch_size=16,
                    validation_data=(X_val, y_val))
The training and validation mean absolute errors are shown in the following figure:
I am unsure how to interpret the fact that the validation errors are consistently smaller than the training errors. On the test set (10% of the total data, i.e., 5,000 samples), the mean absolute error is 3.37, which seems quite high for such a simple problem and such a large dataset. What could I do to improve the model? I am unsure whether the problem lies in the number of data points, in the CNN architecture, or in how the input data is formatted. Any suggestions would be appreciated.
Consider the inputs to be [xi, yi] and the output [ki], where xi*ki = yi.
A standard neuron applies an activation function to a weighted sum of its inputs:
A(zj), where zj = xi*wxj + yi*wyj + bj
The result of this activation cannot be k: k = yi/xi is a ratio of the inputs, and no weighted sum can perform that division.
You can categorize k, though. The idea is to discretize k finely enough that the categorized value can stand in for the scalar. Think of just positive x, y and k, and consider the single neuron
sigmoid( - xi*10 + yi )
Its output crosses 0.5 exactly when yi > 10*xi, i.e. when k > 10.
With this idea you can build enough outputs to categorize k into a range of values.
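To make the thresholding concrete, here is a minimal sketch (my addition, not code from the question) that evaluates this single neuron on one slope below 10 and one above, assuming positive x:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = 2.0
for k in (5.0, 15.0):                  # one slope below 10, one above
    y = k * x
    print(k, sigmoid(-10.0 * x + y))   # k=5 -> ~0.00005, k=15 -> ~0.99995

Each such neuron answers one yes/no question ("is k > 10?"); a bank of them with different weights bins k into ranges.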
If we use two points, we don't have to do division at all. Take the input vector
(x0, y0, x0 + 1, y1)
Now the slope is simply input[3] - input[1] = y1 - y0 (indexing from 0), because the x spacing is fixed at 1. The same shortcut exists in your example, since you always use the same linspace: the slope is just a fixed rescaling of the difference between adjacent y values. I suspect your pooling/convolution layers have somehow eliminated this training route.
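As a quick numeric check of this shortcut against your data generation (my sketch; np.linspace(x_min, x_min + 10) returns 50 points, so the spacing is 10/49):

import numpy as np

x_min = -3.0                           # arbitrary start value
xs = np.linspace(x_min, x_min + 10)    # 50 points by default
ys = 7.5 * xs                          # true slope k = 7.5

dx = xs[1] - xs[0]                     # fixed spacing: 10/49
k_est = (ys[1] - ys[0]) / dx           # rescaled difference of y values
print(k_est)                           # 7.5, up to floating-point error

The division here is by a constant, so the network only has to learn a fixed scale factor, not an input-dependent division.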
I made the input (two adjacent x values and their function values):
X.append(np.array((xs[i], xs[i+1], ys[i], ys[i+1])))
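In context, the revised generation loop might look like this (my reconstruction of the intended change; I fix i = 0, i.e. the first adjacent pair, since any pair works):

import random
import numpy as np

N_data = 50000
X = []  # features: two adjacent x values and their y values
y = []  # labels: slopes
for _ in range(N_data):
    x_min = 200*random.random() - 100
    xs = np.linspace(x_min, x_min + 10)   # 50 points, fixed spacing
    k = 200*random.random() - 100
    ys = k*xs
    i = 0                                 # any adjacent pair would do
    X.append(np.array((xs[i], xs[i+1], ys[i], ys[i+1])))
    y.append(k)
X = np.array(X)
y = np.array(y)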
And I changed the model:
from tensorflow.keras import layers, models

model = models.Sequential()
model.add(layers.InputLayer(input_shape=(X[0].shape[0], 1)))
nn = 32      # neurons per hidden layer
nl = 4       # number of hidden layers
act = 'relu'
model.add(layers.Flatten())
for i in range(nl):
    model.add(layers.Dense(nn, activation=act))
model.add(layers.Dense(1, activation='linear'))
That model and data learn to predict k. It's really just computing the difference between the two y values.
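As a sanity check of that claim (my addition), you can compare the trained model's predictions with the slope recovered in closed form from the two-point inputs, assuming X_test holds the four-element inputs built above:

import numpy as np

dx = 10 / 49                                          # linspace spacing from the question
k_closed_form = (X_test[:, 3] - X_test[:, 2]) / dx    # (y1 - y0) / dx
k_predicted = model.predict(X_test).ravel()
print(np.mean(np.abs(k_predicted - k_closed_form)))   # should be close to 0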