
Regression Problem: How to handle highly decimal input features


I have the following input data structure:

    X1     |     X2    |     X3    | ... | Output (Label)
 118.12341 | 118.12300 | 118.12001 | ... | a value between 0 and 1, e.g. 0.423645

I'm using TensorFlow to solve this regression problem: predicting the future value of the Output variable. For that I built a feed-forward neural network with three hidden layers using ReLU activations and a final output layer with a single node and linear activation. The network is trained with back-propagation using the Adam optimizer.
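
For reference, a minimal sketch of the setup described above (the layer widths and the feature count are assumptions, not values from the question):

```python
import tensorflow as tf

num_features = 3  # e.g. X1, X2, X3; adjust to the real input width

# Three hidden ReLU layers and a single linear output node, trained with Adam.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(num_features,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="linear"),
])
model.compile(optimizer="adam", loss="mse")
```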

My problem is that after training the network for a few thousand epochs, I realized that these highly decimal values in both the input features and the output result in predictions that are accurate only to about the second decimal place, for example:

Real value = 0.456751 | Predicted value = 0.452364

However, this is not acceptable; I need precision to at least the fourth decimal place for a prediction to be accepted.

Q: Is there any trustworthy technique to solve this problem properly and get better results (maybe a transformation algorithm)?

Thanks in advance.


Solution

  • Assuming you are using a regular MSE loss, it will probably not suit your goal of a relatively low per-instance error tolerance. To elaborate, the MSE is defined as the average of the squares of the differences between the predicted and the true outputs.
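
    For n instances with true outputs yᵢ and predictions ŷᵢ, that is:

    MSE = (1/n) · Σᵢ (yᵢ − ŷᵢ)²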

    Assuming you have 4 instances and two trained functions that generate the following error per instance:

    F1 error rates: (4, .0004, .0002, .0002)

    F2 error rates: (.9, .9, .9, .9)

    It's obvious that training with MSE would favour F2: its MSE is .81, while the MSE of F1 is approximately 4 (the single error of 4 squares to 16, which averaged over the 4 instances gives 4).

    So to conclude, MSE gives too little weight to differences smaller than 1, while it exaggerates the weight of differences bigger than 1, because of the square function applied to each error.
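
    A quick numeric check of the example above (a minimal sketch using NumPy):

    ```python
    import numpy as np

    f1 = np.array([4, .0004, .0002, .0002])  # F1 per-instance errors
    f2 = np.array([.9, .9, .9, .9])          # F2 per-instance errors

    print(np.mean(f1 ** 2))  # ~4.0  -> F1's MSE, dominated by the single large miss
    print(np.mean(f2 ** 2))  # 0.81  -> F2's MSE is lower, so MSE prefers F2
    ```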

    You could try MAE, which stands for mean absolute error; its only difference is that it doesn't square the individual errors, it takes their absolute values instead. There are many other regression losses that give significant weight to smaller errors, like the Huber loss with a small delta (e.g. delta < 1); you can read more about those losses here.
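
    In Keras, swapping the loss is a one-liner at compile time; the delta below is an assumed value you would tune for your data, and the tiny model is just a stand-in:

    ```python
    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(3,))])  # stand-in model

    # Huber with a small delta behaves like MAE for all but the tiniest errors:
    model.compile(optimizer="adam", loss=tf.keras.losses.Huber(delta=0.01))

    # Or plain mean absolute error:
    model.compile(optimizer="adam", loss="mae")
    ```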

    Another possible solution would be to transform this into a classification problem, where a prediction counts as correct only if it matches the true output up to the 4th decimal place, for example, and counts as wrong otherwise; see the sketch below.
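
    A minimal sketch of that reformulation, assuming the targets lie in [0, 1] and are discretized at the 4th decimal place (the layer widths, num_features and y_train here are illustrative assumptions):

    ```python
    import numpy as np
    import tensorflow as tf

    num_features = 3     # e.g. X1, X2, X3
    num_classes = 10001  # one class per 4-decimal value in [0, 1]

    # Hypothetical targets in [0, 1], discretized by rounding to 4 decimals:
    y_train = np.array([0.423645, 0.456751])
    y_class = np.round(y_train * 10000).astype("int32")

    clf = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(num_features,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    clf.compile(optimizer="adam",
                loss="sparse_categorical_crossentropy",
                metrics=["accuracy"])  # accuracy = exact match at the 4th decimal
    ```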