
Why does an autoencoder take the input data as the label data as well?


I was following a guide to build a basic, single-layer autoencoder in order to learn both Keras and autoencoders. However, I realised that the model takes the X_train data as both the input data and the label data during training, and the same is true for X_test during evaluation. I checked another example autoencoder and it had the same structure.

Is this because the autoencoder takes the data as it is, i.e. the label is the data itself, so there is no separate label? What is the reason behind this, and couldn't we just run the model without giving any label at all? (I actually tried this, but Keras did not like the idea and raised an error.)

The training and evaluation calls are the following:

autoencoder.fit(X_train,X_train, epochs=nb_epoch, batch_size=batch_size, shuffle=True, verbose=0)

test_predictions = autoencoder.predict(X_test)
print('Test reconstruction error\n', sklearn.metrics.mean_squared_error(X_test, test_predictions))

Note: my data is just randomly generated, normally distributed 5-D data, if that has any effect.

Edit: Thank you all, it was my mistake/confusion. As I said in the comments, I had completely overlooked that the system compares the reconstructed output with the label data. When autoencoders are described verbally, the comparison is said to be done against the input data; but in the built system, the error is computed as the difference between the given label data and the output, so the input must also be passed as the label.


Solution

  • Autoencoders aim to compress the input data into a reduced, meaningful representation and then decode it back into the input data. The model is therefore well trained when it manages to reproduce the input data as closely as possible. For this reason, the y label is the input data itself, and the loss you get as output is a measure of similarity between what you want to predict (the input data itself) and what your model actually produces.
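To make this concrete without depending on Keras, here is a hedged sketch of the same idea in plain NumPy: a single-layer linear autoencoder trained by gradient descent on toy 5-D Gaussian data (mirroring the question's setup). All names and sizes here are illustrative assumptions; the key point is that the error term is always computed against X itself, i.e. the input plays the role of the label exactly as in autoencoder.fit(X_train, X_train).

```python
import numpy as np

# Hypothetical toy data: normally distributed 5-D samples, as in the question.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# Single-layer linear autoencoder: encode 5-D -> 2-D, decode 2-D -> 5-D.
W_enc = rng.normal(scale=0.1, size=(5, 2))
W_dec = rng.normal(scale=0.1, size=(2, 5))

# Reconstruction error before training, measured against X itself.
mse0 = np.mean((X @ W_enc @ W_dec - X) ** 2)

lr = 0.05
for _ in range(2000):
    Z = X @ W_enc          # encode
    X_hat = Z @ W_dec      # decode (the reconstruction)
    err = X_hat - X        # the "label" here is the input X itself
    # Mean-squared-error gradients with respect to both weight matrices
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

mse = np.mean((X @ W_enc @ W_dec - X) ** 2)
print('before:', mse0, 'after:', mse)
```

The reconstruction error drops as training proceeds, even though no external label ever appears: minimising the loss means minimising the gap between the decoder's output and the original input, which is precisely why Keras requires X to be passed as y.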