Tags: tensorflow, keras, deep-learning, word2vec, autoencoder

Deep Convolutional Autoencoder for movie similarity


I am new to Python, and I have a dataset of movie descriptions from which I am trying to build a model that calculates movie similarity. I started by turning each description into a Word2Vec representation in which every word becomes a vector of size 100; since the longest description in my dataset has 213 words, each description is turned into a vector of size 21300 (213 × 100). My next step is to reduce the dimensionality of these vectors with a convolutional autoencoder. It was recommended to me to reshape each 21300-element vector into a 150 × 142 matrix, so I did that. My goal is to compress each matrix from 150 × 142 down to 5 × 5, flatten it, and use the flattened result to calculate cosine similarity between different movies.
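For context, this is roughly how I build the 21300-dimensional vectors (a minimal sketch; I am assuming gensim's Word2Vec and zero-padding of shorter descriptions, and the toy descriptions list is only a placeholder for my real data):

import numpy as np
from gensim.models import Word2Vec

# Placeholder tokenized descriptions; my real dataset has one token list per movie.
descriptions = [
    ["a", "robot", "falls", "in", "love", "with", "a", "human"],
    ["two", "friends", "plan", "a", "bank", "robbery"],
]

w2v = Word2Vec(sentences=descriptions, vector_size=100, min_count=1)

MAX_LEN, DIM = 213, 100  # longest description length, Word2Vec vector size

def description_to_vector(tokens):
    # Look up each word vector, zero-pad (or truncate) to MAX_LEN words, then flatten.
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv][:MAX_LEN]
    padded = np.zeros((MAX_LEN, DIM), dtype="float32")
    if vecs:
        padded[:len(vecs)] = vecs
    return padded.reshape(-1)  # shape (21300,)

movies_vector = np.stack([description_to_vector(d) for d in descriptions])
print(movies_vector.shape)  # (num_movies, 21300)

Now here is my faulty autoencoder code so far: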

import tensorflow as tf
from tensorflow import keras

# Encoder
encoder_input = keras.Input(shape=(21300,), name='sum')
encoded = tf.keras.layers.Reshape((150, 142), input_shape=(21300,))(encoder_input)
x = tf.keras.layers.Conv1D(32, 3, activation="relu", padding="same")(encoded)
x = tf.keras.layers.MaxPooling1D(2, padding="same")(x)
x = tf.keras.layers.Conv1D(32, 3, activation="relu", padding="same")(x)
x = tf.keras.layers.MaxPooling1D(2, padding="same")(x)
x = tf.keras.layers.Conv1D(16, 3, activation="relu", padding="same")(x)
x = tf.keras.layers.MaxPooling1D(2, padding="same")(x)
x = tf.keras.layers.Conv1D(16, 3, activation="relu", padding="same")(x)
x = tf.keras.layers.MaxPooling1D(2, padding="same")(x)
x = tf.keras.layers.Conv1D(8, 3, activation="relu", padding="same")(x)
x = tf.keras.layers.MaxPooling1D(2, padding="same")(x)
x = tf.keras.layers.Flatten()(x)
encoder_output = keras.layers.Dense(units=25, activation='relu', name='encoder')(x)
x = tf.keras.layers.Reshape((5, 5), input_shape=(25,))(encoder_output)

# Decoder

decoder_input = tf.keras.layers.Conv1D(8, 3, activation='relu', padding='same')(x)
x = tf.keras.layers.UpSampling1D(2)(decoder_input)
x = tf.keras.layers.Conv1D(16, 3, activation='relu')(x)
x = tf.keras.layers.UpSampling1D(2)(x)
x = tf.keras.layers.Conv1D(16, 3, activation='relu')(x)
x = tf.keras.layers.UpSampling1D(2)(x)
x = tf.keras.layers.Conv1D(32, 3, activation='relu')(x)
x = tf.keras.layers.UpSampling1D(2)(x)
x = tf.keras.layers.Conv1D(32, 3, activation='relu')(x)
x = tf.keras.layers.UpSampling1D(2)(x)
#x=tf.keras.layers.Flatten()(x)
decoder_output = keras.layers.Conv1D(1, 3, activation='relu', padding='same')(x)




opt = tf.keras.optimizers.Adam(learning_rate=0.001, decay=1e-6)

autoencoder = keras.Model(encoder_input, decoder_output, name='autoencoder')

autoencoder.compile(opt, loss='mse')
autoencoder.summary()

history = autoencoder.fit(
    movies_vector,
    movies_vector,
    epochs=25,
)

print("ENCODER READY")
# Using the middle layer as the encoder
encoder = keras.Model(inputs=autoencoder.input,
                      outputs=autoencoder.get_layer('encoder').output)

Running this code produces the following error:

ValueError: Dimensions must be equal, but are 100 and 21300 for '{{node mean_squared_error/SquaredDifference}} = SquaredDifference[T=DT_FLOAT](mean_squared_error/remove_squeezable_dimensions/Squeeze, IteratorGetNext:1)' with input shapes: [?,100], [?,21300].

How can I fix this autoencoder?


Solution

  • I was able to reproduce the error with dummy data. The autoencoder's output must have the same shape as its input (21300 values per movie), but the original decoder ends in a single-filter Conv1D over 100 timesteps, so it produces outputs of length 100, which is exactly the 100-vs-21300 mismatch reported by the loss. Changing the decoder model as follows, so that the last Conv1D has 213 filters and its output is flattened back to 100 × 213 = 21300, will help.

    decoder_input=tf.keras.layers.Conv1D(8, 3, activation='relu', padding='same')(x)
    x = tf.keras.layers.UpSampling1D(2)(decoder_input)
    x = tf.keras.layers.Conv1D(16, 3, activation='relu')(x)
    x = tf.keras.layers.UpSampling1D(2)(x)
    x = tf.keras.layers.Conv1D(16, 3, activation='relu')(x)
    x = tf.keras.layers.UpSampling1D(2)(x)
    x = tf.keras.layers.Conv1D(32, 3, activation='relu')(x)
    x = tf.keras.layers.UpSampling1D(2)(x)
    x = tf.keras.layers.Conv1D(32, 3, activation='relu')(x)
    x = tf.keras.layers.UpSampling1D(2)(x)
    x = tf.keras.layers.Conv1D(213, 3, activation='relu', padding='same')(x)
    decoder_output = tf.keras.layers.Flatten()(x)
    

    Please find the gist here. Thank you.
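    Once the shapes match and training runs, the 25-dimensional bottleneck can be used for the cosine-similarity comparison described in the question. A minimal sketch, using scikit-learn's cosine_similarity and assuming autoencoder and movies_vector are the trained model and data array from above:

    import numpy as np
    from tensorflow import keras
    from sklearn.metrics.pairwise import cosine_similarity

    # Extract the bottleneck ('encoder') layer as a standalone model.
    encoder = keras.Model(inputs=autoencoder.input,
                          outputs=autoencoder.get_layer('encoder').output)

    codes = encoder.predict(movies_vector)   # shape (num_movies, 25)
    similarity = cosine_similarity(codes)    # (num_movies, num_movies) matrix

    # Five most similar movies to movie 0, excluding itself.
    print(np.argsort(-similarity[0])[1:6])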