
How to feed time series data into an autoencoder network for feature extraction?


I am trying to create an autoencoder from scratch for my dataset, specifically a variational autoencoder for feature extraction. I am pretty new to machine learning and would like to know how to feed my input data to the autoencoder.

My data is time series data. It looks like this:

array([[[10,  0, 10, ..., 10,  0,  0],
        ...,
        [ 0, 12, 32, ...,  2,  2,  2]],

       [[ 0,  3,  7, ...,  7,  3,  0],
        ...,
        [ 0,  2,  3, ...,  3,  4,  6]],

       [[ 1,  3,  1, ...,  0, 10,  2],
        ...,
        [ 2, 11, 12, ...,  1,  1,  8]]], dtype=int64)

It is a stack of arrays with shape (3, 1212, 700). And where do I pass the label?

The examples online are simple and there is no detailed description as to how to feed the data in reality. Any examples or explanations will be highly helpful.


Solution

  • This can be solved using a generator. The generator takes your time series data of 700 samples, each with 3 channels and 1212 time steps, and outputs a batch. In the example I've written, each batch covers the same time window for every sample: batch 0 is the first 10 time steps of each of your 700 samples, batch 1 is time steps 1 through 11 of each sample, and so on. If you want to mix this up in some way, you should edit the generator. The epoch ends once every batch has been trained and evaluated on. Regarding the label question: an autoencoder is trained to reconstruct its own input, so the generator yields each window as both the input and the target; no separate labels are needed. For the neural network, a very simple encoder/decoder model is enough to prove the concept, but you will probably want to replace it with your own model. The variable n determines how many time steps the autoencoder sees at once.
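To make the batch layout concrete, here is a minimal, self-contained sketch (using random data of the question's shape in place of the real array) of the window that each generator step yields:

```python
import numpy as np

# random stand-in for the real array: (3 channels, 1212 time steps, 700 samples)
data = np.random.random((3, 1212, 700))
n = 10  # window length in time steps

# one batch: the first n time steps of every sample, transposed so the
# axes become (samples, time_steps, channels) as Keras expects
batch = data[:, 0:n].T
print(batch.shape)  # (700, 10, 3)
```

Each step of the generator below produces exactly one such window, shifted one time step further along the series.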

    import numpy as np
    import keras
    from keras.layers import Dense, Flatten, Input, Reshape
    from keras.models import Model
    from tensorflow.python.client import device_lib
    # check for my gpu 
    print(device_lib.list_local_devices())
    
    
    # make some fake data
    
    # your data
    data = np.random.random((3, 1212, 700))
    
    # a generator that yields sliding windows of the series
    def image_generator(data, n):
        start = 0
        while True:
            # window across all samples, transposed to (samples, n, channels)
            window = data[:, start:start + n].T
            # input and target are the same: the autoencoder reconstructs its input
            yield (window, window)
            start += 1
            # the generator MUST loop forever for fit_generator
            if start + n > data.shape[1]:
                start = 0
    
    n = 10
    # basic model - replace with your own
    encoder_input = Input(shape=(n, 3), name="encoder_input")
    fc = Flatten()(encoder_input)
    fc = Dense(100, activation='relu', name="fc1")(fc)
    encoder_output = Dense(5, activation='sigmoid', name="encoder_output")(fc)
    
    encoder = Model(encoder_input, encoder_output)
    
    # the decoder takes the 5-dimensional bottleneck vector back to a window
    decoder_input = Input(shape=(5,), name="decoder_input")
    fc = Dense(100, activation='relu', name="fc2")(decoder_input)
    fc = Dense(n * 3, activation='sigmoid', name="fc3")(fc)
    # reshape to the input window shape so the reconstruction can be
    # compared against the input by the loss
    output = Reshape((n, 3), name="output")(fc)
    
    decoder = Model(decoder_input, output)
    
    combined_model_input = Input(shape=(n, 3), name="combined_model_input")
    autoencoder = Model(combined_model_input, decoder(encoder(combined_model_input)))
    
    autoencoder.compile(optimizer="adam", loss='mean_squared_error')
    autoencoder.summary()
    
    # and training
    
    training_history = autoencoder.fit_generator(image_generator(data, n),
                                                 epochs=5,
                                                 initial_epoch=0,
                                                 # one step per sliding window position
                                                 steps_per_epoch=data.shape[1] - n + 1,
                                                 verbose=1)
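Once training is done, the encoder half can be used on its own for the feature extraction you are after: each window of n time steps is compressed to its 5-dimensional bottleneck vector. A minimal sketch (it rebuilds an untrained encoder of the same architecture so the snippet is self-contained; in practice you would reuse the trained `encoder` from above):

```python
import numpy as np
from keras.layers import Dense, Flatten, Input
from keras.models import Model

n = 10
data = np.random.random((3, 1212, 700))

# same encoder architecture as above (untrained here, for illustration only)
encoder_input = Input(shape=(n, 3))
fc = Flatten()(encoder_input)
fc = Dense(100, activation='relu')(fc)
encoder_output = Dense(5, activation='sigmoid')(fc)
encoder = Model(encoder_input, encoder_output)

# feature extraction: compress the first n time steps of every sample
window = data[:, 0:n].T            # (700, n, 3)
features = encoder.predict(window)
print(features.shape)              # (700, 5)
```

Sliding the window along the series and stacking the resulting feature vectors gives you a compressed representation of the whole time series.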