
How to feed time series data into an autoencoder network for feature extraction?


I am trying to create an autoencoder from scratch for my dataset, specifically a variational autoencoder for feature extraction. I am pretty new to machine learning and would like to know how to feed my input data to the autoencoder.

My data is time series data. It looks like this:

array([[[10,  0, 10, ..., 10,  0,  0],
        ...,
        [ 0, 12, 32, ...,  2,  2,  2]],

       [[ 0,  3,  7, ...,  7,  3,  0],
        ...,
        [ 0,  2,  3, ...,  3,  4,  6]],

       [[ 1,  3,  1, ...,  0, 10,  2],
        ...,
        [ 2, 11, 12, ...,  1,  1,  8]]], dtype=int64)

It is a stack of arrays with shape (3, 1212, 700). And where do I pass the label?

The examples online are simple and there is no detailed description as to how to feed the data in reality. Any examples or explanations will be highly helpful.


Solution

  • This can be solved using a generator. The generator takes your time series data of 700 samples, each with 3 channels and 1212 time steps, and outputs a batch. In the example I've written, each batch covers the same time window for every sample: batch 0 is the first 10 time steps of each of your 700 samples, batch 1 is time steps 1 through 11 of each sample, and so on. If you want to mix this up in some way, you should edit the generator. The epoch ends once every batch has been trained and evaluated on. Regarding the label question: an autoencoder is trained to reconstruct its own input, so the generator yields each window as both the input and the target; no separate labels are needed. For the neural network, a very simple encoder/decoder model is enough to prove the concept, but you will probably want to replace it with your own model. The variable n determines how many time steps the autoencoder sees at once.
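To make the batch layout concrete, here is a minimal, self-contained sketch (using random data of the question's shape in place of the real array) of the window that each generator step yields:

```python
import numpy as np

# random stand-in for the real array: (3 channels, 1212 time steps, 700 samples)
data = np.random.random((3, 1212, 700))
n = 10  # window length in time steps

# one batch: the first n time steps of every sample, transposed so the
# axes become (samples, time_steps, channels) as Keras expects
batch = data[:, 0:n].T
print(batch.shape)  # (700, 10, 3)
```

Each step of the generator below produces exactly one such window, shifted one time step further along the series.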

    import numpy as np
    import keras
    from keras.layers import Dense, Flatten, Input, Reshape
    from keras.models import Model
    from tensorflow.python.client import device_lib
    # check for my gpu 
    print(device_lib.list_local_devices())
    
    
    # make some fake data
    
    # your data
    data = np.random.random((3, 1212, 700))
    
    # a generator that yields sliding windows of the series
    def image_generator(data, n):
        start = 0
        while True:
            # window across all samples, transposed to (samples, n, channels)
            window = data[:, start:start + n].T
            # input and target are the same: the autoencoder reconstructs its input
            yield (window, window)
            start += 1
            # the generator MUST loop forever for fit_generator
            if start + n > data.shape[1]:
                start = 0
    
    n = 10
    # basic model - replace with your own
    encoder_input = Input(shape=(n, 3), name="encoder_input")
    fc = Flatten()(encoder_input)
    fc = Dense(100, activation='relu', name="fc1")(fc)
    encoder_output = Dense(5, activation='sigmoid', name="encoder_output")(fc)
    
    encoder = Model(encoder_input, encoder_output)
    
    # the decoder takes the 5-dimensional bottleneck vector back to a window
    decoder_input = Input(shape=(5,), name="decoder_input")
    fc = Dense(100, activation='relu', name="fc2")(decoder_input)
    fc = Dense(n * 3, activation='sigmoid', name="fc3")(fc)
    # reshape to the input window shape so the reconstruction can be
    # compared against the input by the loss
    output = Reshape((n, 3), name="output")(fc)
    
    decoder = Model(decoder_input, output)
    
    combined_model_input = Input(shape=(n, 3), name="combined_model_input")
    autoencoder = Model(combined_model_input, decoder(encoder(combined_model_input)))
    
    autoencoder.compile(optimizer="adam", loss='mean_squared_error')
    autoencoder.summary()
    
    # and training
    
    training_history = autoencoder.fit_generator(image_generator(data, n),
                                                 epochs=5,
                                                 initial_epoch=0,
                                                 # one step per sliding window position
                                                 steps_per_epoch=data.shape[1] - n + 1,
                                                 verbose=1)
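Once training is done, the encoder half can be used on its own for the feature extraction you are after: each window of n time steps is compressed to its 5-dimensional bottleneck vector. A minimal sketch (it rebuilds an untrained encoder of the same architecture so the snippet is self-contained; in practice you would reuse the trained `encoder` from above):

```python
import numpy as np
from keras.layers import Dense, Flatten, Input
from keras.models import Model

n = 10
data = np.random.random((3, 1212, 700))

# same encoder architecture as above (untrained here, for illustration only)
encoder_input = Input(shape=(n, 3))
fc = Flatten()(encoder_input)
fc = Dense(100, activation='relu')(fc)
encoder_output = Dense(5, activation='sigmoid')(fc)
encoder = Model(encoder_input, encoder_output)

# feature extraction: compress the first n time steps of every sample
window = data[:, 0:n].T            # (700, n, 3)
features = encoder.predict(window)
print(features.shape)              # (700, 5)
```

Sliding the window along the series and stacking the resulting feature vectors gives you a compressed representation of the whole time series.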