Tags: python, tensorflow, machine-learning, keras, lstm

TensorFlow LSTM predicting same value


What I want to do is feed a list of numbers into my LSTM model and have it output its own list of numbers. My project is a program that takes an online MIDI file, converts it into a list of numbers, gets a new list of numbers from the LSTM, converts those new numbers back into MIDI, and then plays the file. The step where I am running into an issue is getting the new list of numbers from the LSTM model.

Here is the main code that I currently have:

from midi_to_text import data_parse
from split_sequence import split_sequence
import py_midicsv as pm
import math
from numpy import asarray
from tensorflow.keras import Sequential
from tensorflow.keras.layers import *
import tensorflow as tf


raw_midi = pm.midi_to_csv('OnlineMidi.mid')
data = data_parse(raw_midi)

n_steps = 1
X, y = split_sequence(data, n_steps)
X = X.reshape((X.shape[0], X.shape[1], 1))
X = tf.cast(X, dtype='float32')

model = Sequential()
model.add(LSTM(256, activation='sigmoid', return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(128, activation='sigmoid', return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(128))
model.add(Dropout(0.2))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

model.fit(X, y, epochs=100, batch_size=32, verbose=2)

notes = [64]
song_length = 10

for i in range(song_length):
    prediction = model.predict(asarray(notes).reshape((-1, 1, 1)))
    # Rescale the 0-1 output back to the original value range (still a float)
    prediction[0][0] = (prediction[0][0] * 384) - (prediction[0][0] * 13) + 13
    notes.append(prediction[0][0])

print(notes)

Here is my function for creating the training set and labels:

from numpy import asarray


def split_sequence(data, n_steps):
    """Split data into windows of n_steps values and the value that follows each."""
    new_data, expected_values = [], []
    for i in range(len(data) - n_steps):
        # Scale every value in the window to a float between 0 and 1
        # (the original loop scaled only the first value of each window)
        new_data.append([(v - 13) / (384 - 13) for v in data[i:n_steps + i]])
        expected_values.append((data[n_steps + i] - 13) / (384 - 13))
    return asarray(new_data), asarray(expected_values)
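
The scaling used in split_sequence and the inverse applied to each prediction in the main code can be written as a pair of helpers. This is only a minimal sketch: the names `scale`, `unscale`, `LOW` and `HIGH` are illustrative and not part of the original code, and the bounds 13 and 384 are taken from the function above.

```python
# Bounds taken from the question's code; the helper names are hypothetical.
LOW, HIGH = 13, 384


def scale(value):
    """Map a raw value in [LOW, HIGH] to a float in [0, 1]."""
    return (value - LOW) / (HIGH - LOW)


def unscale(value):
    """Inverse of scale(): map a float in [0, 1] back to the raw range."""
    return value * (HIGH - LOW) + LOW


# value * 384 - value * 13 + 13 is the same expression as unscale(value)
p = 0.25
assert abs(unscale(p) - (p * 384 - p * 13 + 13)) < 1e-9

# A round trip returns the original value (up to floating-point error)
assert abs(unscale(scale(64)) - 64) < 1e-9
```

Keeping both directions in one place makes it easier to check that whatever is fed to the model at prediction time is scaled the same way as the training inputs.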

This is the x training data when n_steps = 1:

[[64], [76], [64], [75], [64], [76], [64], [75], [64], [76], [64], [71], [64], [74], [64], [72], [69], [64], [45], [64], [52], [64], [57], [64], [60], [64]]

This is the labels when n_steps = 1:

[76, 64, 75, 64, 76, 64, 75, 64, 76, 64, 71, 64, 74, 64, 72, 69, 64, 45, 64, 52, 64, 57, 64, 60, 64, 64, 64, 69, 71, 64, 40, 64, 52, 64, 56, 64, 64, 64,]

This is my data:

[64, 76, 64, 75, 64, 76, 64, 75, 64, 76, 64, 71, 64, 74, 64, 72, 69, 64, 45, 64, 52, 64, 57, 64, 60, 64, 64, 64]

This is what my model is currently outputting, the seed 64 followed by 10 predictions:

[64, 62.63686, 62.636864, 62.636864, 62.636864, 62.636864, 62.636864, 62.636864, 62.636864, 62.636864, 62.636864]

What I do not understand is why these predictions are all basically the same. When I print the prediction inside the last for loop of my main code, I get a list containing one single-element list per input value. Here is an example of one of these predictions:

[[62.500393]
 [62.500393]
 [62.500393]
 [62.500393]
 [62.500393]
 [62.500393]
 [62.500393]
 [62.500393]
 [62.500393]
 [62.500393]]

This is why, in that for loop, I take only the first inner list's value as the prediction. To recap: I have a program that takes a list of numbers, and I want an LSTM model to output a list of predicted numbers starting from the seed 64. The issue is that the model outputs essentially the same prediction every time, so I need help with this prediction process.
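
One detail in the prediction loop that may be worth checking: `asarray(notes).reshape((-1, 1, 1))` turns every entry of `notes` into its own one-step sample, so `prediction[0][0]` is always the output for `notes[0]`, the seed, no matter how many values have been appended. The sketch below shows this with NumPy only. `fake_predict` is a hypothetical stand-in for `model.predict`; it assumes, as with a non-stateful Keras model, that each sample in the batch is processed independently.

```python
import numpy as np


def fake_predict(batch):
    """Hypothetical stand-in for model.predict: one output row per sample."""
    # batch has shape (samples, timesteps, features); return shape (samples, 1)
    return batch[:, -1, :] * 0.5


notes = [64]
for _ in range(3):
    batch = np.asarray(notes, dtype="float32").reshape((-1, 1, 1))
    prediction = fake_predict(batch)
    # Row 0 always corresponds to notes[0], so this value never changes
    notes.append(float(prediction[0][0]))

print(batch.shape)  # (3, 1, 1): three separate samples, not one sequence
print(notes)        # [64, 32.0, 32.0, 32.0]
```

Note also that the seed 64 is passed in unscaled, while the training inputs were scaled to the 0 to 1 range by split_sequence. Feeding only the most recent value (for example `notes[-1:]`), scaled the same way as the training data, would make each step depend on the previous prediction.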

UPDATE: I tried putting model.fit() and model.predict() in a for loop and running it 10 times to see what would happen. Good news: each prediction was different from the last. Bad news: it is very slow, and I am not sure this is the best way to go about it. Is this method even a good one, and is there any advice for getting these values closer to the expected values? It seems highly inefficient because I am retraining the model 10 times just for 10 output values (it's actually 5 notes; the other 5 values are the duration of each note).

Here is my new output using this for loop:

[64, 56.53626, 58.395187, 61.333992, 59.08212, 58.66997, 55.86058, 59.819744, 54.183216, 55.231224, 53.8824]

Here is my new code. It is the same as before, just with the fit and predict calls inside one for loop:

from midi_to_text import data_parse
from split_sequence import split_sequence
import py_midicsv as pm
import math
from numpy import asarray
from tensorflow.keras import Sequential
from tensorflow.keras.layers import *
import tensorflow as tf


raw_midi = pm.midi_to_csv('OnlineMidi.mid')
data = data_parse(raw_midi)

n_steps = 1
X, y = split_sequence(data, n_steps)
print(X)
print(y)
X = X.reshape((X.shape[0], X.shape[1], 1))
X = tf.cast(X, dtype='float32')

notes = [64]

model = Sequential()
model.add(LSTM(256, activation='linear', return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(128, activation='linear', return_sequences=True))
model.add(LSTM(128))
model.add(Dropout(0.2))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

for i in range(10):
    model.fit(X, y, epochs=5, batch_size=2, verbose=2)

    prediction = model.predict(asarray(notes).reshape((-1, 1, 1)))
    prediction[0][0] = (prediction[0][0] * 384) - (prediction[0][0] * 13) + 13
    notes.append(prediction[0][0])

print(notes)

Custom midi_to_text data parser:

def data_parse(raw_midi):
    temp = []
    final = []
    to_remove = []
    shift_unit = 20

    for i in range(len(raw_midi)):
        temp.append(raw_midi[i].split(', '))

    for i in range(len(temp)):
        if temp[i][2] != 'Note_on_c':
            to_remove.append(temp[i])
    
    for i in to_remove:
        temp.remove(i)
    
    for i in temp:
        # Keep only [time, note]. Index-based on purpose: list.remove()
        # deletes the first matching value, which may be the wrong field
        i[:] = [i[1], i[4]]

    for i in range(len(temp)):
        if i == len(temp) - 1:
            temp[i][0] = '64'
        else:
            temp[i][0] = str(int(temp[i + 1][0]) - int(temp[i][0]))
            
    to_remove.clear()
    
    for i in range(len(temp)):
        if i == len(temp) - 1:
            break
        if temp[i + 1][0] == '0':
            temp[i].append(temp[i + 1][1])
            to_remove.append(temp[i + 1])
    
    for i in to_remove:
        temp.remove(i)

    for i in temp:
        for _ in i:
            final.append(int(_))

    return final

THANKS!!


Solution

  • My conclusion is that, although it is highly inefficient, the way to do this is to put model.fit and model.predict inside a for loop and generate one value (one step into the future) per iteration. This does mean fitting the model many times, feeding it the data it previously generated, but that is a cost I can accept. The method works, it just takes some time, and it is the only workable solution I have found. Thanks to everyone who responded and made all of the steps clear to me; hopefully this question helps someone else out there!
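
For comparison with refitting inside the loop, here is a hedged sketch of a generation loop that trains nothing and feeds back only the latest value at each step. It is not the method described above. `fake_predict` and `generate` are hypothetical names, `fake_predict` stands in for an already trained model's `predict`, and the bounds 13 and 384 come from the question's scaling code. Whether a real model produces useful output this way depends on how well it was trained.

```python
import numpy as np

LOW, HIGH = 13, 384  # scaling bounds from the question's code


def fake_predict(batch):
    """Hypothetical stand-in for a trained model's predict()."""
    # Accepts shape (1, 1, 1) and returns shape (1, 1); the rule is arbitrary
    return (batch[:, -1, :] * 0.9) + 0.01


def generate(predict, seed, length):
    """Generate `length` values, feeding back only the latest one."""
    notes = [float(seed)]
    for _ in range(length):
        # Scale the most recent raw value exactly like the training inputs
        last = (notes[-1] - LOW) / (HIGH - LOW)
        x = np.asarray([last], dtype="float32").reshape((1, 1, 1))
        y = float(predict(x)[0][0])
        # Map the scaled output back to the raw range before storing it
        notes.append(y * (HIGH - LOW) + LOW)
    return notes


notes = generate(fake_predict, seed=64, length=5)
print(len(notes))  # 6: the seed plus five generated values
```

With a real Keras model, `model.predict` could be passed in place of `fake_predict`, since it accepts an array of shape (1, 1, 1) for a model trained with n_steps = 1.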