Search code examples
pythonpandastensorflowmatplotlib

Plotting a trendline with tensorflow neural network


For a project in school, I must read sensor values from a CSV file with four variables (three independent; one target), create a simple neural network based on those values, allow a user to input their own three independent values, then have the program give them a predicted target variable. After all of that, my program must plot a scatter plot of the data, with a trend line of the independent variables against the predicted target value.

Please forgive my terrible formatting and description, I have barely any experience with this site 😭

Optional Boring Context Bit:
My project is focused on predicting mood 'scores' based on the sound level of a room. The sound values are collected by a separate program I have created which utilises sound sensors, then uploads them to a CSV file, which this program reads, and uses to build it's model. The program then asks the user for three predicted sound values, then asks three 'What-if Questions', which find which value is most important. It also gives the user a scatter plot of the sound values, which is where I'm having an issue.

The Important Context:
I have the program running pretty smoothly. It outputs predictions well, compares them, e.t.c. I then added matplotlib scatterplot functionality, to plot each of the three parameters, and a trend line on my plot, which uses a polynomial regression system, since I have a better understanding of sci-kit's polynomial regression function than neural networks. Now I want to take my previous code and, instead of plotting the polynomial regression trend line, plot a trend line which uses my tensorflow neural network.

If it seems like I don't really know what I'm doing with the model, you're correct. I don't have any real experience with things. I know that this platform can be a little unforgiving towards that, but I don't really have a choice. If you choose to help out, thank you.

Here is the previous working code, which finds predicted values using a neural network, but plots the trend line using polynomial regression:

from silence_tensorflow import silence_tensorflow
silence_tensorflow()

# Tensorflow has a bunch of functions available for moddeling and analysis
# Specifically, I'm using the layers module to do the hard work for me in constructing a nueral network.
import tensorflow as tf
from tensorflow.keras import layers

# Pandas is a little more flexible and has more options than the previously used 'CSV' module
# In my case, it's essential for constructing dataframes from my uh... data.
import pandas as pd

# train_test_split shuffles the data
# it then splits the aforementioned data into two samples, one for training the model, another for testing it's accuracy
from sklearn.model_selection import train_test_split

# StandardScaler is pretty self explanatory, it standardises the sample data, so that a clear model can be created
from sklearn.preprocessing import StandardScaler

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Plot that data, baby! (á—’á—Šá—•)
import matplotlib.pyplot as plt
import numpy as np

# Load my CSV values from the serial program
data = pd.read_csv('your_dataset.csv')

# X = Independent variables
X = data[['mean_sound', 'max_sound', 'min_sound']]
# Y = Target variable
Y = data['average_mood']

# Split that data into train/test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# Standardise (ze?) the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Yass queen! Model that neural network! Slay!
model = tf.keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)
])

# Compile! (optimize the model and esatblish the loss function, which finds the difference between the predicted and actual data)
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model. Queue 'Eye of The Tiger'!
# Huh, 'queue' is kind of a weird word, right?
model.fit(X_train_scaled, Y_train, epochs=410, validation_split=0.2)

# Evaluate how our new, beefed-up, trained model compares to the OG data
mse = model.evaluate(X_test_scaled, Y_test)
print(f"Test Mean Squared Error: {mse}")

# Function to figure out what the Mean Squared Error says about my program
def mse_rating(mse):
    if mse < 10:
        return "Amazing model accuracy! What a smarty-pants :)"
    elif mse < 20:
        return "Great model accuracy"
    elif mse < 30:
        return "Fine model accuracy."
    elif mse < 40:
        return "Less than average model accuracy."
    else:
        return "Poor model accuracy :("

# Print the model accuracy rating
print(mse_rating(mse))

# Function to plot multiple graphs of each independent variable against mood score
def scatter_2d_plot(X, Y):
    for column in X.columns:
        plt.scatter(X[column], Y, alpha=0.5)
        plt.title(f'Mood Score vs {column}')
        plt.xlabel(column)
        plt.ylabel('Mood Score')
        
        """
        scaler = StandardScaler()
        X_column_scaled = scaler.fit_transform(X[[column]])
        X_column_scaled = (scaler.transform(X_column_scaled)).reshape(1, -1)
        
        # Create a simple neural network with one neuron
        mini_model = tf.keras.Sequential([
            tf.keras.layers.Dense(units=1, input_shape=[1])
        ])
        
        mini_model.compile(optimizer='adam', loss='mean_squared_error')
        
        # Train the model
        mini_model.fit(X_column_scaled, Y, epochs=100, verbose=0)
        
        # Predict Y values based on the trained model
        Y_pred = mini_model.predict(X_column_scaled)
        
        # Plot the trendline
        plt.plot(X_column_scaled, Y_pred, color='red')
        
        # Set the plot limits to focus on the data points
        #plt.xlim(min(X[[column]]), max(X[[column]]))
        #plt.ylim(min(Y), max(Y))
        """
        # Create a PolynomialFeatures object with a specified degree, e.g., 2
        poly = PolynomialFeatures(degree=2)
        
        # Polynomial Transformation
        x_poly = poly.fit_transform(X[column].values.reshape(-1, 1))
        
        # Creating and fitting the model
        model = LinearRegression()
        model.fit(x_poly, Y)

        # Making predictions for the trendline
        x_range = np.linspace(X[column].min(), X[column].max(), 100).reshape(-1, 1)
        y_pred = model.predict(poly.transform(x_range))
        
        plt.plot(x_range, y_pred, color='red')
        
        #plt.savefig("NeuralNetworkOutputChart.png")
        plt.show()

# Have the model predict the mood score based on user input
def make_prediction(user_mean_sound, user_max_sound, user_min_sound):
    try:

        # Create a DataFrame for user input
        user_input_df = pd.DataFrame([[user_mean_sound, user_max_sound, user_min_sound]],
                                     columns=['mean_sound', 'max_sound', 'min_sound'])

        # Standardiseze user input so it compares to the rest of my data
        user_input_scaled = scaler.transform(user_input_df)

        # Predict mood score
        predicted_mood = model.predict(user_input_scaled)[0][0]
        print(f"Predicted Mood Score: {predicted_mood}")
        
        # Send our mood flying back out of my function
        return predicted_mood
        
    except ValueError:
        # It's not me, it's you ╭∩╮(-_-)╭∩╮
        print("Invalid input. Please enter numeric values.")

# Scatterplot of the OG data
scatter_2d_plot(X, Y)

# Ask user for input and make prediction
print("Enter some predicted values for the following")
user_mean_sound = float(input("Average (mean) Sound Level: "))
user_max_sound = float(input("Maximum Sound Level: "))
user_min_sound = float(input("Minimum Sound Level: "))

# Create a point of reference to see if my 'what-if' Qs actually make a positive impact.
# Otherwise, the program will conclude that it's better to leave the sound levels ualtered.
untampered_prediction = make_prediction(user_mean_sound, user_max_sound, user_min_sound)

#---------------------------------------------------

"What-if Questions"

print("\nWhat-if Q1:")
print("What if I double the mean noise level but keep the max and min the same?")
# Multiplies mean sound input by 2
what_if_1 = make_prediction(user_mean_sound*2, user_max_sound, user_min_sound)

print("\nWhat-if Q2:")
print("What if I double the min noise level but keep the mean and min the same?")
# Shockingly, this one multiplies MAXIMUM sound input by 2
what_if_2 = make_prediction(user_mean_sound, user_max_sound*2, user_min_sound)

print("\nWhat-if Q3:")
print("What if I double the min noise level but keep the max and mean the same?")
# In a strange turn of events, this line multiplies minimum sound input by 2!
what_if_3 = make_prediction(user_mean_sound, user_max_sound, user_min_sound*2)


if what_if_1 > untampered_prediction and what_if_1 > what_if_2 and what_if_1 > what_if_3:
    print("\nBased on my questions, it is clear that mean sound levels have the greatest impact on mood!")
    print("Therefore what-if Q1 gives the best result")
    
elif what_if_2 > untampered_prediction and what_if_2 > what_if_1 and what_if_2 > what_if_3:
    print("\nBased on my questions, it is clear that the maximum sound level reached has the greatest impact on mood!")
    print("Therefore what-if Q2 gives the best result")
    
elif what_if_3 > untampered_prediction and what_if_3 > what_if_1 and what_if_3 > what_if_2:
    print("\nBased on my questions, it is clear that the minimum sound level reached has the greatest impact on mood!")
    print("Therefore what-if Q3 gives the best result")
    
elif untampered_prediction > what_if_1 and untampered_prediction > what_if_2 and untampered_prediction > what_if_3:
    print("\nBased on my questions, it seems that none of my what-if questions have a postive impact on a person's mood score")
    print("Therefore the original inputted mean, max, and min sound levels provide the best result")

This is a good stepping stone towards plotting a trend line with the neural interface, and it gives the following output:

Scatter plot SHOULD be here..

Here is my horrible attempt at integrating the neural interface into my plotting (note that I have run this through ChatGPT an unfathomable amount of times, so it might not make since):

from silence_tensorflow import silence_tensorflow
silence_tensorflow()

# Tensorflow has a bunch of functions available for moddeling and analysis
# Specifically, I'm using the layers module to do the hard work for me in constructing a nueral network.
import tensorflow as tf
from tensorflow.keras import layers

# Pandas is a little more flexible and has more options than the previously used 'CSV' module
# In my case, it's essential for constructing dataframes from my uh... data.
import pandas as pd

# train_test_split shuffles the data
# it then splits the aforementioned data into two samples, one for training the model, another for testing it's accuracy
from sklearn.model_selection import train_test_split

# StandardScaler is pretty self explanatory, it standardises the sample data, so that a clear model can be created
from sklearn.preprocessing import StandardScaler

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Plot that data, baby! (á—’á—Šá—•)
import matplotlib.pyplot as plt
import numpy as np

# Load my CSV values from the serial program
data = pd.read_csv('your_dataset.csv')

# X = Independent variables
X = data[['mean_sound', 'max_sound', 'min_sound']]
# Y = Target variable
Y = data['average_mood']

# Split that data into train/test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# Standardise (ze?) the data
scaler = StandardScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Yass queen! Model that neural network! Slay!
model = tf.keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)
])

# Compile! (optimize the model and esatblish the loss function, which finds the difference between the predicted and actual data)
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model. Queue 'Eye of The Tiger'!
# Huh, 'queue' is kind of a weird word, right?
model.fit(X_train_scaled, Y_train, epochs=410, validation_split=0.2)

# Evaluate how our new, beefed-up, trained model compares to the OG data
mse = model.evaluate(X_test_scaled, Y_test)
print(f"Test Mean Squared Error: {mse}")

# Function to figure out what the Mean Squared Error says about my program
def mse_rating(mse):
    if mse < 10:
        return "Amazing model accuracy! What a smarty-pants :)"
    elif mse < 20:
        return "Great model accuracy"
    elif mse < 30:
        return "Fine model accuracy."
    elif mse < 40:
        return "Less than average model accuracy."
    else:
        return "Poor model accuracy :("

# Print the model accuracy rating
print(mse_rating(mse))

# Function to plot multiple graphs of each independent variable against mood score
def scatter_2d_plot(X, y, x_title, y_title):
    plt.scatter(X[x_title], y, alpha=0.5)
    plt.title(f'Mood Score vs {x_title}')
    plt.xlabel(x_title)
    plt.ylabel('Mood Score')
    
    # Predict mood scores for the range of x values using the neural network model
    x_range = np.linspace(X[x_title].min(), X[x_title].max(), 100).reshape(-1, 1)
    
    # Ensure that the x_range is scaled using the same scaler that was fitted with
    x_range_scaled = scaler.transform(x_range)  

    # Predict mood scores for the scaled x_range
    y_pred = model.predict(x_range_scaled)

    # Plot the trend line
    plt.plot(x_range, y_pred, color='red')
        
    plt.show()



# Have the model predict the mood score based on user input
def make_prediction(user_mean_sound, user_max_sound, user_min_sound):
    try:

        # Create a DataFrame for user input
        user_input_df = pd.DataFrame([[user_mean_sound, user_max_sound, user_min_sound]],
                                     columns=['mean_sound', 'max_sound', 'min_sound'])

        # Standardize user input so it compares to the rest of my data
        user_input_scaled = scaler.transform(user_input_df)

        # Predict mood score
        predicted_mood = model.predict(user_input_scaled)[0][0]
        print(f"Predicted Mood Score: {predicted_mood}")
        
        # Pass the entire DataFrame X_train to scatter_2d_plot
        scatter_2d_plot(X_train, Y_train, 'mean_sound', 'Predicted Mood Score')
        
        # Send our mood flying back out of my function
        return predicted_mood
        
    except ValueError:
        # It's not me, it's you
        print("Invalid input. Please enter numeric values.")



# Ask user for input and make prediction
print("Enter some predicted values for the following")
user_mean_sound = float(input("Average (mean) Sound Level: "))
user_max_sound = float(input("Maximum Sound Level: "))
user_min_sound = float(input("Minimum Sound Level: "))

# Create a point of reference to see if my 'what-if' Qs actually make a positive impact.
# Otherwise, the program will conclude that it's better to leave the sound levels ualtered.
untampered_prediction = make_prediction(user_mean_sound, user_max_sound, user_min_sound)

#---------------------------------------------------

# What-if Questions

print("\nWhat-if Q1:")
print("What if I double the mean noise level but keep the max and min the same?")
# Multiplies mean sound input by 2
what_if_1 = make_prediction(user_mean_sound*2, user_max_sound, user_min_sound)

print("\nWhat-if Q2:")
print("What if I double the min noise level but keep the mean and min the same?")
# Shockingly, this one multiplies MAXIMUM sound input by 2
what_if_2 = make_prediction(user_mean_sound, user_max_sound*2, user_min_sound)

print("\nWhat-if Q3:")
print("What if I double the min noise level but keep the max and mean the same?")
# In a strange turn of events, this line multiplies minimum sound input by 2!
what_if_3 = make_prediction(user_mean_sound, user_max_sound, user_min_sound*2)


# Validate predictions before comparing them
if all(pred is not None for pred in [untampered_prediction, what_if_1, what_if_2, what_if_3]):
    if what_if_1 > untampered_prediction and what_if_1 > what_if_2 and what_if_1 > what_if_3:
        print("\nBased on my questions, it is clear that mean sound levels have the greatest impact on mood!")
        print("Therefore what-if Q1 gives the best result")
        
    elif what_if_2 > untampered_prediction and what_if_2 > what_if_1 and what_if_2 > what_if_3:
        print("\nBased on my questions, it is clear that the maximum sound level reached has the greatest impact on mood!")
        print("Therefore what-if Q2 gives the best result")
        
    elif what_if_3 > untampered_prediction and what_if_3 > what_if_1 and what_if_3 > what_if_2:
        print("\nBased on my questions, it is clear that the minimum sound level reached has the greatest impact on mood!")
        print("Therefore what-if Q3 gives the best result")
        
    elif untampered_prediction > what_if_1 and untampered_prediction > what_if_2 and untampered_prediction > what_if_3:
        print("\nBased on my questions, it seems that none of my what-if questions have a positive impact on a person's mood score")
        print("Therefore the original inputted mean, max, and min sound levels provide the best result")
else:
    print("At least one prediction failed. Please check your input values.")

The second block of code gives the following error:

C:\Users\Nitro\AppData\Roaming\Python\Python310\site-packages\sklearn\base.py:465: UserWarning: X does not have valid feature names, but StandardScaler was fitted with feature names warnings.warn( Invalid input. Please enter numeric values. At least one prediction failed. Please check your input values.

I know that this was a long question, I'm not really sure how much detail people usually give in these things. Literally any help is appreciated, and if you choose to help a fish out of water like me out of the goodness of your heart, I thank you in advance!


Solution

  • In your previous working code, adding the function scatter_2d_plot_withTFmodel() defined below will do the required plotting. I tested it with some mock data:

    enter image description here

    The net is trained on 3 features, and therefore it needs to see a value for each of the 3 features any time you use predict(). When plotting a trend line, you only care about one column at a time, so you need to come up with something for the other two columns, otherwise predict() will complain about not having a valid value for each input.

    In the code I used your x_range for the column being plotted, and for the other 2 columns I used their average value to fill their column. That gives us the required 3 inputs for predict().

    There is a prediction for each point on x_range, under the assumption that the other features have an average value.

    def scatter_2d_plot_withTFmodel(X, Y):
        for column in X.columns:
            plt.scatter(X[column], Y, alpha=0.5)
            plt.title(f'Mood Score vs {column} (with TF model trendline)')
            plt.xlabel(column)
            plt.ylabel('Mood Score')
    
            #We only want the trendline for "column" here,
            # but the TF model needs all 3 inputs (mean_sound, max_sound, min_sound).
            #
            #To get round this, we need to provide values for the other 2 columns.
            #
            #1. Find out which columns aren't selected.
            #2. Then get their average.
            #3. Make a new dataframe that is populated with
            #   "x_range" for "column", and the average for the other two columns.
            #4. Transform using the previously-fitted scaler, and predict.
            #
            #The prediction being made for "column" is therefore
            # making the assumption that we have an average amount of the other columns.
            
            #1. Find the unselected columns
            other_columns = [col for col in X.columns if col != column]
            #2. Get their average
            other_columns_average = X[other_columns].mean().values
            
            #3. Make a new dataframe - populate it with the values to use.
            
            # x_range for this column
            x_range = np.linspace(X[column].min(), X[column].max(), 100)
            
            df_for_trendline = pd.DataFrame({
                column: x_range,
                other_columns[0]: [other_columns_average[0]] * len(x_range),
                other_columns[1]: [other_columns_average[1]] * len(x_range),
            })
            # Reorder the columns so they match the order when "scaler" was fitted
            df_for_trendline = df_for_trendline[['mean_sound', 'max_sound', 'min_sound']]
            
            #4. Transform using previous scaler, and predict.
            # Prediction is for x_range, assuming an average value
            # for the other variables
            df_for_trendline_scaled = scaler.transform(df_for_trendline)
            y_pred = model.predict(df_for_trendline_scaled)
            
            plt.plot(x_range, y_pred, color='red')
            
            #plt.savefig("NeuralNetworkOutputChart.png")
            plt.show()