Plotting a trendline with tensorflow neural network

For a project in school, I must read sensor values from a CSV file with four variables (three independent; one target), create a simple neural network based on those values, allow a user to input their own three independent values, then have the program give them a predicted target variable. After all of that, my program must plot a scatter plot of the data, with a trend line of the independent variables against the predicted target value.

Optional Boring Context Bit:
My project is focused on predicting mood 'scores' based on the sound level of a room. The sound values are collected by a separate program I have created which utilises sound sensors, then uploads them to a CSV file, which this program reads and uses to build its model. The program then asks the user for three predicted sound values, then asks three 'What-if Questions', which find which value is most important. It also gives the user a scatter plot of the sound values, which is where I'm having an issue.

The Important Context:
I have the program running pretty smoothly. It outputs predictions well, compares them, etc. I then added matplotlib scatter plot functionality to plot each of the three parameters, with a trend line that uses polynomial regression, since I have a better understanding of scikit-learn's polynomial regression than of neural networks. Now I want to take my previous code and, instead of plotting the polynomial regression trend line, plot a trend line which uses my TensorFlow neural network.

If it seems like I don't really know what I'm doing with the model, you're correct. I don't have any real experience with things. If you choose to help out, thank you.

Here is the previous working code, which finds predicted values using a neural network, but plots the trend line using polynomial regression:

from silence_tensorflow import silence_tensorflow

# Tensorflow has a bunch of functions available for modelling and analysis
# Specifically, I'm using the layers module to do the hard work for me in constructing a neural network.
import tensorflow as tf
from tensorflow.keras import layers

# Pandas is a little more flexible and has more options than the previously used 'CSV' module
# In my case, it's essential for constructing dataframes from my uh... data.
import pandas as pd

# train_test_split shuffles the data
# it then splits the aforementioned data into two samples, one for training the model, another for testing its accuracy
from sklearn.model_selection import train_test_split

# StandardScaler is pretty self explanatory, it standardises the sample data, so that a clear model can be created
from sklearn.preprocessing import StandardScaler

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Plot that data, baby! (ᗒᗊᗕ)
import matplotlib.pyplot as plt
import numpy as np

# Load my CSV values from the serial program
data = pd.read_csv('your_dataset.csv')

# X = Independent variables
X = data[['mean_sound', 'max_sound', 'min_sound']]
# Y = Target variable
Y = data['average_mood']

# Split that data into train/test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# Standardise (ze?) the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Yass queen! Model that neural network! Slay!
model = tf.keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    layers.Dense(64, activation='relu'),
    # Single output neuron: the predicted mood score
    layers.Dense(1)
])

# Compile! (optimize the model and establish the loss function, which finds the difference between the predicted and actual data)
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model. Queue 'Eye of The Tiger'!
# Huh, 'queue' is kind of a weird word, right?
model.fit(X_train_scaled, Y_train, epochs=410, validation_split=0.2)

# Evaluate how our new, beefed-up, trained model compares to the OG data
mse = model.evaluate(X_test_scaled, Y_test)
print(f"Test Mean Squared Error: {mse}")

# Function to figure out what the Mean Squared Error says about my program
def mse_rating(mse):
    if mse < 10:
        return "Amazing model accuracy! What a smarty-pants :)"
    elif mse < 20:
        return "Great model accuracy"
    elif mse < 30:
        return "Fine model accuracy."
    elif mse < 40:
        return "Less than average model accuracy."
    else:
        return "Poor model accuracy :("

# Print the model accuracy rating
print(mse_rating(mse))

# Function to plot multiple graphs of each independent variable against mood score
def scatter_2d_plot(X, Y):
    for column in X.columns:
        plt.scatter(X[column], Y, alpha=0.5)
        plt.title(f'Mood Score vs {column}')
        plt.ylabel('Mood Score')
        # Scale this one column with a local scaler (separate from the global one)
        scaler = StandardScaler()
        X_column_scaled = scaler.fit_transform(X[[column]])
        # Create a simple neural network with one neuron
        mini_model = tf.keras.Sequential([
            tf.keras.layers.Dense(units=1, input_shape=[1])
        ])
        mini_model.compile(optimizer='adam', loss='mean_squared_error')
        # Train the model
        mini_model.fit(X_column_scaled, Y, epochs=100, verbose=0)
        # Predict Y values based on the trained model
        Y_pred = mini_model.predict(X_column_scaled)
        # Plot the trendline against the unscaled values so it lines up with the scatter
        plt.plot(X[column], Y_pred, color='red')
        # Set the plot limits to focus on the data points
        #plt.xlim(min(X[[column]]), max(X[[column]]))
        #plt.ylim(min(Y), max(Y))
        # Create a PolynomialFeatures object with a specified degree, e.g., 2
        poly = PolynomialFeatures(degree=2)
        # Polynomial Transformation
        x_poly = poly.fit_transform(X[column].values.reshape(-1, 1))
        # Creating and fitting the model
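        # (this local name shadows the global neural network only inside this function)
        model = LinearRegression()
        model.fit(x_poly, Y)

        # Making predictions for the trendline
        x_range = np.linspace(X[column].min(), X[column].max(), 100).reshape(-1, 1)
        y_pred = model.predict(poly.transform(x_range))
        plt.plot(x_range, y_pred, color='red')
        # Show this column's figure before moving on to the next one
        plt.show()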

# Have the model predict the mood score based on user input
def make_prediction(user_mean_sound, user_max_sound, user_min_sound):
    try:
        # Create a DataFrame for user input
        user_input_df = pd.DataFrame([[user_mean_sound, user_max_sound, user_min_sound]],
                                     columns=['mean_sound', 'max_sound', 'min_sound'])

        # Standardise user input so it compares to the rest of my data
        user_input_scaled = scaler.transform(user_input_df)

        # Predict mood score
        predicted_mood = model.predict(user_input_scaled)[0][0]
        print(f"Predicted Mood Score: {predicted_mood}")
        # Send our mood flying back out of my function
        return predicted_mood
    except ValueError:
        # It's not me, it's you ╭∩╮(-_-)╭∩╮
        print("Invalid input. Please enter numeric values.")

# Scatterplot of the OG data
scatter_2d_plot(X, Y)

# Ask user for input and make prediction
print("Enter some predicted values for the following")
user_mean_sound = float(input("Average (mean) Sound Level: "))
user_max_sound = float(input("Maximum Sound Level: "))
user_min_sound = float(input("Minimum Sound Level: "))

# Create a point of reference to see if my 'what-if' Qs actually make a positive impact.
# Otherwise, the program will conclude that it's better to leave the sound levels unaltered.
untampered_prediction = make_prediction(user_mean_sound, user_max_sound, user_min_sound)


print("\nWhat-if Q1:")
print("What if I double the mean noise level but keep the max and min the same?")
# Multiplies mean sound input by 2
what_if_1 = make_prediction(user_mean_sound*2, user_max_sound, user_min_sound)

print("\nWhat-if Q2:")
print("What if I double the max noise level but keep the mean and min the same?")
# Shockingly, this one multiplies MAXIMUM sound input by 2
what_if_2 = make_prediction(user_mean_sound, user_max_sound*2, user_min_sound)

print("\nWhat-if Q3:")
print("What if I double the min noise level but keep the max and mean the same?")
# In a strange turn of events, this line multiplies minimum sound input by 2!
what_if_3 = make_prediction(user_mean_sound, user_max_sound, user_min_sound*2)

if what_if_1 > untampered_prediction and what_if_1 > what_if_2 and what_if_1 > what_if_3:
    print("\nBased on my questions, it is clear that mean sound levels have the greatest impact on mood!")
    print("Therefore what-if Q1 gives the best result")
elif what_if_2 > untampered_prediction and what_if_2 > what_if_1 and what_if_2 > what_if_3:
    print("\nBased on my questions, it is clear that the maximum sound level reached has the greatest impact on mood!")
    print("Therefore what-if Q2 gives the best result")
elif what_if_3 > untampered_prediction and what_if_3 > what_if_1 and what_if_3 > what_if_2:
    print("\nBased on my questions, it is clear that the minimum sound level reached has the greatest impact on mood!")
    print("Therefore what-if Q3 gives the best result")
elif untampered_prediction > what_if_1 and untampered_prediction > what_if_2 and untampered_prediction > what_if_3:
    print("\nBased on my questions, it seems that none of my what-if questions have a positive impact on a person's mood score")
    print("Therefore the original inputted mean, max, and min sound levels provide the best result")

This is a good stepping stone towards plotting a trend line with the neural network, and it gives the following output:

Scatter plot SHOULD be here..

Here is my horrible attempt at integrating the neural network into my plotting (note that I have run this through ChatGPT an unfathomable number of times, so it might not make sense):

from silence_tensorflow import silence_tensorflow

# Tensorflow has a bunch of functions available for modelling and analysis
# Specifically, I'm using the layers module to do the hard work for me in constructing a neural network.
import tensorflow as tf
from tensorflow.keras import layers

# Pandas is a little more flexible and has more options than the previously used 'CSV' module
# In my case, it's essential for constructing dataframes from my uh... data.
import pandas as pd

# train_test_split shuffles the data
# it then splits the aforementioned data into two samples, one for training the model, another for testing its accuracy
from sklearn.model_selection import train_test_split

# StandardScaler is pretty self explanatory, it standardises the sample data, so that a clear model can be created
from sklearn.preprocessing import StandardScaler

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Plot that data, baby! (ᗒᗊᗕ)
import matplotlib.pyplot as plt
import numpy as np

# Load my CSV values from the serial program
data = pd.read_csv('your_dataset.csv')

# X = Independent variables
X = data[['mean_sound', 'max_sound', 'min_sound']]
# Y = Target variable
Y = data['average_mood']

# Split that data into train/test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# Standardise (ze?) the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Yass queen! Model that neural network! Slay!
model = tf.keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    layers.Dense(64, activation='relu'),
    # Single output neuron: the predicted mood score
    layers.Dense(1)
])

# Compile! (optimize the model and establish the loss function, which finds the difference between the predicted and actual data)
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model. Queue 'Eye of The Tiger'!
# Huh, 'queue' is kind of a weird word, right?
model.fit(X_train_scaled, Y_train, epochs=410, validation_split=0.2)

# Evaluate how our new, beefed-up, trained model compares to the OG data
mse = model.evaluate(X_test_scaled, Y_test)
print(f"Test Mean Squared Error: {mse}")

# Function to figure out what the Mean Squared Error says about my program
def mse_rating(mse):
    if mse < 10:
        return "Amazing model accuracy! What a smarty-pants :)"
    elif mse < 20:
        return "Great model accuracy"
    elif mse < 30:
        return "Fine model accuracy."
    elif mse < 40:
        return "Less than average model accuracy."
    else:
        return "Poor model accuracy :("

# Print the model accuracy rating
print(mse_rating(mse))

# Function to plot multiple graphs of each independent variable against mood score
def scatter_2d_plot(X, y, x_title, y_title):
    plt.scatter(X[x_title], y, alpha=0.5)
    plt.title(f'Mood Score vs {x_title}')
    plt.ylabel('Mood Score')
    # Predict mood scores for the range of x values using the neural network model
    x_range = np.linspace(X[x_title].min(), X[x_title].max(), 100).reshape(-1, 1)
    # Ensure that the x_range is scaled using the same scaler that was fitted with
    x_range_scaled = scaler.transform(x_range)  

    # Predict mood scores for the scaled x_range
    y_pred = model.predict(x_range_scaled)

    # Plot the trend line
    plt.plot(x_range, y_pred, color='red')

# Have the model predict the mood score based on user input
def make_prediction(user_mean_sound, user_max_sound, user_min_sound):
    try:
        # Create a DataFrame for user input
        user_input_df = pd.DataFrame([[user_mean_sound, user_max_sound, user_min_sound]],
                                     columns=['mean_sound', 'max_sound', 'min_sound'])

        # Standardize user input so it compares to the rest of my data
        user_input_scaled = scaler.transform(user_input_df)

        # Predict mood score
        predicted_mood = model.predict(user_input_scaled)[0][0]
        print(f"Predicted Mood Score: {predicted_mood}")
        # Pass the entire DataFrame X_train to scatter_2d_plot
        scatter_2d_plot(X_train, Y_train, 'mean_sound', 'Predicted Mood Score')
        # Send our mood flying back out of my function
        return predicted_mood
    except ValueError:
        # It's not me, it's you
        print("Invalid input. Please enter numeric values.")

# Ask user for input and make prediction
print("Enter some predicted values for the following")
user_mean_sound = float(input("Average (mean) Sound Level: "))
user_max_sound = float(input("Maximum Sound Level: "))
user_min_sound = float(input("Minimum Sound Level: "))

# Create a point of reference to see if my 'what-if' Qs actually make a positive impact.
# Otherwise, the program will conclude that it's better to leave the sound levels unaltered.
untampered_prediction = make_prediction(user_mean_sound, user_max_sound, user_min_sound)


# What-if Questions

print("\nWhat-if Q1:")
print("What if I double the mean noise level but keep the max and min the same?")
# Multiplies mean sound input by 2
what_if_1 = make_prediction(user_mean_sound*2, user_max_sound, user_min_sound)

print("\nWhat-if Q2:")
print("What if I double the max noise level but keep the mean and min the same?")
# Shockingly, this one multiplies MAXIMUM sound input by 2
what_if_2 = make_prediction(user_mean_sound, user_max_sound*2, user_min_sound)

print("\nWhat-if Q3:")
print("What if I double the min noise level but keep the max and mean the same?")
# In a strange turn of events, this line multiplies minimum sound input by 2!
what_if_3 = make_prediction(user_mean_sound, user_max_sound, user_min_sound*2)

# Validate predictions before comparing them
if all(pred is not None for pred in [untampered_prediction, what_if_1, what_if_2, what_if_3]):
    if what_if_1 > untampered_prediction and what_if_1 > what_if_2 and what_if_1 > what_if_3:
        print("\nBased on my questions, it is clear that mean sound levels have the greatest impact on mood!")
        print("Therefore what-if Q1 gives the best result")
    elif what_if_2 > untampered_prediction and what_if_2 > what_if_1 and what_if_2 > what_if_3:
        print("\nBased on my questions, it is clear that the maximum sound level reached has the greatest impact on mood!")
        print("Therefore what-if Q2 gives the best result")
    elif what_if_3 > untampered_prediction and what_if_3 > what_if_1 and what_if_3 > what_if_2:
        print("\nBased on my questions, it is clear that the minimum sound level reached has the greatest impact on mood!")
        print("Therefore what-if Q3 gives the best result")
    elif untampered_prediction > what_if_1 and untampered_prediction > what_if_2 and untampered_prediction > what_if_3:
        print("\nBased on my questions, it seems that none of my what-if questions have a positive impact on a person's mood score")
        print("Therefore the original inputted mean, max, and min sound levels provide the best result")
else:
    print("At least one prediction failed. Please check your input values.")

The second block of code gives the following error:

C:\Users\Nitro\AppData\Roaming\Python\Python310\site-packages\sklearn\ UserWarning: X does not have valid feature names, but StandardScaler was fitted with feature names
  warnings.warn(
Invalid input. Please enter numeric values.
At least one prediction failed. Please check your input values.

I know that this was a long question; I'm not really sure how much detail people usually give in these things. Any help is appreciated, and if you choose to help a fish out of water like me out of the goodness of your heart, I thank you in advance!


  • In your previous working code, adding the function scatter_2d_plot_withTFmodel() defined below will do the required plotting. I tested it with some mock data:

    The net is trained on 3 features, and therefore it needs to see a value for each of the 3 features any time you use predict(). When plotting a trend line, you only care about one column at a time, so you need to come up with something for the other two columns, otherwise predict() will complain about not having a valid value for each input.

    In the code I used your x_range for the column being plotted, and for the other 2 columns I used their average value to fill their column. That gives us the required 3 inputs for predict().

    There is a prediction for each point on x_range, under the assumption that the other features have an average value.

    def scatter_2d_plot_withTFmodel(X, Y):
        for column in X.columns:
            plt.scatter(X[column], Y, alpha=0.5)
            plt.title(f'Mood Score vs {column} (with TF model trendline)')
            plt.ylabel('Mood Score')
            #We only want the trendline for "column" here,
            # but the TF model needs all 3 inputs (mean_sound, max_sound, min_sound).
            #To get round this, we need to provide values for the other 2 columns.
            #1. Find out which columns aren't selected.
            #2. Then get their average.
            #3. Make a new dataframe that is populated with
            #   "x_range" for "column", and the average for the other two columns.
            #4. Transform using the previously-fitted scaler, and predict.
            #The prediction being made for "column" is therefore
            # making the assumption that we have an average amount of the other columns.
            #1. Find the unselected columns
            other_columns = [col for col in X.columns if col != column]
            #2. Get their average
            other_columns_average = X[other_columns].mean().values
            #3. Make a new dataframe - populate it with the values to use.
            # x_range for this column
            x_range = np.linspace(X[column].min(), X[column].max(), 100)
            df_for_trendline = pd.DataFrame({
                column: x_range,
                other_columns[0]: [other_columns_average[0]] * len(x_range),
                other_columns[1]: [other_columns_average[1]] * len(x_range),
            })
            # Reorder the columns so they match the order when "scaler" was fitted
            df_for_trendline = df_for_trendline[['mean_sound', 'max_sound', 'min_sound']]
            #4. Transform using previous scaler, and predict.
            # Prediction is for x_range, assuming an average value
            # for the other variables
            df_for_trendline_scaled = scaler.transform(df_for_trendline)
            y_pred = model.predict(df_for_trendline_scaled)
            plt.plot(x_range, y_pred, color='red')
            # Show this column's figure before moving on to the next one
            plt.show()
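
As a rough, self-contained sketch of the same idea (separate from the function above), the snippet below uses made-up data instead of your CSV and leaves TensorFlow out entirely. The helper name make_trendline_frame and the variables X_demo and demo_scaler are my own inventions for illustration. It first shows that a scaler fitted on three columns rejects a single column with a ValueError, then builds the three-column frame for a chosen column. As far as I can tell, in your second attempt that ValueError is what the except ValueError in make_prediction() caught, which would explain why you saw "Invalid input" even though your inputs were numeric.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def make_trendline_frame(X, column, n_points=100):
    """Vary `column` over its observed range; hold every other column at its mean.

    Hypothetical helper for illustration; it works for any number of columns.
    """
    x_range = np.linspace(X[column].min(), X[column].max(), n_points)
    # Start from the column means, repeated once per point on the trend line.
    # Iterating over X.columns keeps the column order the scaler was fitted with.
    frame = pd.DataFrame({col: np.full(n_points, X[col].mean()) for col in X.columns})
    # Overwrite the column being plotted with the evenly spaced range
    frame[column] = x_range
    return x_range, frame


# Mock data standing in for the CSV (values are made up for illustration)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    'mean_sound': rng.uniform(30, 70, 50),
    'max_sound': rng.uniform(60, 100, 50),
    'min_sound': rng.uniform(10, 40, 50),
})
demo_scaler = StandardScaler().fit(X_demo)

# A single column is rejected, because the scaler was fitted on three features
single_column = np.linspace(30, 70, 100).reshape(-1, 1)
try:
    demo_scaler.transform(single_column)
    single_column_accepted = True
except ValueError as err:
    single_column_accepted = False
    print(f"ValueError: {err}")

# The three-column frame is accepted by the same scaler
x_range, frame = make_trendline_frame(X_demo, 'max_sound')
scaled = demo_scaler.transform(frame)
print(list(frame.columns), scaled.shape)
```

From there, model.predict(scaler.transform(frame)) plotted against x_range gives the trend line, as in the function above.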