For a project in school, I must read sensor values from a CSV file with four variables (three independent; one target), create a simple neural network based on those values, allow a user to input their own three independent values, then have the program give them a predicted target variable. After all of that, my program must plot a scatter plot of the data, with a trend line of the independent variables against the predicted target value.
Please forgive my terrible formatting and description; I have barely any experience with this site.
Optional Boring Context Bit:
My project is focused on predicting mood 'scores' based on the sound level of a room. The sound values are collected by a separate program I have created which utilises sound sensors, then uploads them to a CSV file, which this program reads and uses to build its model. The program then asks the user for three predicted sound values, then asks three 'what-if questions', which work out which value has the greatest impact. It also gives the user a scatter plot of the sound values, which is where I'm having an issue.
The Important Context:
I have the program running pretty smoothly. It outputs predictions well, compares them, etc. I then added matplotlib scatter-plot functionality to plot each of the three parameters, with a trend line that uses polynomial regression, since I have a better understanding of scikit-learn's polynomial regression than of neural networks. Now I want to take my previous code and, instead of plotting the polynomial regression trend line, plot a trend line which uses my TensorFlow neural network.
If it seems like I don't really know what I'm doing with the model, you're correct. I don't have any real experience with this sort of thing. I know that this platform can be a little unforgiving towards that, but I don't really have a choice. If you choose to help out, thank you.
Here is the previous working code, which finds predicted values using a neural network, but plots the trend line using polynomial regression:
from silence_tensorflow import silence_tensorflow
silence_tensorflow()
# Tensorflow has a bunch of functions available for modelling and analysis
# Specifically, I'm using the layers module to do the hard work for me in constructing a neural network.
import tensorflow as tf
from tensorflow.keras import layers
# Pandas is a little more flexible and has more options than the previously used 'CSV' module
# In my case, it's essential for constructing dataframes from my uh... data.
import pandas as pd
# train_test_split shuffles the data
# it then splits the aforementioned data into two samples, one for training the model, another for testing its accuracy
from sklearn.model_selection import train_test_split
# StandardScaler is pretty self explanatory, it standardises the sample data, so that a clear model can be created
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
# Plot that data, baby! (ᗒᗊᗕ)
import matplotlib.pyplot as plt
import numpy as np
# Load my CSV values from the serial program
data = pd.read_csv('your_dataset.csv')
# X = Independent variables
X = data[['mean_sound', 'max_sound', 'min_sound']]
# Y = Target variable
Y = data['average_mood']
# Split that data into train/test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
# Standardise (ze?) the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Yass queen! Model that neural network! Slay!
model = tf.keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)
])
# Compile! (optimize the model and establish the loss function, which finds the difference between the predicted and actual data)
model.compile(optimizer='adam', loss='mean_squared_error')
# Train the model. Cue 'Eye of the Tiger'!
# Huh, 'cue' is kind of a weird word, right?
model.fit(X_train_scaled, Y_train, epochs=410, validation_split=0.2)
# Evaluate how our new, beefed-up, trained model compares to the OG data
mse = model.evaluate(X_test_scaled, Y_test)
print(f"Test Mean Squared Error: {mse}")
# Function to figure out what the Mean Squared Error says about my program
def mse_rating(mse):
    if mse < 10:
        return "Amazing model accuracy! What a smarty-pants :)"
    elif mse < 20:
        return "Great model accuracy."
    elif mse < 30:
        return "Fine model accuracy."
    elif mse < 40:
        return "Less than average model accuracy."
    else:
        return "Poor model accuracy :("
# Print the model accuracy rating
print(mse_rating(mse))
# Function to plot multiple graphs of each independent variable against mood score
def scatter_2d_plot(X, Y):
    for column in X.columns:
        plt.scatter(X[column], Y, alpha=0.5)
        plt.title(f'Mood Score vs {column}')
        plt.xlabel(column)
        plt.ylabel('Mood Score')
        """
        scaler = StandardScaler()
        X_column_scaled = scaler.fit_transform(X[[column]])
        X_column_scaled = (scaler.transform(X_column_scaled)).reshape(1, -1)
        # Create a simple neural network with one neuron
        mini_model = tf.keras.Sequential([
            tf.keras.layers.Dense(units=1, input_shape=[1])
        ])
        mini_model.compile(optimizer='adam', loss='mean_squared_error')
        # Train the model
        mini_model.fit(X_column_scaled, Y, epochs=100, verbose=0)
        # Predict Y values based on the trained model
        Y_pred = mini_model.predict(X_column_scaled)
        # Plot the trendline
        plt.plot(X_column_scaled, Y_pred, color='red')
        # Set the plot limits to focus on the data points
        #plt.xlim(min(X[[column]]), max(X[[column]]))
        #plt.ylim(min(Y), max(Y))
        """
        # Create a PolynomialFeatures object with a specified degree, e.g., 2
        poly = PolynomialFeatures(degree=2)
        # Polynomial Transformation
        x_poly = poly.fit_transform(X[column].values.reshape(-1, 1))
        # Creating and fitting the model
        model = LinearRegression()
        model.fit(x_poly, Y)
        # Making predictions for the trendline
        x_range = np.linspace(X[column].min(), X[column].max(), 100).reshape(-1, 1)
        y_pred = model.predict(poly.transform(x_range))
        plt.plot(x_range, y_pred, color='red')
        #plt.savefig("NeuralNetworkOutputChart.png")
        plt.show()
# Have the model predict the mood score based on user input
def make_prediction(user_mean_sound, user_max_sound, user_min_sound):
    try:
        # Create a DataFrame for user input
        user_input_df = pd.DataFrame([[user_mean_sound, user_max_sound, user_min_sound]],
                                     columns=['mean_sound', 'max_sound', 'min_sound'])
        # Standardise user input so it compares to the rest of my data
        user_input_scaled = scaler.transform(user_input_df)
        # Predict mood score
        predicted_mood = model.predict(user_input_scaled)[0][0]
        print(f"Predicted Mood Score: {predicted_mood}")
        # Send our mood flying back out of my function
        return predicted_mood
    except ValueError:
        # It's not me, it's you ╭∩╮(-_-)╭∩╮
        print("Invalid input. Please enter numeric values.")
# Scatterplot of the OG data
scatter_2d_plot(X, Y)
# Ask user for input and make prediction
print("Enter some predicted values for the following")
user_mean_sound = float(input("Average (mean) Sound Level: "))
user_max_sound = float(input("Maximum Sound Level: "))
user_min_sound = float(input("Minimum Sound Level: "))
# Create a point of reference to see if my 'what-if' Qs actually make a positive impact.
# Otherwise, the program will conclude that it's better to leave the sound levels unaltered.
untampered_prediction = make_prediction(user_mean_sound, user_max_sound, user_min_sound)
#---------------------------------------------------
"What-if Questions"
print("\nWhat-if Q1:")
print("What if I double the mean noise level but keep the max and min the same?")
# Multiplies mean sound input by 2
what_if_1 = make_prediction(user_mean_sound*2, user_max_sound, user_min_sound)
print("\nWhat-if Q2:")
print("What if I double the min noise level but keep the mean and min the same?")
# Shockingly, this one multiplies MAXIMUM sound input by 2
what_if_2 = make_prediction(user_mean_sound, user_max_sound*2, user_min_sound)
print("\nWhat-if Q3:")
print("What if I double the min noise level but keep the max and mean the same?")
# In a strange turn of events, this line multiplies minimum sound input by 2!
what_if_3 = make_prediction(user_mean_sound, user_max_sound, user_min_sound*2)
if what_if_1 > untampered_prediction and what_if_1 > what_if_2 and what_if_1 > what_if_3:
    print("\nBased on my questions, it is clear that mean sound levels have the greatest impact on mood!")
    print("Therefore what-if Q1 gives the best result")
elif what_if_2 > untampered_prediction and what_if_2 > what_if_1 and what_if_2 > what_if_3:
    print("\nBased on my questions, it is clear that the maximum sound level reached has the greatest impact on mood!")
    print("Therefore what-if Q2 gives the best result")
elif what_if_3 > untampered_prediction and what_if_3 > what_if_1 and what_if_3 > what_if_2:
    print("\nBased on my questions, it is clear that the minimum sound level reached has the greatest impact on mood!")
    print("Therefore what-if Q3 gives the best result")
elif untampered_prediction > what_if_1 and untampered_prediction > what_if_2 and untampered_prediction > what_if_3:
    print("\nBased on my questions, it seems that none of my what-if questions have a positive impact on a person's mood score")
    print("Therefore the original inputted mean, max, and min sound levels provide the best result")
This is a good stepping stone towards plotting a trend line with the neural network, and it gives the following output:
[screenshot: a scatter plot for each sound variable against mood score, each with a red polynomial trend line]
Here is my horrible attempt at integrating the neural network into my plotting (note that I have run this through ChatGPT an unfathomable number of times, so it might not make sense):
from silence_tensorflow import silence_tensorflow
silence_tensorflow()
# Tensorflow has a bunch of functions available for modelling and analysis
# Specifically, I'm using the layers module to do the hard work for me in constructing a neural network.
import tensorflow as tf
from tensorflow.keras import layers
# Pandas is a little more flexible and has more options than the previously used 'CSV' module
# In my case, it's essential for constructing dataframes from my uh... data.
import pandas as pd
# train_test_split shuffles the data
# it then splits the aforementioned data into two samples, one for training the model, another for testing its accuracy
from sklearn.model_selection import train_test_split
# StandardScaler is pretty self explanatory, it standardises the sample data, so that a clear model can be created
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
# Plot that data, baby! (ᗒᗊᗕ)
import matplotlib.pyplot as plt
import numpy as np
# Load my CSV values from the serial program
data = pd.read_csv('your_dataset.csv')
# X = Independent variables
X = data[['mean_sound', 'max_sound', 'min_sound']]
# Y = Target variable
Y = data['average_mood']
# Split that data into train/test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
# Standardise (ze?) the data
scaler = StandardScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Yass queen! Model that neural network! Slay!
model = tf.keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)
])
# Compile! (optimize the model and establish the loss function, which finds the difference between the predicted and actual data)
model.compile(optimizer='adam', loss='mean_squared_error')
# Train the model. Cue 'Eye of the Tiger'!
# Huh, 'cue' is kind of a weird word, right?
model.fit(X_train_scaled, Y_train, epochs=410, validation_split=0.2)
# Evaluate how our new, beefed-up, trained model compares to the OG data
mse = model.evaluate(X_test_scaled, Y_test)
print(f"Test Mean Squared Error: {mse}")
# Function to figure out what the Mean Squared Error says about my program
def mse_rating(mse):
    if mse < 10:
        return "Amazing model accuracy! What a smarty-pants :)"
    elif mse < 20:
        return "Great model accuracy."
    elif mse < 30:
        return "Fine model accuracy."
    elif mse < 40:
        return "Less than average model accuracy."
    else:
        return "Poor model accuracy :("
# Print the model accuracy rating
print(mse_rating(mse))
# Function to plot multiple graphs of each independent variable against mood score
def scatter_2d_plot(X, y, x_title, y_title):
    plt.scatter(X[x_title], y, alpha=0.5)
    plt.title(f'Mood Score vs {x_title}')
    plt.xlabel(x_title)
    plt.ylabel('Mood Score')
    # Predict mood scores for the range of x values using the neural network model
    x_range = np.linspace(X[x_title].min(), X[x_title].max(), 100).reshape(-1, 1)
    # Ensure that x_range is scaled using the same scaler that was fitted earlier
    x_range_scaled = scaler.transform(x_range)
    # Predict mood scores for the scaled x_range
    y_pred = model.predict(x_range_scaled)
    # Plot the trend line
    plt.plot(x_range, y_pred, color='red')
    plt.show()
# Have the model predict the mood score based on user input
def make_prediction(user_mean_sound, user_max_sound, user_min_sound):
    try:
        # Create a DataFrame for user input
        user_input_df = pd.DataFrame([[user_mean_sound, user_max_sound, user_min_sound]],
                                     columns=['mean_sound', 'max_sound', 'min_sound'])
        # Standardize user input so it compares to the rest of my data
        user_input_scaled = scaler.transform(user_input_df)
        # Predict mood score
        predicted_mood = model.predict(user_input_scaled)[0][0]
        print(f"Predicted Mood Score: {predicted_mood}")
        # Pass the entire DataFrame X_train to scatter_2d_plot
        scatter_2d_plot(X_train, Y_train, 'mean_sound', 'Predicted Mood Score')
        # Send our mood flying back out of my function
        return predicted_mood
    except ValueError:
        # It's not me, it's you
        print("Invalid input. Please enter numeric values.")
# Ask user for input and make prediction
print("Enter some predicted values for the following")
user_mean_sound = float(input("Average (mean) Sound Level: "))
user_max_sound = float(input("Maximum Sound Level: "))
user_min_sound = float(input("Minimum Sound Level: "))
# Create a point of reference to see if my 'what-if' Qs actually make a positive impact.
# Otherwise, the program will conclude that it's better to leave the sound levels unaltered.
untampered_prediction = make_prediction(user_mean_sound, user_max_sound, user_min_sound)
#---------------------------------------------------
# What-if Questions
print("\nWhat-if Q1:")
print("What if I double the mean noise level but keep the max and min the same?")
# Multiplies mean sound input by 2
what_if_1 = make_prediction(user_mean_sound*2, user_max_sound, user_min_sound)
print("\nWhat-if Q2:")
print("What if I double the min noise level but keep the mean and min the same?")
# Shockingly, this one multiplies MAXIMUM sound input by 2
what_if_2 = make_prediction(user_mean_sound, user_max_sound*2, user_min_sound)
print("\nWhat-if Q3:")
print("What if I double the min noise level but keep the max and mean the same?")
# In a strange turn of events, this line multiplies minimum sound input by 2!
what_if_3 = make_prediction(user_mean_sound, user_max_sound, user_min_sound*2)
# Validate predictions before comparing them
if all(pred is not None for pred in [untampered_prediction, what_if_1, what_if_2, what_if_3]):
    if what_if_1 > untampered_prediction and what_if_1 > what_if_2 and what_if_1 > what_if_3:
        print("\nBased on my questions, it is clear that mean sound levels have the greatest impact on mood!")
        print("Therefore what-if Q1 gives the best result")
    elif what_if_2 > untampered_prediction and what_if_2 > what_if_1 and what_if_2 > what_if_3:
        print("\nBased on my questions, it is clear that the maximum sound level reached has the greatest impact on mood!")
        print("Therefore what-if Q2 gives the best result")
    elif what_if_3 > untampered_prediction and what_if_3 > what_if_1 and what_if_3 > what_if_2:
        print("\nBased on my questions, it is clear that the minimum sound level reached has the greatest impact on mood!")
        print("Therefore what-if Q3 gives the best result")
    elif untampered_prediction > what_if_1 and untampered_prediction > what_if_2 and untampered_prediction > what_if_3:
        print("\nBased on my questions, it seems that none of my what-if questions have a positive impact on a person's mood score")
        print("Therefore the original inputted mean, max, and min sound levels provide the best result")
else:
    print("At least one prediction failed. Please check your input values.")
The second block of code gives the following error:
C:\Users\Nitro\AppData\Roaming\Python\Python310\site-packages\sklearn\base.py:465: UserWarning: X does not have valid feature names, but StandardScaler was fitted with feature names
  warnings.warn(
Invalid input. Please enter numeric values.
At least one prediction failed. Please check your input values.
I know that this was a long question; I'm not really sure how much detail people usually give in these things. Literally any help is appreciated, and if you choose to help a fish out of water like me out of the goodness of your heart, I thank you in advance!
In your previous working code, adding the function scatter_2d_plot_withTFmodel() defined below will do the required plotting. I tested it with some mock data.

The net is trained on 3 features, and therefore it needs to see a value for each of the 3 features any time you use predict(). When plotting a trend line, you only care about one column at a time, so you need to come up with something for the other two columns; otherwise predict() will complain about not having a valid value for each input.
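Incidentally, that is also what trips up your second block, except the failure happens one step earlier, at scaler.transform, because the scaler was fitted on three named columns. A minimal sketch of the failure mode, reusing the X and scaler variables from your code:

x_range = np.linspace(X['mean_sound'].min(), X['mean_sound'].max(), 100).reshape(-1, 1)
scaler.transform(x_range)  # fails: only one column, but the scaler expects three
# UserWarning: X does not have valid feature names, but StandardScaler was fitted with feature names
# ValueError: X has 1 features, but StandardScaler is expecting 3 features as input.

That ValueError is then swallowed by the try/except ValueError in make_prediction, which is why you only see "Invalid input. Please enter numeric values." rather than a traceback.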
In the code I used your x_range for the column being plotted, and for the other 2 columns I used their average value to fill their column. That gives us the required 3 inputs for predict(). There is a prediction for each point on x_range, under the assumption that the other features have an average value.
def scatter_2d_plot_withTFmodel(X, Y):
    for column in X.columns:
        plt.scatter(X[column], Y, alpha=0.5)
        plt.title(f'Mood Score vs {column} (with TF model trendline)')
        plt.xlabel(column)
        plt.ylabel('Mood Score')
        # We only want the trendline for "column" here,
        # but the TF model needs all 3 inputs (mean_sound, max_sound, min_sound).
        #
        # To get round this, we need to provide values for the other 2 columns:
        # 1. Find out which columns aren't selected.
        # 2. Then get their average.
        # 3. Make a new dataframe that is populated with
        #    "x_range" for "column", and the average for the other two columns.
        # 4. Transform using the previously-fitted scaler, and predict.
        #
        # The prediction being made for "column" is therefore
        # making the assumption that we have an average amount of the other columns.

        # 1. Find the unselected columns
        other_columns = [col for col in X.columns if col != column]
        # 2. Get their average
        other_columns_average = X[other_columns].mean().values
        # 3. Make a new dataframe - populate it with the values to use:
        #    x_range for this column
        x_range = np.linspace(X[column].min(), X[column].max(), 100)
        df_for_trendline = pd.DataFrame({
            column: x_range,
            other_columns[0]: [other_columns_average[0]] * len(x_range),
            other_columns[1]: [other_columns_average[1]] * len(x_range),
        })
        # Reorder the columns so they match the order when "scaler" was fitted
        df_for_trendline = df_for_trendline[['mean_sound', 'max_sound', 'min_sound']]
        # 4. Transform using previous scaler, and predict.
        #    Prediction is for x_range, assuming an average value
        #    for the other variables
        df_for_trendline_scaled = scaler.transform(df_for_trendline)
        y_pred = model.predict(df_for_trendline_scaled)
        plt.plot(x_range, y_pred, color='red')
        #plt.savefig("NeuralNetworkOutputChart.png")
        plt.show()
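To use it, call it the same way you called the old polynomial version in your first block, after the scaler and model have been fitted. A sketch of the call, matching your existing variable names:

# Scatterplot of the OG data, now with a neural-network trend line
# (replaces the old scatter_2d_plot(X, Y) call)
scatter_2d_plot_withTFmodel(X, Y)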