Here is the dataset I'm working with. It is a survey of VR application areas from 2019 to 2021, where N is the "number of applications in each area" and % is the "percentage of the total sample". I'm having trouble with the accuracy of my model on the train and test sets.
First I import the libraries and read the CSV file:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
df = pd.read_csv('VR_application.csv')
Now I split the data into train and test sets:
X_train, X_test, y_train, y_test = train_test_split(df[['year']], df[['N', '%']], test_size=0.2, random_state=42)
Creating and training the linear regression model:
model = LinearRegression()
model.fit(X_train, y_train)
Predicting the values for the training and testing sets:
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)
Calculating the accuracy (R-squared score) for the train and test sets:
train_accuracy = r2_score(y_train, y_train_pred)
test_accuracy = r2_score(y_test, y_test_pred)
Printing the accuracy on the train and test sets:
print(f"Train Accuracy: {train_accuracy}")
print(f"Test Accuracy: {test_accuracy}")
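Because y has two columns (N and %), r2_score returns the average over both targets by default. A minimal sketch of how the score could be broken down per target is below. The real CSV is not shown here, so the `demo` frame is made-up stand-in data, and the names `demo`, `demo_model`, and `per_target_r2` are only for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for VR_application.csv (values are invented)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "year": np.repeat([2019, 2020, 2021], 10),
    "N": rng.integers(1, 50, size=30),
})
demo["%"] = 100 * demo["N"] / demo["N"].sum()

X_tr, X_te, y_tr, y_te = train_test_split(
    demo[["year"]], demo[["N", "%"]], test_size=0.2, random_state=42
)
demo_model = LinearRegression().fit(X_tr, y_tr)

# multioutput="raw_values" returns one R-squared value per target column
# instead of the default uniform average across targets
per_target_r2 = r2_score(y_te, demo_model.predict(X_te), multioutput="raw_values")
for name, score in zip(y_te.columns, per_target_r2):
    print(f"{name}: {score:.3f}")
```

With the real data, the same multioutput argument can be passed to the r2_score calls above to see whether one target is pulling the average down.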
After doing this I get the accuracy:
Train Accuracy: 0.004421041085529986
Test Accuracy: -0.09666987166765573
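For reference when reading these numbers: an R-squared of 0 is what a model that always predicts the training mean scores on the training set, and a negative value means the predictions are worse than predicting the mean of the evaluated set. A small sketch of that baseline, using invented data (not the survey file) and hypothetical names such as `baseline`:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import r2_score

# Invented data, unrelated to the survey file: the target does not depend on year
rng = np.random.default_rng(1)
X_demo = rng.integers(2019, 2022, size=(40, 1))
y_demo = rng.normal(size=40)

X_fit, X_new = X_demo[:32], X_demo[32:]
y_fit, y_new = y_demo[:32], y_demo[32:]

# DummyRegressor(strategy="mean") ignores X and always predicts the training mean
baseline = DummyRegressor(strategy="mean").fit(X_fit, y_fit)
baseline_train_r2 = r2_score(y_fit, baseline.predict(X_fit))
baseline_test_r2 = r2_score(y_new, baseline.predict(X_new))

print(f"Baseline train R2: {baseline_train_r2}")
print(f"Baseline test R2: {baseline_test_r2}")
```

Scores close to this baseline would suggest that the feature (here, year alone) explains very little of the variation in the targets.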
Can you check my code and identify the problem I'm missing?
X_train, X_test, y_train, y_test = train_test_split(df[['N', '%']], df['year'], test_size=0.2, random_state=42)