I have a data frame that looks like this.
Month Day Deadline_Changes Test
3 19 2 English
5 3 8 Math
3 8 34 Science
10 2 17 Science
5 9 21 Social
4 12 3 Math
8 29 1 Music
12 31 9 English
And a second dataframe that looks like this.
Month Day Test
5 30 Math
9 2 Social
12 9 Science
11 30 Music
8 24 Music
2 2 English
6 12 Music
4 9 English
My desired output is
Month Day Test Predicted_Deadline_Changes
5 30 Math 4
9 2 Social 23
12 9 Science 6
11 30 Music 18
8 24 Music 4
2 2 English 2
6 12 Music 1
4 9 English 10
Basically, I want to use my first data frame as training data to predict the deadline changes for my second data frame.
My desired output is the second data frame with an additional column called Predicted_Deadline_Changes, whose values are predicted from the training data.
Using Python, what would be the best approach/method to do this?
This is a simple regression model for predicting deadline changes.
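import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder

# copy the first table before running this line, then copy the second table before the next
train = pd.read_clipboard()
predict = pd.read_clipboard()
y = train['Deadline_Changes']
x = train.drop(columns='Deadline_Changes')
# encode the test names as integers
le = LabelEncoder()
x['Test'] = le.fit_transform(x['Test'])
# build the features for the second frame, reusing the encoder fitted on the training labels
x_new = predict[['Month', 'Day', 'Test']].copy()
x_new['Test'] = le.transform(x_new['Test'])
model = LinearRegression()
model.fit(x, y)
# remove .round() if you want exact values
predict['Predicted_Deadline_Changes'] = model.predict(x_new).round()
print(predict)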
Results:
   Month  Day     Test  Predicted_Deadline_Changes
0      5   30     Math                         3.0
1      9    2   Social                        10.0
2     12    9  Science                        19.0
3     11   30    Music                        20.0
4      8   24    Music                        23.0
5      2    2  English                         9.0
6      6   12    Music                        10.0
7      4    9  English                         0.0
There are many modeling techniques for predicting values, each with its own advantages and disadvantages.
Linear regression is probably the most basic one; it assumes a linear relationship between your independent variables and your dependent variable.
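As one illustration of a different set of trade-offs, here is a minimal sketch that one-hot encodes Test instead of label encoding it. Label encoding maps the test names to integers, so a linear model treats them as if they had a numeric order; one-hot encoding avoids that. This is not the code from the answer above: the use of ColumnTransformer and OneHotEncoder is my assumption about a reasonable setup, and the data is typed in from the question so the script runs without the clipboard. With only eight training rows, treat any predictions as a demonstration of the mechanics rather than a reliable forecast.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Training data typed in from the question.
train = pd.DataFrame({
    "Month": [3, 5, 3, 10, 5, 4, 8, 12],
    "Day": [19, 3, 8, 2, 9, 12, 29, 31],
    "Deadline_Changes": [2, 8, 34, 17, 21, 3, 1, 9],
    "Test": ["English", "Math", "Science", "Science",
             "Social", "Math", "Music", "English"],
})

# Rows to score, also typed in from the question.
predict = pd.DataFrame({
    "Month": [5, 9, 12, 11, 8, 2, 6, 4],
    "Day": [30, 2, 9, 30, 24, 2, 12, 9],
    "Test": ["Math", "Social", "Science", "Music",
             "Music", "English", "Music", "English"],
})

features = ["Month", "Day", "Test"]

# One-hot encode Test and pass Month and Day through unchanged.
# handle_unknown="ignore" encodes an unseen test name as all zeros
# instead of raising an error at prediction time.
encode = ColumnTransformer(
    [("test", OneHotEncoder(handle_unknown="ignore"), ["Test"])],
    remainder="passthrough",
)

# Any scikit-learn regressor could be swapped in for LinearRegression here.
model = make_pipeline(encode, LinearRegression())
model.fit(train[features], train["Deadline_Changes"])

predict["Predicted_Deadline_Changes"] = model.predict(predict[features])
print(predict)
```

Because the encoder and the regressor live in one pipeline, the same encoding is applied to both frames automatically, which rules out predicting on the wrong features by accident.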