My current data frame looks like:
  salary job title Raiting Company_Name Location Seniority Excel_needed
0    100        SE       5        apple       sf        vp            0
1    120        DS       4      Samsung       la        Jr            1
2    230        QA       5       google       sd        Sr            1
After applying OneHotEncoder from sklearn to the categorical columns, I get a satisfactory model score. Now I would like to predict from the raw string values, e.g. model.predict('SE','5','apple','ca','vp','1'),
rather than typing in thousands of 0s and 1s to match the one-hot encoded data frame. How would I go about this?
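You need to keep the fitted preprocessing objects (the encoders) from training and write a function that applies them to new input before calling the model.
Here is a basic example for a single column: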
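    from sklearn.preprocessing import LabelEncoder

    # Fit the encoder once on the training data and keep it for prediction time
    title_encoder = LabelEncoder()
    title_encoder.fit(train['job title'])

    def predict(model, data, job_title_column, encoder):
        data = data.copy()  # avoid modifying the caller's data frame
        # Apply the same encoding that was used during training
        data[job_title_column] = encoder.transform(data[job_title_column])
        return model.predict(data)

    predictions = predict(model, data, 'job title', title_encoder)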
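You could also use a Pipeline, which bundles the preprocessing steps and the model into a single object, so predict can take the raw values directly: https://scikit-learn.org/stable/modules/compose.html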
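As a rough sketch of the Pipeline route, something like the following might work. It assumes salary is the target, uses LinearRegression purely as a stand-in for whatever model you trained, and predict_from_values is a hypothetical helper name, not part of scikit-learn. The OneHotEncoder is fitted inside the pipeline, so new rows are encoded the same way as the training data, and handle_unknown="ignore" lets an unseen value such as 'ca' pass through as all zeros instead of raising an error.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy training data mirroring the frame in the question
train = pd.DataFrame({
    "salary": [100, 120, 230],
    "job title": ["SE", "DS", "QA"],
    "Raiting": [5, 4, 5],
    "Company_Name": ["apple", "Samsung", "google"],
    "Location": ["sf", "la", "sd"],
    "Seniority": ["vp", "Jr", "Sr"],
    "Excel_needed": [0, 1, 1],
})

# Assumption: salary is the target, every other column is a feature
feature_columns = ["job title", "Raiting", "Company_Name",
                   "Location", "Seniority", "Excel_needed"]
categorical = ["job title", "Company_Name", "Location", "Seniority"]

# One-hot encode the string columns, pass the numeric ones through unchanged
preprocess = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",
)

# Assumption: LinearRegression stands in for the actual model
model = Pipeline(steps=[("preprocess", preprocess),
                        ("regressor", LinearRegression())])
model.fit(train[feature_columns], train["salary"])

def predict_from_values(model, values):
    """Hypothetical helper: build a one-row frame from raw values and predict."""
    row = pd.DataFrame([values], columns=feature_columns)
    return model.predict(row)[0]

prediction = predict_from_values(model, ["SE", 5, "apple", "ca", "vp", 1])
print(prediction)
```

Because the whole pipeline is one object, you can also persist it (for example with joblib.dump) and reload it later without refitting the encoder separately.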