pandas machine-learning encoding scikit-learn sklearn-pandas

How to predict values in sklearn with encoded features?


My current data frame looks like:

   salary  job title  Raiting  Company_Name  Location  Seniority  Excel_needed
0     100         SE        5         apple        sf         vp             0
1     120         DS        4       Samsung        la         Jr             1
2     230         QA        5        google        sd         Sr             1

After applying OneHotEncoder from sklearn to the categorical columns, I get a satisfactory model score. Now I would like to predict results from the original string values, e.g. model.predict('SE','5','apple','ca','vp','1'), rather than having to enter thousands of 0s and 1s that match the one-hot encoded data frame. How would I go about this?


Solution

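  • You need to keep the fitted preprocessing objects (such as the encoders) and write a function that applies them to new data before calling the model.

    Here is a basic example that encodes a single column: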

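    from sklearn.preprocessing import LabelEncoder

    # Fit the encoder once on the training data and keep it for prediction time.
    title_encoder = LabelEncoder()
    title_encoder.fit(train['job title'])


    def predict(model, data, job_title_column, encoder):
        # Work on a copy so the caller's DataFrame is not modified.
        data = data.copy()
        # Apply the same encoding the model was trained with.
        data[job_title_column] = encoder.transform(data[job_title_column])
        return model.predict(data)

    # `train`, `model`, and `data` are assumed to exist already.
    predictions = predict(model, data, 'job title', title_encoder)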
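
If the predictions run in a different process from the training, the fitted encoder has to be stored on disk as well. A minimal sketch using joblib, assuming the three job titles from the question as training values; the file name `title_encoder.joblib` and the temporary directory are only illustrative:

```python
import os
import tempfile

import joblib
from sklearn.preprocessing import LabelEncoder

# Fit the encoder on the training values (taken from the question's sample rows).
title_encoder = LabelEncoder()
title_encoder.fit(["SE", "DS", "QA"])

# Persist the fitted encoder; the path is a hypothetical location for this sketch.
path = os.path.join(tempfile.mkdtemp(), "title_encoder.joblib")
joblib.dump(title_encoder, path)

# Later, e.g. in the prediction script: load it and reuse the same mapping.
restored_encoder = joblib.load(path)
codes = restored_encoder.transform(["QA", "SE"])
```

The fitted model can be stored the same way, so that the encoder and the model always travel together.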
    

    You could also try using a Pipeline, which bundles the preprocessing and the model into a single object: https://scikit-learn.org/stable/modules/compose.html
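
As a rough sketch of the Pipeline approach: the one-hot encoding and the model are fitted together, so new rows can be passed as raw strings. The toy data copies the question's sample rows; `LinearRegression` is only a stand-in for whatever model was actually trained, and `predict_from_values` is a hypothetical helper, not part of scikit-learn. `handle_unknown="ignore"` is assumed so that an unseen category such as 'ca' does not raise an error.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy training data mirroring the question's sample rows.
train = pd.DataFrame({
    "job title": ["SE", "DS", "QA"],
    "Raiting": [5, 4, 5],
    "Company_Name": ["apple", "Samsung", "google"],
    "Location": ["sf", "la", "sd"],
    "Seniority": ["vp", "Jr", "Sr"],
    "Excel_needed": [0, 1, 1],
    "salary": [100, 120, 230],
})

feature_columns = [c for c in train.columns if c != "salary"]
categorical_columns = ["job title", "Company_Name", "Location", "Seniority"]

# One-hot encode the categorical columns; pass numeric columns through unchanged.
preprocess = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_columns),
    ],
    remainder="passthrough",
)

# LinearRegression is a placeholder for the real model.
pipeline = Pipeline([("preprocess", preprocess), ("model", LinearRegression())])
pipeline.fit(train[feature_columns], train["salary"])


def predict_from_values(fitted_pipeline, *values):
    """Hypothetical helper: build a one-row DataFrame from raw values and predict."""
    row = pd.DataFrame([values], columns=feature_columns)
    return fitted_pipeline.predict(row)[0]


# Raw values in the same order as feature_columns; 'ca' was never seen in training.
prediction = predict_from_values(pipeline, "SE", 5, "apple", "ca", "vp", 1)
```

Because the encoder lives inside the pipeline, saving the pipeline object saves the preprocessing and the model in one step.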