I built a PySpark multinomial logistic regression model and integrated it with a Django web app so that I can make predictions on querysets. I saved the model using the recommended approach:
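from os.path import join as pjoin  # assumption: pjoin is an alias for os.path.join
from pyspark.ml.classification import LogisticRegression, LogisticRegressionModel
temp_path = pjoin("/home/maffsojah/Projects/HIT_400/capstone_project/web/tbank/spark-warehouse")
# Save and reload the estimator (its settings, e.g. maxIter, not the learned weights)
reg_path = temp_path + '/reg'
reg.save(reg_path)
model2 = LogisticRegression.load(reg_path)
model2.getMaxIter()
# Save and reload the fitted model (the learned weights)
model_path = temp_path + '/reg_model'
regModel.save(model_path)
model2 = LogisticRegressionModel.load(model_path)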
When I test the model where I trained it, everything works fine and the accuracy is 92%, but when I save the model and load it inside my Django app, the accuracy drops to approximately 22%.
How do I save and load my model while maintaining the same accuracy levels and parameters?
Fitting a logistic regression produces a set of weights, and saving the model does not change them. As for accuracy, keep in mind that it depends on the input data. You probably used different data as input in your Django app, in which case a lower accuracy can be expected. The only way to be sure of getting the same accuracy is to evaluate on the same dataset the model was trained on, but a score measured that way mostly reflects overfitting.
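One way to check the claim that saving does not change the weights is to compare the coefficients of the original and the reloaded model. The following is only a minimal sketch: `same_parameters` is a hypothetical helper (not part of PySpark) that compares two coefficient matrices given as nested lists, and the tolerance value is an assumption.

```python
import math


def same_parameters(coef_a, coef_b, tol=1e-12):
    """Return True if two coefficient matrices (nested lists of floats)
    have the same shape and match element-wise within an absolute tolerance.

    Hypothetical helper for illustration; not a PySpark function.
    """
    if len(coef_a) != len(coef_b):
        return False
    for row_a, row_b in zip(coef_a, coef_b):
        if len(row_a) != len(row_b):
            return False
        for a, b in zip(row_a, row_b):
            if not math.isclose(a, b, rel_tol=0.0, abs_tol=tol):
                return False
    return True


# Possible usage with the models from the question (requires a Spark session,
# so it is left commented out here):
# same_parameters(regModel.coefficientMatrix.toArray().tolist(),
#                 model2.coefficientMatrix.toArray().tolist())
```

If this returns True for the model loaded in the Django app, the saved weights are intact and the difference in accuracy has to come from the data being fed to the model rather than from the save/load step.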