Search code examples
machine-learningapache-sparksvmmahoutlogistic-regression

Can I use logistic regression algorithm to predict an ETA for a given task based on historical data?


Can I use logistic regression algorithm to predict an ETA for a given task based on historical data? I have some tasks which takes variable amount of time based on few factors like task type, weather, season, time of request etc.

Today we capture the time taken for all the tasks based on task types in a mysql store. Now we want to add a feature where based on factors and task type, we want to predict an ETA for the task and show it to customer.

We are planning to use Spark and use Logistic Regression and SVM algorithm. We are too new to this domain and need your guidance in terms of validating the approach and additional pointers.


Solution

  • You can achieve this with just a linear regression model because you're trying to predict a continuous outcome (ETA).

    You would just train a regression model where you're predicting ETA from your input features (task type, weather, season etc). So what this model learns is how long would the task takes to complete given a certain set of inputs, the predicted outcome is what you would then show to customers

    Take a look at this: http://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-least-squares-lasso-and-ridge-regression

    Logistic regression/SVM is used for classifying discrete outcomes (i.e. categories/groups).

    So another approach might be to stratify the ETA scores in your mysql database into something like short/medium/long time to complete, and then use those 3 categories as your labels instead of the actual numerical value. Then you can use logistic regression to train a model that classifies into those 3 categories, based on your listed input features. This would work, but you lose some resolution due to condensing your ETA data into only 3 groups but that's a design decision you'd have to make.