I have a dataset of IT operations tickets with fields like Ticket No, Description, Category,SubCategory,Priority etc.
What I need to do is to use available data(except ticket no) to predict the ticket priority. Sample data shown below.
Number Priority Created_on Description Category Sub Category
719515 MEDIUM 05-01-2016 MedWay 3rd Lucene.... Server Change
720317 MEDIUM 07-01-2016 DI - Medway 13146409 Application Incident
720447 MEDIUM 08-01-2016 DI QLD Chermside.... Application Medway
Please guide me on this.
Answering without more is a bit tough, and this is more of a context questions than a code question. But here is the logic I would use to start to evaluate this problem Keep in mind it might involve writing a few separate scripts each performing part of the task.
Try breaking the problem up into smaller pieces.You cannot do an analysis without all the data so start by creating the data.
You have the category and sub category already make a list of all the unique factors in each list and create a set of weights for each based on your system and business needs. As you make subcategory weights, keep in mind how they will interact with categories (+/- as well as magnitude).
Write a script to read the description, count all the non-trivial words. Create some kind of classifications for words to help you build lists that will inform the model with categories and sub categories. Is the value an error message, or machine name, or some other code or type of problem you can extract using key words?
How are all the word groupings meaningful? How would the contribute to making a decision?
Think about the categories when you decide these things.
Then with all of the parts, decide on a model, build, test and refine. I know there is no code in this but the problem solving part of Data Science happens outside of code most of the time.
You need to come up with the code yourself. If you get stuck post an edit and we can help.