python r text-classification

Text analysis with a mix of text & categorical columns in R


I have a dataset of IT operations tickets with fields like Ticket No, Description, Category, Sub Category, Priority, etc.

I need to use the available data (everything except the ticket number) to predict the ticket priority. Sample data is shown below.

Number  Priority  Created_on  Description            Category     Sub Category
719515  MEDIUM    05-01-2016  MedWay 3rd Lucene....  Server       Change
720317  MEDIUM    07-01-2016  DI - Medway 13146409   Application  Incident
720447  MEDIUM    08-01-2016  DI QLD Chermside....   Application  Medway

Please guide me on this.


Solution

  • Answering without more detail is a bit tough, and this is more of a context question than a code question. But here is the logic I would use to start evaluating the problem. Keep in mind it might involve writing a few separate scripts, each performing part of the task.

    Try breaking the problem up into smaller pieces. You cannot do an analysis without all the data, so start by creating the data.

    You have the category and sub category already. Make a list of all the unique values (factor levels) in each column and create a set of weights for each, based on your system and business needs. As you make the subcategory weights, keep in mind how they will interact with the category weights (sign as well as magnitude).

    Write a script to read the description and count all the non-trivial words. Create some kind of classification for the words to help you build lists that will inform the model alongside the categories and sub categories. Is the value an error message, a machine name, or some other code or type of problem you can extract using keywords?

    How are the word groupings meaningful? How would they contribute to making a decision?

    Think about the categories when you decide these things.

    Then, with all of the parts in place, decide on a model, build it, test it and refine it. I know there is no code in this, but most of the problem solving in data science happens outside of code.

    You need to come up with the code yourself. If you get stuck, post an edit and we can help.
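
As a rough starting point, the feature-building steps described above (category weights, word counts, keyword groupings) might be sketched like this in Python, one of the question's tags. Everything in it is an assumption made for illustration: the weight values, the stop-word list, the keyword groups, the "long number" rule and the function names are hypothetical and not derived from real ticket data. The descriptions are only the visible parts of the truncated sample rows.

```python
import re
from collections import Counter

# Hypothetical weights: replace with values that reflect your own business rules.
CATEGORY_WEIGHTS = {"Server": 2.0, "Application": 1.0}
SUBCATEGORY_WEIGHTS = {"Incident": 1.5, "Change": -0.5, "Medway": 0.0}

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "di", "in", "of", "on", "the", "to"}

# Hypothetical keyword groups used to classify description words.
KEYWORD_GROUPS = {
    "system_name": {"medway", "lucene"},
    "location": {"qld", "chermside"},
}


def tokenize(description):
    """Lower-case the text and keep purely alphabetic words that are not stop words."""
    words = re.findall(r"\b[a-z]+\b", description.lower())
    return [w for w in words if w not in STOP_WORDS]


def word_counts(descriptions):
    """Count non-trivial words across all ticket descriptions."""
    counts = Counter()
    for text in descriptions:
        counts.update(tokenize(text))
    return counts


def ticket_features(ticket):
    """Turn one ticket (a dict) into numeric features for a later model."""
    tokens = set(tokenize(ticket["Description"]))
    features = {
        "category_weight": CATEGORY_WEIGHTS.get(ticket["Category"], 0.0),
        "subcategory_weight": SUBCATEGORY_WEIGHTS.get(ticket["Sub Category"], 0.0),
        # Assumed rule: a run of 6+ digits looks like a reference code.
        "has_long_number": int(bool(re.search(r"\d{6,}", ticket["Description"]))),
    }
    for group, keywords in KEYWORD_GROUPS.items():
        features["has_" + group] = int(bool(tokens & keywords))
    return features


# Visible parts of the sample descriptions from the question (originals are truncated).
tickets = [
    {"Description": "MedWay 3rd Lucene", "Category": "Server", "Sub Category": "Change"},
    {"Description": "DI - Medway 13146409", "Category": "Application", "Sub Category": "Incident"},
    {"Description": "DI QLD Chermside", "Category": "Application", "Sub Category": "Medway"},
]

counts = word_counts(t["Description"] for t in tickets)
rows = [ticket_features(t) for t in tickets]

print(counts.most_common(3))
for row in rows:
    print(row)
```

Each row is a plain dictionary of numbers, so it can be handed to whatever model you settle on. The word counts are mainly there to help you decide which keyword groups are worth creating in the first place.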