excel, text-mining, data-analysis, vba

Text Mining - What is the best way to mine descriptive Excel sheet data?


I have university placement data pulled from databases into an Excel sheet. I need to text-mine the job descriptions offered by companies, which are a free-text field present in every row, and then analyse which profiles are in demand. Here is a snapshot of the data (image not shown).

Could anyone help me kick-start this activity?

Thanks, Saurabh


Solution

  • I am not a data expert, but I have some data mining experience. I would start with the following steps:

    1. Excel is not a good tool for this kind of analysis. Find a tool dedicated to data mining, e.g. R with RStudio. R has many useful out-of-the-box algorithms for data mining.

    2. Cleanse the data, e.g. convert all text to lower case, remove stop words, remove punctuation, and collapse extra whitespace.

    3. Tokenize the data, e.g. into single-word tokens such as "finance" and "bachelor".

    4. Decide how you will determine whether a certain profile is in demand. If by "profile" you mean that certain tokens such as "finance" or "bachelor" appear in the data more often than others, then simply create a frequency matrix. R also lets you visualise these frequencies as word clouds.

    This should get you started :). I am sure there is much more that could be suggested on this topic.
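
Steps 2 to 4 above (cleansing, tokenizing, counting token frequencies) can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not the R workflow suggested above. The sample rows, the tiny stop-word list, and the function names `clean`, `tokenize`, and `token_frequencies` are all hypothetical stand-ins; in practice the rows would come from the job-description column of the sheet, and a fuller stop-word list would be needed.

```python
import re
import string
from collections import Counter

# Hypothetical sample rows standing in for the job-description column.
descriptions = [
    "Bachelor in Finance required; finance analyst role.",
    "Looking for a Bachelor of Engineering, software developer.",
    "Finance and accounting graduate for the analyst team.",
]

# A tiny illustrative stop-word list (assumption); real analyses use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}

# Translation table that maps every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean(text):
    """Lower-case the text, replace punctuation with spaces, collapse whitespace."""
    text = text.lower().translate(PUNCT_TO_SPACE)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split cleaned text into single-word tokens, dropping stop words."""
    return [word for word in clean(text).split() if word not in STOP_WORDS]


def token_frequencies(rows):
    """Count how often each token appears across all rows."""
    counts = Counter()
    for row in rows:
        counts.update(tokenize(row))
    return counts


frequencies = token_frequencies(descriptions)

# Show the five most frequent tokens with their counts.
for token, count in frequencies.most_common(5):
    print(token, count)
```

With the sample rows above, "finance" is the most frequent token. The resulting counts are the kind of input a word-cloud tool would take.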