Tags: java, xml, xml-parsing, jdom, .doc

Resume parser in Java


I want to parse a resume to extract its section titles and their content, which includes bullets, paragraphs, and URLs. The resume is in .doc/.docx format. My research so far has led me to:

1. Building an XML file from the .doc file, and then
2. Building an XML parser using JDOM.
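
As a rough sketch of these two steps for the .docx case only: a .docx file is a ZIP archive whose body is already XML (the entry word/document.xml), so no separate conversion is needed there. The example assumes that entry has already been read into a string (for example with java.util.zip.ZipFile), and it uses the JDK's built-in DOM parser rather than JDOM so that it runs without extra dependencies. The class name DocxParagraphs and the method paragraphs are made up for illustration. The legacy binary .doc format is not XML and would still need a conversion step first.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: pulls paragraph text out of WordprocessingML.
class DocxParagraphs {
    // Namespace of the w: prefix used in word/document.xml.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of each non-empty paragraph (w:p), in document order.
    // Tabs and line breaks inside a paragraph (w:tab, w:br) are ignored here.
    static List<String> paragraphs(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Resumes are untrusted input, so refuse DOCTYPE declarations.
            factory.setFeature(
                "http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(
                documentXml.getBytes(StandardCharsets.UTF_8)));

            List<String> result = new ArrayList<>();
            NodeList paras = doc.getElementsByTagNameNS(W_NS, "p");
            for (int i = 0; i < paras.getLength(); i++) {
                Element para = (Element) paras.item(i);
                // A paragraph's text is split across runs; each run keeps
                // its text in w:t elements, so concatenate them.
                NodeList texts = para.getElementsByTagNameNS(W_NS, "t");
                StringBuilder sb = new StringBuilder();
                for (int j = 0; j < texts.getLength(); j++) {
                    sb.append(texts.item(j).getTextContent());
                }
                if (sb.length() > 0) {
                    result.add(sb.toString());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }
    }
}
```

Calling DocxParagraphs.paragraphs(xml) gives one string per paragraph, which is a convenient input for whatever structure detection comes next. The same traversal could be written with JDOM instead if that is the parser already in use.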

Is there another or better way to do this? Is there an algorithm that would help identify the structure of a resume?
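
To make the kind of structure detection being asked about concrete, here is a minimal rule-based sketch. It assumes the resume has already been reduced to a list of text lines, classifies each line as a heading, bullet, URL, or paragraph, and groups lines under the most recent heading. The heading vocabulary, the regular expressions, and the names ResumeSections, kind, and split are all assumptions made for this example, not an established algorithm; real resumes vary widely and would need a larger vocabulary or a learned model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical rule-based splitter for resume text lines.
class ResumeSections {
    // Assumed heading vocabulary; extend it for the resumes you actually see.
    static final Set<String> KNOWN_HEADINGS = Set.of(
        "summary", "objective", "education", "experience", "work experience",
        "skills", "projects", "certifications", "references");

    // A bullet character, or "1." / "1)", at the start, followed by whitespace.
    static final Pattern BULLET =
        Pattern.compile("^\\s*(?:[\u2022*-]|\\d+[.)])\\s+");
    // Anything that starts like a web address.
    static final Pattern URL = Pattern.compile("(?:https?://|www\\.)\\S+");

    // Normalizes a candidate heading: trims, drops a trailing colon, lowercases.
    static String headingKey(String line) {
        return line.trim().replaceAll(":\\s*$", "").toLowerCase(Locale.ROOT);
    }

    // Returns "heading", "bullet", "url", or "paragraph" for a single line.
    static String kind(String line) {
        if (KNOWN_HEADINGS.contains(headingKey(line))) return "heading";
        if (BULLET.matcher(line).find()) return "bullet";
        if (URL.matcher(line).find()) return "url";
        return "paragraph";
    }

    // Groups non-blank lines under the most recent heading. Lines before the
    // first heading (name, contact details) go into a "header" section.
    static Map<String, List<String>> split(List<String> lines) {
        Map<String, List<String>> sections = new LinkedHashMap<>();
        String current = "header";
        sections.put(current, new ArrayList<>());
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;
            if (kind(line).equals("heading")) {
                current = headingKey(line);
                sections.putIfAbsent(current, new ArrayList<>());
            } else {
                sections.get(current).add(line.trim());
            }
        }
        return sections;
    }
}
```

The result of split keeps sections in the order they appear, and kind can be applied to each stored line afterwards to separate bullets and URLs from ordinary paragraphs. Matching against a fixed vocabulary keeps the sketch predictable; formatting cues such as bold or font size, where available, would be a natural additional signal.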


Solution

  • It looks like you are heading in the right direction. A simple approach is: once you have identified one piece of information, move on from it by stepping a few lines forward or backward (+/- steps), using the amount of whitespace to judge where related content begins and ends, and collect the results.

    I assume you are using an NLP approach, which can help you pick out data by proximity; you can then filter out noise based on your experience.

    Or simply use a parser that has already been built. I recommend RChilli CV Parsing, or others such as HireAbility or Sovren; discuss your needs with them. I am sure you will get some useful information.

    Thanks, -K