I am developing a application in python which gives job recommendation based on the resume uploaded. I am trying to tokenize resume before processing further. I want to tokenize group of words. For example Data Science is a keyword when i tokenize i will get data and science separately. How to overcome this situation. Is there any library which does these extraction in python?
Looks like you are looking to generate n-grams (bi-grams in particular). If that's the case, the following is one way to achieve this:
from nltk import ngrams
resume = '... working in the data science field for years ...'
n = 2
bigrams = ngrams(resume.split(), n)
for grams in bigrams:
print grams