I am trying to extract keywords from a text file, stemming the text first. The code below works, but for some reason each keyword in the resulting list is prefixed with the letter 'u'. For example, this is what I get:
[(u'keyword1', 5), (u'keyword2', 4)]
I'm not sure where the 'u' comes from. Here is the code (after importing the packages):
stemmer = SnowballStemmer("english")
rake_object = rake.Rake("SmartStoplist.txt", 5, 3, 4)
with open("test.txt", "r") as f:  # Close the file as soon as it has been read
    s = f.read()
s = re.sub('[^a-zA-Z0-9-_*.]', ' ', s) # Remove special characters that might cause problems with stemming
words = s.split()
stemmed = [stemmer.stem(word) for word in words]
stemmed = ' '.join(stemmed)
keywords = rake_object.run(stemmed) # Perform RAKE on stemmed text
print(keywords)
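The 'u' means that the string is a Unicode string; the stemmer returns strings of this type. The u'...' prefix has been part of the syntax in Python 2.x since version 2.0. It is not part of the keyword itself: it only shows up because printing a list displays the repr of each element. You don't need to worry about it. For more information, read the Python documentation on Unicode strings.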
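To see that the prefix is only notation, here is a minimal sketch. The `keywords` list is hand-written sample data shaped like the output in the question, not real RAKE output, and the "word: score" format is just one possible choice.

```python
# Sample data shaped like the output in the question (values are made up).
keywords = [(u'keyword1', 5), (u'keyword2', 4)]

# The u prefix belongs to the literal's notation, not to the string's contents.
first_word = keywords[0][0]
assert first_word == 'keyword1'
assert len(first_word) == 8

# Formatting each string directly, instead of printing the whole list,
# avoids repr() and therefore never shows the prefix.
lines = ['%s: %d' % (word, score) for word, score in keywords]
for line in lines:
    print(line)
```

This runs unchanged on Python 2 and Python 3; on Python 3 all strings are Unicode, so the prefix no longer appears even when printing the list.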