I'm following this example: https://github.com/OpenRefine/OpenRefine/wiki/Recipes#removeextract-words-contained-in-a-file
I'm trying to remove the stopwords listed in a file using OpenRefine.
Example: you want to remove from a text all stopwords contained in a file on your desktop. In this case, use Jython.
    with open(r"C:\Users\ettor\Desktop\stopwords.txt", 'r') as f:
        stopwords = [name.rstrip() for name in f]
    return " ".join([x for x in value.split(' ') if x not in stopwords])
Unfortunately, I got an "Internal error".
Yes, this script works as you can see in this screencast.
I changed it a bit to ignore the letter case.
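    import os
    # open() does not expand "~", so expand it to the home directory first
    with open(os.path.expanduser(r"~\Desktop\stopwords.txt"), 'r') as f:
        stopwords = [name.rstrip().lower() for name in f]
    return " ".join([x for x in value.split(' ') if x.lower() not in stopwords])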
In an OpenRefine Python (Jython) script, "Internal error" often means a syntax error, such as a forgotten parenthesis or bad indentation.
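Since "Internal error" gives no detail, one way to narrow things down is to run the same logic in a regular Python interpreter first, where syntax and indentation errors are reported with a line number. The sketch below is only an illustration: the function names `load_stopwords` and `remove_stopwords` are made up for this example and are not part of OpenRefine, and the stopwords are passed in as a list of lines instead of being read from your file.

```python
def load_stopwords(lines):
    """Build a lowercase set of stopwords from an iterable of lines
    (for example an open file). Blank lines are skipped."""
    return set(line.strip().lower() for line in lines if line.strip())


def remove_stopwords(value, stopwords):
    """Return `value` without the words whose lowercase form is a stopword."""
    return " ".join([x for x in value.split(' ') if x.lower() not in stopwords])


# Hypothetical sample data standing in for the contents of stopwords.txt
sample_lines = ["the\n", "And\n", "\n"]
cleaned = remove_stopwords("The cat and the hat", load_stopwords(sample_lines))
print(cleaned)
```

If this runs cleanly on its own, the remaining suspects inside OpenRefine are the indentation of the pasted script and the file path.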