Remove stopwords using OpenRefine


Following this example: https://github.com/OpenRefine/OpenRefine/wiki/Recipes#removeextract-words-contained-in-a-file

I'm trying to remove stopwords listed in a file using OpenRefine.

Example: you want to remove from a text all stopwords contained in a file on your desktop. In this case, use Jython.

with open(r"C:\Users\ettor\Desktop\stopwords.txt",'r') as f :
    stopwords = [name.rstrip() for name in f]

return " ".join([x for x in value.split(' ') if x not in stopwords])

Unfortunately, I got "Internal error".


Solution

  • Yes, this script works; the screencast in the original answer shows it running.

    I changed it slightly so that the comparison ignores letter case.

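    import os

    # open() does not expand "~", so resolve the home directory explicitly.
    path = os.path.expanduser(r"~\Desktop\stopwords.txt")
    with open(path, 'r') as f:
        stopwords = [name.rstrip().lower() for name in f]

    return " ".join([x for x in value.split(' ') if x.lower() not in stopwords])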
    

    In an OpenRefine Python (Jython) script, "Internal error" often means a syntax error, such as a missing parenthesis or bad indentation.
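
Since OpenRefine only reports "Internal error", one way to narrow down a suspected syntax error is to check the expression in an ordinary Python interpreter first. The sketch below is only an illustration: `check_expression_syntax` and the `broken` sample are hypothetical names, not part of OpenRefine. It also assumes a standard Python interpreter, whose parser is not identical to the Jython 2.7 one OpenRefine uses, so a clean result here is a hint and not a guarantee.

```python
def check_expression_syntax(body):
    """Return None if `body` parses as a function body, else a short message."""
    # An OpenRefine Jython expression ends in a bare `return`, which is only
    # valid inside a function, so indent the body under a throwaway `def`.
    indented = "\n".join("    " + line for line in body.splitlines())
    wrapped = "def _expression(value):\n" + indented + "\n"
    try:
        compile(wrapped, "<expression>", "exec")
    except SyntaxError as err:  # also covers IndentationError
        # Subtract 1 so the line number refers to the original body.
        return "line %d: %s" % ((err.lineno or 1) - 1, err.msg)
    return None


# Hypothetical expression with a mismatched closing bracket on the last line.
broken = '''with open("stopwords.txt", "r") as f:
    stopwords = [name.rstrip() for name in f]

return " ".join([x for x in value.split(" ") if x not in stopwords)
'''
print(check_expression_syntax(broken))
```

The function only compiles the code and never runs it, so the stopwords file does not need to exist for this check. If it returns None, the problem is more likely in the file path or the data than in the syntax.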