
How to create a UUID in OpenRefine based on the MD5 hash of cell values


I am trying to create a UUID based on the MD5 hash of a cell value in OpenRefine (using Jython), but I am having trouble passing the value to the function.
I am able to create a UUID using the expression:

import uuid;
return str(uuid.uuid4());

but I want to use the md5 hash of the cell's value, so I tried to follow the formula

uuid.uuid3(namespace, name)

However, I am unable to pass the value to the function. The attempt:

import uuid;
return str(uuid.uuid3(uuid.NAMESPACE_DNS, value));

produces the following error:

Error: Traceback (most recent call last):
  File "", line 3, in temp_448166737
  File "/Applications/OpenRefine 3.2b.app/Contents/Resources/webapp/extensions/jython/module/MOD-INF/lib/jython-standalone-2.7.1.jar/Lib/uuid.py", line 528, in uuid3
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa7 in position 1: ordinal not in range(128)

Without using the cell's value, the expression works quite well. The example

    import uuid;
    return str(uuid.uuid3(uuid.NAMESPACE_DNS, 'example'));

uses the string "example" and computes the same UUID, c5e5f349-28ef-3f5a-98d6-0b32ee4d1743, for every cell, which is not the desired result.

Any ideas on how to pass the cell's value to Jython within an OpenRefine expression?


Solution

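  • You just have to encode the unicode string in value with .encode('utf-8'). The cell value reaches Jython as a unicode string, and in Jython 2.7 uuid3 concatenates the namespace's raw bytes with the name. Mixing a byte string with a unicode string forces an implicit ASCII decode of the namespace bytes, which fails on 0xa7, the second byte of NAMESPACE_DNS. Encoding the value first keeps everything as bytes: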

    import uuid
    return str(uuid.uuid3(uuid.NAMESPACE_DNS, value.encode('utf-8')))
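To see why the encoding step matters, here is a minimal sketch of what a version 3 UUID computation looks like when written out by hand. The helper name md5_uuid is hypothetical (it is not part of the uuid module or OpenRefine); it assumes only hashlib and uuid from the standard library, and it accepts either unicode or byte strings, so it should behave the same under Jython 2.7 and CPython 3.

```python
import hashlib
import uuid


def md5_uuid(name, namespace=uuid.NAMESPACE_DNS):
    """Return a version 3 (MD5-based) UUID string for `name`.

    md5_uuid is a hypothetical helper written for illustration.
    """
    # Normalize to bytes first. On Python 2 / Jython, `bytes` is an alias
    # for `str`, so unicode values are encoded and byte strings pass through.
    if not isinstance(name, bytes):
        name = name.encode('utf-8')
    # Both operands are bytes now, so no implicit ASCII decoding happens.
    digest = hashlib.md5(namespace.bytes + name).digest()
    # version=3 sets the version and variant bits on the 16-byte digest.
    return str(uuid.UUID(bytes=digest[:16], version=3))
```

In an OpenRefine Jython expression, a function like this could be defined in the expression box and followed by `return md5_uuid(value)`. For plain use, the one-line `value.encode('utf-8')` fix above is enough; the helper only makes the byte handling explicit.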