I am trying to use pandas-dedupe, but after labelling data I run into permission errors that I cannot solve. Minimal working example:
import pandas_dedupe
import seaborn as sns
if __name__ == "__main__":
    iris = sns.load_dataset('iris')
    result = pandas_dedupe.dedupe_dataframe(iris, ["sepal_width", "sepal_length", "species"])
After labelling some data, the files dedupe_dataframe_learned_settings and dedupe_dataframe_training.json get created.
But during the deduplication process I run into errors like
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\THOMAS~1\\AppData\\Local\\Temp\\tmp_vrp9vbr'
I tried setting n_cores=1 in the dedupe_dataframe call, but it didn't help. What can I do?
I had the same problem and solved it by disabling multiprocessing. You can do this by setting n_cores=0, as shown below:
pandas_dedupe.dedupe_dataframe(df, ['first_name', 'last_name'], n_cores=0)
This should resolve the error.
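If you call this in several places, a small wrapper keeps the setting in one spot. This is only a minimal sketch: the helper name dedupe_single_process is hypothetical and not part of pandas-dedupe; it just forwards to dedupe_dataframe with n_cores=0.

```python
def dedupe_single_process(df, fields):
    """Run pandas-dedupe on df with multiprocessing disabled."""
    # Imported inside the function so the helper can be defined
    # even before pandas-dedupe is installed.
    import pandas_dedupe

    # n_cores=0 disables multiprocessing, which is the setting
    # that made the PermissionError go away for me.
    return pandas_dedupe.dedupe_dataframe(df, fields, n_cores=0)
```

With the iris example from the question, the call would be result = dedupe_single_process(iris, ["sepal_width", "sepal_length", "species"]), placed under the same if __name__ == "__main__": guard.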