
PermissionError: [WinError 32] using pandas-dedupe


I am trying to use pandas-dedupe, but after labelling data I run into a permission error that I cannot solve. Minimal working example:

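import pandas_dedupe
import seaborn as sns

# The __main__ guard is needed on Windows, where multiprocessing
# re-imports this module in every child process.
if __name__ == "__main__":
    # Load the iris sample dataset (seaborn downloads it on first use)
    iris = sns.load_dataset('iris')
    result = pandas_dedupe.dedupe_dataframe(iris, ["sepal_width", "sepal_length", "species"])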

After I label some data, the files dedupe_dataframe_learned_settings and dedupe_dataframe_training.json are created. During the deduplication step, however, I run into errors like:

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\THOMAS~1\\AppData\\Local\\Temp\\tmp_vrp9vbr'

I tried setting n_cores=1 in the dedupe_dataframe call, but it did not help. What can I do?


Solution

  • I had the same problem and solved it by disabling multiprocessing. You can disable multiprocessing by setting n_cores=0, as shown below:

    # df is your DataFrame; n_cores=0 disables multiprocessing
    pandas_dedupe.dedupe_dataframe(df, ['first_name', 'last_name'], n_cores=0)
    

    This should resolve the error.
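If the same script also runs on Linux or macOS, one option is to apply the n_cores=0 workaround only on Windows. The sketch below is an illustration, not part of pandas-dedupe: `dedupe_options` and `run_dedupe` are hypothetical helper names, and keeping the library's default n_cores on other platforms is an assumption about what you want there.

```python
import sys
from typing import Any, Dict, List


def dedupe_options(platform: str = sys.platform) -> Dict[str, Any]:
    """Hypothetical helper: extra keyword arguments for dedupe_dataframe.

    On Windows (sys.platform == "win32") it returns n_cores=0, which
    disables multiprocessing as in the answer above. On other platforms
    it returns no overrides, so the library default is used (assumption).
    """
    if platform.startswith("win"):
        return {"n_cores": 0}
    return {}


def run_dedupe(df, fields: List[str]):
    """Hypothetical wrapper around pandas_dedupe.dedupe_dataframe."""
    # Imported lazily so this module can be loaded without pandas_dedupe
    import pandas_dedupe

    return pandas_dedupe.dedupe_dataframe(df, fields, **dedupe_options())
```

Call `run_dedupe(df, ['first_name', 'last_name'])` from inside an `if __name__ == "__main__":` block, as in the question; that guard still matters on Windows because multiprocessing there starts child processes by re-importing the main module.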