While debugging, I've found it hard to tell exactly which tasks are causing problems. I've successfully used the 'dask_key_name' kwarg with delayed tasks to give their keys human-readable names (based on the documentation here: https://docs.dask.org/en/latest/delayed-api.html). I tried the following in the hope that it would do the same for the read_parquet tasks, but the keys are still built from a hash (e.g., ('read-parquet-ed9e6c4c474e851e176e7eafb8753490', 5)).
item = 'custom_string'
self.all_pfs_dict['read'][item] = dd.read_parquet(item_to_read, index=False, gather_statistics=False, dask_key_name=item + '-read')
Am I doing something wrong or is there an alternative way to name dask dataframe tasks?
There is no way to rename dataframe tasks like this today.
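One possible workaround, offered as a sketch rather than a supported feature, is to build the read step out of delayed tasks, since 'dask_key_name' does work there as the question notes. The helper name named_read_tasks, its loader parameter, and the file names below are hypothetical and only for illustration; nothing is read from disk until the tasks are computed.

```python
import dask
import pandas as pd


def named_read_tasks(paths, item, loader=pd.read_parquet):
    """Return one delayed read task per path, each with a readable key.

    `named_read_tasks` and `loader` are illustrative names, not part of dask.
    """
    return [
        # dask_key_name is consumed by dask.delayed and sets the task key;
        # it is not passed on to the loader itself.
        dask.delayed(loader)(path, dask_key_name=f"{item}-read-{i}")
        for i, path in enumerate(paths)
    ]


# Hypothetical file names; creating the tasks does not touch the files.
tasks = named_read_tasks(["part0.parquet", "part1.parquet"], "custom_string")
print([t.key for t in tasks])
```

The resulting list can be passed to dd.from_delayed to get a dask dataframe. I would expect the named read tasks to keep their keys in the graph, while the dataframe operations layered on top still get hashed names, so this only helps with identifying the read step. It also gives up optimizations that read_parquet performs itself.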