I am working on converting a dataset into Activeloop Hub format. The dataset I am working with has NaN values; however, I am not sure how to handle these values with the Hub dataset format.
The NaN values appear in the labels of the dataset.
I know that a NaN value represents the absence of that value in the database. Also, from some reading, I know that algorithms implemented in sklearn can't run on datasets that have such values. I was thinking of erasing the rows that have NaN values; however, I don't want to lose any information in the dataset.
Is there a best-practice way to input NaN values in Activeloop Hub format?
I am using Hub version 2.3.1.
It sounds like there are no labels for the samples. If so, then upload an empty sample for those labels. Please note that appending an empty sample is not the same as skipping a sample.
If the NaN values represent images, videos, etc. that do not have labels, they should be uploaded as empty samples like this: ds.labels.append(np.zeros((0,))).
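As a rough sketch of how this could look for a whole column of labels, the snippet below maps each raw label to an array and turns every NaN into an empty sample, so the number of label samples stays equal to the number of rows. The helper name `label_to_sample`, the example values in `raw_labels`, and the `uint32` dtype are assumptions made for illustration; only `ds.labels.append(np.zeros((0,)))` comes from the answer above, and the Hub calls are left as comments because they assume an already created dataset `ds` with a `labels` tensor.

```python
import numpy as np

def label_to_sample(value):
    """Hypothetical helper: convert one raw label into an array.

    A missing label (None or NaN) becomes an empty array of shape (0,),
    so the sample is still appended rather than skipped.
    """
    if value is None:
        return np.zeros((0,), dtype=np.uint32)
    if isinstance(value, (float, np.floating)) and np.isnan(value):
        return np.zeros((0,), dtype=np.uint32)
    return np.array([int(value)], dtype=np.uint32)

# Made-up labels; the second row has no label.
raw_labels = [2.0, float("nan"), 0.0]
samples = [label_to_sample(v) for v in raw_labels]

# Every row still yields exactly one sample.
print(len(samples) == len(raw_labels))

# Assuming `ds` is an existing Hub dataset with a `labels` tensor:
# for sample in samples:
#     ds.labels.append(sample)
```

Keeping one sample per row means that index i in the labels tensor still refers to the same row as index i in the other tensors, which would not hold if the unlabeled rows were skipped.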