python tensorflow tf.data.dataset

Update: How to remove samples from a tf.data.Dataset with missing or NaN values?


Update: I added this command to drop samples with missing values, which cause my neural network to fail:

ds = ds.ignore_errors()

I use this function to remove all samples with NaN or missing values, but it doesn't work well:

def filter_nan_sample(ds):
    # find NaN
    ynan = tf.math.is_nan(ds)
    y = tf.reduce_sum(tf.cast(ynan, tf.float32))
    if y >0:
        return False
    return True

ds = ds.filter(filter_nan_sample)
# skip all samples with "defects" such as missing values
ds = ds.ignore_errors()

I get this error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__IteratorGetNext_output_types_2_device_/job:localhost/replica:0/task:0/device:CPU:0}} Field 4 is required but missing in record! [Op:IteratorGetNext] name: 

Field 4 corresponds to a variable that is not always available in the record. In my case it is not possible to handle this problem before converting the data to a dataset.


Solution

  • It is not clear how the "missing field" error relates to the NaN filter, but the NaN filter itself could be improved: a plain Python if on a tensor can sometimes cause trouble, so it is safer to return a boolean tensor built from TensorFlow ops.

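    import tensorflow as tf

    def filter_nan_sample(ds):
        # ds is a single dataset element (a float tensor), not the whole dataset
        ynan = tf.math.is_nan(ds)  # element-wise NaN mask
        return tf.math.logical_not(tf.math.reduce_any(ynan))  # True if no NaNs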
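
To see the filter in action, here is a minimal sketch on a toy dataset. The array values and the names data, sample and kept are made up for illustration; the real dataset in the question is presumably read from records and may have a different element structure.

```python
import numpy as np
import tensorflow as tf

def filter_nan_sample(sample):
    # Keep the sample only if none of its values is NaN.
    return tf.math.logical_not(tf.math.reduce_any(tf.math.is_nan(sample)))

# Toy data (assumption, for illustration only): the second row contains a NaN.
data = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0]], dtype=np.float32)

ds = tf.data.Dataset.from_tensor_slices(data)
ds = ds.filter(filter_nan_sample)

# Collect the surviving rows as plain Python lists.
kept = [row.tolist() for row in ds.as_numpy_iterator()]
print(kept)  # [[1.0, 2.0], [4.0, 5.0]]
```

This assumes each element is a single float tensor. If each element is a (features, label) tuple, the predicate needs one parameter per component and should check each float component separately.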