
Correct usage of TensorFlow Transform apply_buckets


This is on TensorFlow 1.11.0. The documentation of tft.apply_buckets is not very descriptive. Specifically, it says: "bucket_boundaries: The bucket boundaries represented as a rank 2 Tensor."

I assumed this means the tensor pairs each bucket index with its bucket boundary. Is that right?

When I try with the toy example below:

import tensorflow as tf
import tensorflow_transform as tft
import numpy as np

tf.enable_eager_execution()

x = np.array([-1, 9, 19, 29, 39])
xt = tf.cast(tf.convert_to_tensor(x), tf.float32)

boundaries = tf.cast(
    tf.transpose(tf.convert_to_tensor([[0, 1, 2, 3], [10, 20, 30, 40]])),
    tf.float32
)

buckets = tft.apply_buckets(xt, boundaries)

I get:

InvalidArgumentError: Expected sorted boundaries [Op:BucketizeWithInputBoundaries] name: assign_buckets

Note that in this case x and bucket_boundaries arguments are:

tf.Tensor([-1.  9. 19. 29. 39.], shape=(5,), dtype=float32)
tf.Tensor(
[[ 0. 10.]
 [ 1. 20.]
 [ 2. 30.]
 [ 3. 40.]], shape=(4, 2), dtype=float32)

So, it seems like bucket_boundaries is not supposed to be indices and boundaries. Does anyone know how to properly use this method?


Solution

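  • After some experimenting, I found that bucket_boundaries is supposed to be a 2-dimensional array in which every entry is a bucket boundary, with the array wrapped so that it has two columns. Judging from the two runs, the entries have to be in ascending order when read row by row: the layout in the question reads as 0, 10, 1, 20, ..., which is what triggers "Expected sorted boundaries". See the example below: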

    import tensorflow as tf
    import tensorflow_transform as tft
    import numpy as np
    
    tf.enable_eager_execution()
    
    x = np.array([-1, 9, 19, 29, 39])
    xt = tf.cast(tf.convert_to_tensor(x), tf.float32)
    
    boundaries = tf.cast(
        tf.transpose(tf.convert_to_tensor([[0, 20, 40, 60], [10, 30, 50, 70]])),
        tf.float32
    )
    
    buckets = tft.apply_buckets(xt, boundaries)
    

    So, the input, the resulting bucket assignments, and the boundaries are:

    print(xt)
    print(buckets)
    print(boundaries)
    
    tf.Tensor([-1.  9. 19. 29. 39.], shape=(5,), dtype=float32)
    tf.Tensor([0 1 2 3 4], shape=(5,), dtype=int64)
    tf.Tensor(
    [[ 0. 10.]
     [20. 30.]
     [40. 50.]
     [60. 70.]], shape=(4, 2), dtype=float32)
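
For intuition, the behaviour above can be mimicked in plain NumPy. This is only a sketch under an assumption: that the rank 2 boundaries are flattened row by row into a single list that must be sorted. That matches both runs shown here, but it is inferred from the outputs, not taken from the tensorflow_transform source. The helper name assign_buckets is hypothetical and is not part of the library.

```python
import numpy as np


def assign_buckets(values, boundaries):
    """NumPy stand-in for the observed behaviour (hypothetical helper)."""
    # Assumption: the 2-D boundaries are read row by row as one flat list.
    flat = np.asarray(boundaries, dtype=np.float32).reshape(-1)
    if np.any(np.diff(flat) < 0):
        raise ValueError("Expected sorted boundaries, got %s" % flat.tolist())
    # Bucket i holds the values that have exactly i boundaries at or below them.
    values = np.asarray(values, dtype=np.float32)
    return np.searchsorted(flat, values, side="right")


x = [-1, 9, 19, 29, 39]
working = np.transpose([[0, 20, 40, 60], [10, 30, 50, 70]])  # rows: [0, 10], [20, 30], ...
failing = np.transpose([[0, 1, 2, 3], [10, 20, 30, 40]])     # rows: [0, 10], [1, 20], ...

print(assign_buckets(x, working))  # [0 1 2 3 4]

try:
    assign_buckets(x, failing)
except ValueError as err:
    print(err)  # the flattened list 0, 10, 1, 20, ... is not sorted
```

The first call reproduces the bucket indices printed above, and the second fails for the same reason as the layout in the question. How tft.apply_buckets treats values that sit exactly on a boundary is not checked here.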