Search code examples
tensorflowtensorflow-estimator

Confusion about how bucketized feature columns work


I had some confusion about how bucketized feature columns represent input to the model. According to the blog post on feature columns, when we bucketize a feature like year this puts each value in buckets based on the defined boundaries, and creates a binary vector, turning on each bucket based on the input value, but the example in the documentation shows the output as a single integer. I'm confused as to how the input is to the model when using a bucketized column. Can anyone clarify this for me please?


Solution

  • From the dimensions of the first hidden layer of the estimator, it seems like for each feature column that is a tf.feature_column.bucketized_column, a one hot encoded vector is created based on the boundaries.