pytorch, neural-network, tensor

Which dtype should I use for one-hot encoded features when converting them into PyTorch tensors?


The title explains most of my problem. I have a dataset with both categorical and quantitative features. My question is whether it's best to assign the dtype torch.float32 to the one-hot encoded features, which would let me create a single tensor holding both the quantitative and the categorical (one-hot encoded) features, or whether I should use torch.bool for the one-hot features, since they are all either 1 or 0.

If I were to use torch.bool, it would complicate building the model, since I would need to create two "pathways". I'm new to this, so I don't know whether using torch.float32 would cause any issues.


Solution

  • The tensor dtype depends on what you intend to do with it.

    "number crunching" layers like nn.Linear, nn.Conv2d, etc. expect a floating-point input whose dtype matches the layer's weights: torch.float32 by default, or torch.float16 for half-precision training.

    "number lookup" layers like nn.Embedding expect the input to be integer indices of dtype torch.int or torch.long.

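    So long as the dtype is compatible with the layer that will be processing it, you're good. In your case, casting the one-hot features to torch.float32 and concatenating them with the quantitative features into a single tensor is fine; a torch.bool tensor would need to be cast to a float dtype anyway before a layer like nn.Linear could process it.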
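
As a rough illustration of the two cases above, here is a minimal sketch. The toy batch, the feature counts, and the layer sizes are all made up for the example and are not from the question; only the dtype handling is the point.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy batch: 4 rows, 2 quantitative features,
# and 1 categorical feature with 3 possible classes.
numeric = torch.randn(4, 2)              # float32 by default
category = torch.tensor([0, 2, 1, 2])    # int64 (torch.long) class indices

# Case 1: "number crunching" layer.
# F.one_hot returns an int64 tensor, so cast it to float32 before
# concatenating it with the quantitative features.
one_hot = F.one_hot(category, num_classes=3).to(torch.float32)
features = torch.cat([numeric, one_hot], dim=1)    # shape (4, 5), float32
linear = nn.Linear(in_features=5, out_features=8)  # sizes are arbitrary
out_linear = linear(features)                      # shape (4, 8)

# Case 2: "number lookup" layer.
# nn.Embedding takes the integer class indices directly, no one-hot needed.
embedding = nn.Embedding(num_embeddings=3, embedding_dim=4)
embedded = embedding(category)                     # shape (4, 4), float32
combined = torch.cat([numeric, embedded], dim=1)   # shape (4, 6)
```

Either way, the model only needs one "pathway": by the time the categorical information reaches a linear layer it is a float tensor that can sit next to the quantitative features.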