I’m looking to understand how Ludwig encodes images. Does it run the images through a pretrained model without computing a loss, or does it compute a loss? If so, what type of loss is run for a large feature set?
All the documentation regarding image pre-processing and encoding can be found here.
In summary:
Currently there are two encoders supported for images: a Convolutional Stack Encoder and a ResNet encoder. You can choose between them by setting the `encoder` parameter to `stacked_cnn` or `resnet` in the input feature dictionary of the model definition (`stacked_cnn` is the default).
Each of these encoders has configurable hyper-parameters. Ludwig does not run the images through a pre-trained model, as that would defeat the purpose: you are training your own model on your own data.
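For concreteness, here is a minimal sketch of such a model definition using Ludwig's Python API. The column names (`image_path`, `label`) are assumptions for illustration; adapt them to your own dataset.

```python
# Minimal sketch of a Ludwig model definition with an image input feature.
# The dataset column names ("image_path", "label") are illustrative assumptions.
from ludwig.api import LudwigModel

model_definition = {
    "input_features": [
        {
            "name": "image_path",      # column containing paths to image files
            "type": "image",
            "encoder": "stacked_cnn",  # or "resnet"; stacked_cnn is the default
        }
    ],
    "output_features": [
        {
            "name": "label",
            "type": "category",
        }
    ],
}

# Build the (untrained) model from the definition; training happens
# when you call model.train(...) on your own data.
model = LudwigModel(model_definition)
```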