I’m looking to understand how Ludwig encodes images. Does it run the images through a pretrained model without computing a loss, or does it compute a loss? If so, what type of loss is run for a large feature set?
All the documentation regarding image pre-processing and encoding can be found here.
In summary:
Currently there are two encoders supported for images: a Convolutional Stack Encoder and a ResNet encoder. You can choose between them by setting the `encoder` parameter to `stacked_cnn` or `resnet` in the input feature dictionary of the model definition (`stacked_cnn` is the default).
Each of these encoders has configurable hyper-parameters. Ludwig does not run the images through a pre-trained model, as that would defeat the purpose: you are training your own model on your own data.
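For concreteness, here is a minimal sketch of such a model definition using Ludwig's Python API. The column names (`image_path`, `label`) are assumptions for illustration; adapt them to your own dataset.

```python
# Minimal sketch of a Ludwig model definition with an image input feature.
# The dataset column names ("image_path", "label") are illustrative assumptions.
from ludwig.api import LudwigModel

model_definition = {
    "input_features": [
        {
            "name": "image_path",      # column containing paths to image files
            "type": "image",
            "encoder": "stacked_cnn",  # or "resnet"; stacked_cnn is the default
        }
    ],
    "output_features": [
        {
            "name": "label",
            "type": "category",
        }
    ],
}

# Build the (untrained) model from the definition; training happens
# when you call model.train(...) on your own data.
model = LudwigModel(model_definition)
```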