Let's say I have a pretrained model, model.pt.
How do I know what classes this model can predict?
I think this information is saved inside the model, but how do I extract it?
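For reference, I tried inspecting the checkpoint like this (a minimal sketch; model.pt stands for my file), but I only see parameter names and weight tensors, no class names:

```python
import torch

# A .pt file is usually a pickled dict holding a state_dict
# (and sometimes extra metadata such as epoch or optimizer state).
ckpt = torch.load("model.pt", map_location="cpu")

if isinstance(ckpt, dict):
    # Top-level keys reveal what was actually saved.
    print(list(ckpt.keys()))
    # A plain state_dict maps parameter names to weight tensors.
    state_dict = ckpt.get("state_dict", ckpt)
    for name, tensor in list(state_dict.items())[:10]:
        print(name, tuple(tensor.shape))
```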
I'm trying to understand what https://github.com/AndreyGuzhov/AudioCLIP does.
It ships a pretrained checkpoint, AudioCLIP-Full-Training.pt.
How do I find out which labels or classes are inside this AudioCLIP-Full-Training.pt?
As @lauthu already said, the first place to look would be the Notebook: https://github.com/AndreyGuzhov/AudioCLIP/blob/master/demo/AudioCLIP.ipynb.
The notebook mentions these labels:
LABELS = ['cat', 'thunderstorm', 'coughing', 'alarm clock', 'car horn']
The notebook shows examples with only these 5 classes; however, more are possible, see below.
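To see how those labels are actually used, here is a condensed sketch of the demo notebook's pipeline. The nested unpacking of the forward pass and the 44100 Hz sample rate follow my reading of the notebook (double-check there if the API differs), and some_sound.wav is a placeholder for your own clip:

```python
import torch
import librosa
from model import AudioCLIP  # model class from the AudioCLIP repo

LABELS = ['cat', 'thunderstorm', 'coughing', 'alarm clock', 'car horn']

aclp = AudioCLIP(pretrained='assets/AudioCLIP-Full-Training.pt')
aclp.eval()

# Load one audio clip; 44100 Hz matches the demo notebook's sample rate.
track, _ = librosa.load('some_sound.wav', sr=44100)
audio = torch.from_numpy(track).unsqueeze(0)  # batch of one clip

with torch.no_grad():
    # The nested unpacking mirrors the demo notebook's forward() outputs.
    ((audio_features, _, _), _), _ = aclp(audio=audio)
    ((_, _, text_features), _), _ = aclp(text=[[label] for label in LABELS])

# CLIP-style scoring: cosine similarity, then softmax over the labels.
audio_features = audio_features / audio_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
probs = (audio_features @ text_features.T).softmax(dim=-1)

for label, p in zip(LABELS, probs[0].tolist()):
    print(f'{label}: {p:.3f}')
```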
Another place to look for the classes is the AudioCLIP paper. The paper mentions that AudioCLIP is trained on the AudioSet dataset, which has 632 audio classes. See the entire ontology of labels here. So it can readily predict these 632 classes that it was trained on.
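If you want all 632 class names programmatically, the ontology is published as a JSON file in the AudioSet ontology repository (https://github.com/audioset/ontology). A minimal sketch, assuming you have downloaded ontology.json from there and that each entry carries a human-readable "name" field:

```python
import json

# ontology.json from https://github.com/audioset/ontology
with open("ontology.json") as f:
    ontology = json.load(f)

# Each node describes one class in the AudioSet hierarchy.
names = [node["name"] for node in ontology]
print(len(names))   # number of classes in the ontology
print(names[:10])   # the first few class names
```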
In addition to these 632 classes, since AudioCLIP is based on the CLIP architecture, it also has zero-shot inference capabilities, as noted in the AudioCLIP paper:
"keeping CLIP's ability to generalize to unseen datasets in a zero-shot fashion".
Essentially, this means you could use any common English concept/word, and AudioCLIP should be able to classify sounds even if it was not trained on them. This is possible because AudioCLIP is an extension of CLIP, and the CLIP model has "seen" a lot of natural English words in its training dataset of ~400M (image, caption) pairs.
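Concretely, you can swap the LABELS list in the sketch above for concepts that are not among the 632 training classes, and the rest of the pipeline stays the same (these example phrases are just illustrative):

```python
# Zero-shot inference: arbitrary English phrases serve as candidate classes;
# the text encoder embeds them exactly like the trained labels.
LABELS = ['dog barking', 'typing on a keyboard', 'ocean waves']

with torch.no_grad():
    ((_, _, text_features), _), _ = aclp(text=[[label] for label in LABELS])
```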