
How does huggingface/sentence-transformers figure out a model's input/output shapes?


Using the sentence-transformers Python package (https://huggingface.co/sentence-transformers), I'm able to just specify a repo/model and everything just works. However, when I try to consume a model with .NET/ONNX, I have to specify the input_ids max length, which for this model, https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2, the documentation says is 256, yet sentence-transformers seems to return up to 512 tokens. I also have to manually specify the output size, which the documentation says is 384. And of course I have to know which tokenizer to use as well.

I did try to look at the model with https://netron.app/, but it just reports -1, -1 (dynamic axes) for all the shapes.

Is the Python code making an API call to get all this info? I'm asking because I'd like to try to replicate what it does in C#. Thanks!


Solution

  • I finally spelunked through the sentence-transformers code and now have some idea of how things work.

    Starting in SentenceTransformer.py, the library calls the Hugging Face API for the model's metadata, e.g. https://huggingface.co/api/models/sentence-transformers/all-MiniLM-L6-v2. It then downloads the sibling files referenced there: in our case, sentence_bert_config.json, whose max_seq_length gives the input_ids max length, and config.json, whose hidden_size gives the length of the output embedding.
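    For illustration, here is a minimal Python sketch of those two lookups over plain HTTP (the requests calls mirror what you'd do with HttpClient in C#; /api/models/ is the metadata endpoint the library hits, and /resolve/main/ is Hugging Face's standard raw-file URL pattern):

    ```python
    import requests

    repo = "sentence-transformers/all-MiniLM-L6-v2"

    # Model metadata: its "siblings" field lists the files in the repo.
    meta = requests.get(f"https://huggingface.co/api/models/{repo}").json()
    print([s["rfilename"] for s in meta.get("siblings", [])])

    # Individual config files can be fetched from the raw-file endpoint.
    def repo_json(filename):
        url = f"https://huggingface.co/{repo}/resolve/main/{filename}"
        return requests.get(url).json()

    print(repo_json("sentence_bert_config.json")["max_seq_length"])  # 256: input_ids max length
    print(repo_json("config.json")["hidden_size"])                   # 384: output embedding size
    ```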

    Unfortunately, replicating everything done here in C# would be very difficult, as one would also need to convert the PyTorch model to ONNX, implement tokenization, and usually do some post-processing pooling and normalization (which sentence-transformers implements on top of the massive transformers library). Which modules run, and in what order, is directed by the model's modules.json file; for all-MiniLM-L6-v2 that file chains a Transformer module, a Pooling module, and a Normalize module.
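    For that last step, here is a minimal NumPy sketch of the post-processing this model's modules.json calls for (mean pooling over real tokens, then L2 normalization). It assumes you already have the ONNX model's token-level output of shape (batch, seq_len, hidden) and the tokenizer's attention mask; the function name is my own:

    ```python
    import numpy as np

    def mean_pool_and_normalize(token_embeddings, attention_mask):
        # token_embeddings: (batch, seq_len, hidden) float array from the ONNX model
        # attention_mask:   (batch, seq_len) 0/1 array from the tokenizer
        mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
        summed = (token_embeddings * mask).sum(axis=1)  # zero out padding, sum real tokens
        counts = np.clip(mask.sum(axis=1), 1e-9, None)  # token counts; avoid divide-by-zero
        embeddings = summed / counts                    # mean pooling
        norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
        return embeddings / norms                       # L2 normalization -> unit vectors
    ```

    In C# this is just a small loop over the ONNX output tensor, so the pooling is the easiest piece to replicate; the tokenizer is the harder part.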