Tags: huggingface-transformers, embedding, large-language-model, huggingface-tokenizers, retrieval-augmented-generation

LLM embeddings vs custom embeddings


I am new to the topic of LLMs (it's been just 2-3 days) and I've run into a point of confusion in RAG pipelines. Which of the following assertions are right or wrong?

  1. LLMs use tokens as their most fundamental unit of processing. Tokens are produced by a tokenizer (which is specific to each model).

  2. Tokens are passed into the LLM as a sequence (a list of tokens at a time, whose maximum length is the context window).

  3. When "training", the "embeddings" are randomly initialized. After training , the embedding matrix is created such that there is an embedding for a particular token

Now, in RAG, why are we able to "customize" our own embeddings? I understand this helps with vector search over already stored embeddings, but when everything is finally sent to the model, does this "bypass" the model's own embeddings and start the inference process as it normally would? Also, why don't RAG pipelines mention tokenizers more often?
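For context, here is a minimal sketch of what "customizing" the embedding usually means in practice, assuming the sentence-transformers library and the model name "all-MiniLM-L6-v2" as an arbitrary example retriever. This embedding model is entirely separate from the LLM and is used only to search the stored vectors.

```python
# Sketch only: the retriever model and documents are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

retriever = SentenceTransformer("all-MiniLM-L6-v2")

docs = ["Tokenizers split text into tokens.",
        "Embedding matrices are learned during training."]
doc_vecs = retriever.encode(docs)                 # these vectors go into the vector DB

query_vec = retriever.encode("What does a tokenizer do?")
scores = util.cos_sim(query_vec, doc_vecs)        # semantic similarity search
best_doc = docs[int(scores.argmax())]             # what comes back is *text*, not vectors
print(best_doc)
```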

I went through multiple websites, but the process is abstracted away everywhere: there's a mention of "we create embeddings" and then it's done!


Solution

  • I found the answer... it was a silly little thing. The crux of the matter is that when you retrieve chunks via semantic search over embeddings, the final result (whatever it is) is ultimately converted back into text, hence the name "augmented" generation: as far as this simple application is concerned, it is essentially a glorified mechanism for enhancing the query context.

    So essentially, the two processes (having some custom embedding in a vector DB vs the embeddings inside the model itself) are naturally separate and don't impact each other.
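    A minimal sketch of that "augmented" step, again assuming GPT-2 as a stand-in generator: the retrieved chunk goes back into the prompt as plain text, so the LLM's own tokenizer and embedding matrix are used exactly as in normal inference; the retrieval embeddings never reach the model.

```python
# Sketch only: "gpt2", the retrieved chunk, and the prompt template are assumptions.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
llm = AutoModelForCausalLM.from_pretrained("gpt2")

retrieved_chunk = "Tokenizers split text into tokens."   # output of the vector search
question = "What does a tokenizer do?"
prompt = f"Context: {retrieved_chunk}\nQuestion: {question}\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt")          # the LLM's own tokenizer, as usual
output_ids = llm.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```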