I use llama_index in Jupyter Notebooks running in a Docker container. For data persistence I need to mount the cache folder from the host into the Docker container. So my question is: what is the name of the "cache" folder that ServiceContext from llama_index uses, and how can I locate it?
Consider the following code example in Python:
from llama_index import VectorStoreIndex
from llama_index import ServiceContext
from llama_index.llms import OpenAI
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
service_context = ServiceContext.from_defaults(
llm=llm, embed_model="local:BAAI/bge-small-en-v1.5"
)
This code gives the following output:
config.json: 100%
743/743 [00:00<00:00, 32.2kB/s]
model.safetensors: 100%
133M/133M [00:32<00:00, 3.95MB/s]
tokenizer_config.json: 100%
366/366 [00:00<00:00, 31.0kB/s]
vocab.txt: 100%
232k/232k [00:00<00:00, 1.44MB/s]
tokenizer.json: 100%
711k/711k [00:00<00:00, 2.31MB/s]
special_tokens_map.json: 100%
125/125 [00:00<00:00, 12.9kB/s]
So ServiceContext successfully downloaded the files above. But where are these files saved on my side? I can't find them, and after the Jupyter Notebook restarts all of this is lost and I have to download everything again.
Does it use Hugging Face under the hood? I checked the default .cache folder that Hugging Face uses, but there are no model artifacts for "local:BAAI/bge-small-en-v1.5" there.
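For reference, this is roughly how I checked the usual Hugging Face cache locations (the paths and environment variables below are just the standard Hugging Face ones, nothing llama_index-specific):
import os
# Standard Hugging Face cache locations/overrides I checked;
# none of them contained the BAAI/bge-small-en-v1.5 files.
candidates = [
    os.environ.get("HF_HOME"),
    os.environ.get("HF_HUB_CACHE"),
    os.environ.get("TRANSFORMERS_CACHE"),
    os.path.expanduser("~/.cache/huggingface/hub"),
]
for path in candidates:
    if path:
        print(path, "->", "exists" if os.path.isdir(path) else "missing")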
A Linux search with find -iname "model.safetensors" also gave nothing.
On my Mac the files are stored under ~/Library/Caches/llama_index/models. It would be nice if it used HF_HUB instead.
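What I would like to end up with is something like the sketch below: point whatever cache llama_index uses at a folder mounted from the host. The LLAMA_INDEX_CACHE_DIR environment variable and the /workspace/llama_index_cache path are assumptions on my part; I don't know whether llama_index actually honors such a variable for the "local:" embedding models.
import os
# Assumption: an env var like LLAMA_INDEX_CACHE_DIR that llama_index would honor;
# /workspace/llama_index_cache is a hypothetical folder mounted from the host.
os.environ["LLAMA_INDEX_CACHE_DIR"] = "/workspace/llama_index_cache"
from llama_index import ServiceContext
from llama_index.llms import OpenAI
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
service_context = ServiceContext.from_defaults(
    llm=llm, embed_model="local:BAAI/bge-small-en-v1.5"
)
If there is an officially supported way to do this (an environment variable or a parameter), that would solve my problem.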