Tags: vector, embedding, deeplake

Deep Lake: how do I get the list of vectors stored in the vector store?


I have been using Deep Lake to create a vector database, and that works just fine. My issue is that I'm unsure how to retrieve the vectors from the database, that is, the actual numbers that make up each embedding vector (such as [0.2143, 1.2332, 2.223, -0.23123, ...., 1.23242] or whatever they are).

I found this code at https://docs.activeloop.ai/tutorials/vector-store/deep-lake-vector-store-in-langchain:

# LangChain Vector Store
db = DeepLake(dataset_path=dataset_path)

# Deep Lake Vector Store object
ds = db.vectorstore

# Deep Lake Dataset object
ds = db.vectorstore.dataset

This seems to be roughly what I'm looking for, but it does not work as advertised: it raises an error saying that vectorstore does not exist.

I've tried several variations of the code below to make it work.

from langchain.vectorstores import DeepLake
import tqdm
import os


# Load the vector store
vector_store_path = '/path_to_deeplakestorage/DeeplakeStorage'
vector_store = DeepLake(dataset_path=vector_store_path, read_only=True, verbose=False)

all_vecs = vector_store.vectorstore.dataset. # This does not work

Again, I'm just trying to return all the IDs and vectors found in this database. The result should be the list of vectors, or the IDs and vectors together.

Any help is much appreciated. Thank you.


Solution

  • Do you want to retrieve all of the embeddings stored in the dataset, along with their IDs? If so, you can do the following:

    from langchain.vectorstores import DeepLake

    # Load the existing vector store in read-only mode
    vector_store_path = '/path_to_deeplakestorage/DeeplakeStorage'
    vector_store = DeepLake(dataset_path=vector_store_path, read_only=True, verbose=False)

    # data() returns a dict; its 'value' entry holds the stored contents
    embeddings = vector_store.vectorstore.embedding.data()['value']
    ids = vector_store.vectorstore.id.data()['value']
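Since the question asks for the IDs and vectors together, here is a minimal sketch of one way to pair them up once both have been retrieved. The helper name `pair_ids_with_vectors` and the sample values are hypothetical and only for illustration; the sketch assumes `ids` and `embeddings` come back as equal-length sequences in the same order, which is worth verifying against your own dataset.

```python
def pair_ids_with_vectors(ids, embeddings):
    """Map each ID to its embedding, converted to a plain list of floats.

    Hypothetical helper: assumes both sequences have the same length and
    that position i in `ids` corresponds to position i in `embeddings`.
    """
    if len(ids) != len(embeddings):
        raise ValueError("ids and embeddings must have the same length")
    return {
        str(doc_id): [float(x) for x in vector]
        for doc_id, vector in zip(ids, embeddings)
    }


# Stand-in values for illustration; in practice these would be the
# `ids` and `embeddings` retrieved from the vector store above.
sample_ids = ["doc-1", "doc-2"]
sample_embeddings = [[0.2143, 1.2332], [-0.23123, 1.23242]]

paired = pair_ids_with_vectors(sample_ids, sample_embeddings)
for doc_id, vector in paired.items():
    print(doc_id, vector)
```

Converting each vector to a list of floats keeps the result easy to print or serialize, whether the store hands back NumPy arrays or nested lists.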