python langchain vector-database deeplake

Deeplake output of all stored files


The docs = db.similarity_search(query='some query here') method returns one or more documents from the deeplake vectorstore. Is there a method that returns all documents? I ask because my documents are structured like this:

page_content='256 128 256zM208 160c-8,836 0-16-...
384C234.5 384 256 362.5 256 336C256 309.5 234.5 288 208' 
metadata={'source':'chatbot/app/solid.min.js','file_name':'solid.min.js'}

I would like to get all documents whose metadata.file_name corresponds to a particular file. Unfortunately I can't find any documentation on this, which is why I'm asking here for your experience.


Solution

  • You can query the vectorstore and apply a filter on the metadata:

        def query_datalake(db, query, subject):
            # Restrict the search to documents whose metadata "source" matches this file.
            metadata_filter = {"metadata": {"source": f"output\\{subject}.txt"}}

            # distance_metric: "L2" for Euclidean, "L1" for Nuclear, "Max" for l-infinity distance,
            # "cos" for cosine similarity, "dot" for dot product
            docs = db.similarity_search(query, filter=metadata_filter, distance_metric="cos", k=10)

            return docs
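
The answer above filters on metadata "source", while the question asks about metadata "file_name". A minimal sketch of the same idea for that key follows. The helper name search_by_file_name and the default k=1000 are hypothetical, not part of LangChain or deeplake. It assumes db is a LangChain DeepLake vectorstore whose similarity_search accepts filter and k as in the answer. Since k caps the number of results, this only approximates "all documents" when k is at least the number of chunks stored for the file.

```python
from typing import Any, List


def search_by_file_name(db: Any, query: str, file_name: str, k: int = 1000) -> List[Any]:
    """Return up to k documents whose metadata "file_name" equals file_name.

    Assumes db exposes similarity_search(query, filter=..., k=...) as the
    LangChain DeepLake vectorstore in the answer above does.
    """
    # Exact-match filter on the "file_name" key of the "metadata" field,
    # mirroring the "source" filter used in the answer.
    metadata_filter = {"metadata": {"file_name": file_name}}

    # k caps the number of results; 1000 is an arbitrary assumption and
    # should be raised if the file was split into more chunks than that.
    return db.similarity_search(query, filter=metadata_filter, k=k)
```

With the document from the question, a call might look like docs = search_by_file_name(db, "some query here", "solid.min.js"). The results are still ranked by similarity to the query; the filter only narrows which documents are considered.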