I know DocumentDB isn't officially or community supported by LangChain yet, but I am trying to use their MongoDB Atlas integration, much like in this question. My problem is that I am getting this error:
pymongo.errors.ServerSelectionTimeoutError: docdb-restofit.cluster-cpm2mmw0q9qb.us-east-1.docdb.amazonaws.com:27017: timed out (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)
I am fairly new to AWS and DocumentDB and can't seem to find a solution. I would really appreciate the help.
Here is what I have done so far to accomplish this. I am new to AWS, so I am a little confused.
DocumentDB, my steps:
Now my code for LangChain:
from dotenv import load_dotenv
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import Chroma, MongoDBAtlasVectorSearch
from pymongo import MongoClient

load_dotenv()

text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=500,
)
emb = OpenAIEmbeddings()

loader = PyPDFLoader("./101.pdf")
pages = loader.load_and_split()
docs = loader.load_and_split(
    text_splitter=text_splitter,
)

client = MongoClient()
DB_NAME = "langchain_db"
COLLECTION_NAME = "test"
ATLAS_VECTOR_SEARCH_INDEX_NAME = "index_name"
MONGODB_COLLECTION = client[DB_NAME][COLLECTION_NAME]

vector_search = MongoDBAtlasVectorSearch.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(disallowed_special=()),
    collection=MONGODB_COLLECTION,
    index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
)
I solved it. The timeout was caused by the VPC policy in AWS, which kept my machine from reaching the cluster. This was helpful.
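For anyone hitting the same timeout, one quick way to tell a network or VPC problem apart from a driver problem is to try a plain TCP connection to the cluster endpoint. The following is only a minimal sketch using the Python standard library. `can_reach` is a hypothetical helper name that I made up for illustration; it is not part of pymongo or LangChain, and the default port 27017 is taken from the error message above.

```python
import socket


def can_reach(host: str, port: int = 27017, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example call (replace with your own cluster endpoint):
# can_reach("docdb-restofit.cluster-cpm2mmw0q9qb.us-east-1.docdb.amazonaws.com")
```

If this returns False from the machine where the script runs, the problem is most likely on the network side (VPC, security groups, routing) and not in the LangChain or pymongo code.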