I'm trying to reproduce the code from the documentation (https://docs.llamaindex.ai/en/stable/examples/customization/llms/AzureOpenAI.html) and get the following error after calling index = VectorStoreIndex.from_documents(documents):
raise ValueError(f"Unknown document type: {type(document)}")
ValueError: Unknown document type: <class 'llama_index.legacy.schema.Document'>
Because all these generative AI libraries are constantly updated, I had to switch the SimpleDirectoryReader import and make it from llama_index.legacy.readers.file.base import SimpleDirectoryReader. All the rest is the same as in the tutorial (using llama_index==0.10.18 and Python 3.9.16). I have already spent several hours on this and don't have any ideas on how to proceed, so if somebody can assist with that, it would be super helpful :)
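For reference, the failing combination boils down to roughly this (the legacy reader mixed with the core index):

from llama_index.core import VectorStoreIndex
from llama_index.legacy.readers.file.base import SimpleDirectoryReader

documents = SimpleDirectoryReader(input_files=["./data/s1.txt"]).load_data()
index = VectorStoreIndex.from_documents(documents)  # raises the ValueError above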
Many thanks in advance.
The error occurs because of the type of document you are passing to VectorStoreIndex.from_documents().
When you import SimpleDirectoryReader from the legacy modules, the documents it loads have the type llama_index.legacy.schema.Document. You are passing those to a VectorStoreIndex imported from the core modules (from llama_index.core import VectorStoreIndex), and the core index does not recognize legacy documents.
The tutorial you linked is correct for the core modules: import both classes from the same place, from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, and everything will work fine.
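As a minimal sketch of that core-module path (assuming your Azure llm and embed_model are configured as in the linked tutorial):

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Both the reader and the index come from llama_index.core, so load_data()
# returns core Document objects that from_documents() accepts.
documents = SimpleDirectoryReader(input_files=["./data/s1.txt"]).load_data()
index = VectorStoreIndex.from_documents(documents)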
If you wish to use the legacy modules instead, use the code below.
import logging
import sys

from llama_index.legacy import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.legacy.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.legacy.llms.azure_openai import AzureOpenAI

# Basic logging to stdout; switch to logging.DEBUG for more verbose output.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
# Replace with your own Azure OpenAI credentials and endpoint.
api_key = "3c9xxxyyyyzzzzzssssssdb9"
azure_endpoint = "https://<resource_name>.openai.azure.com/"
api_version = "2023-07-01-preview"
llm = AzureOpenAI(
model="gpt-4",
deployment_name="gpt4",
api_key=api_key,
azure_endpoint=azure_endpoint,
api_version=api_version,
)
# You need to deploy your own embedding model as well as your own chat completion model
embed_model = AzureOpenAIEmbedding(
model="text-embedding-ada-002",
deployment_name="embeding1",
api_key=api_key,
azure_endpoint=azure_endpoint,
api_version=api_version,
)
documents = SimpleDirectoryReader(input_files=["./data/s1.txt"]).load_data()
print(type(documents[0]))  # <class 'llama_index.legacy.schema.Document'>
# The legacy API wires the llm and embed_model in through a ServiceContext.
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query = "What is the model name and who updated it last?"
query_engine = index.as_query_engine()
answer = query_engine.query(query)
print("query was:", query)
print("answer was:", answer)
Here, when using the legacy modules, all tools and models must be imported from the same legacy package, and an additional ServiceContext is passed to the vector store index.
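For comparison, here is a sketch of the equivalent core-module setup in 0.10.x, where the global Settings object replaces ServiceContext (this assumes the non-legacy Azure classes, which ship in the separate llama-index-llms-azure-openai and llama-index-embeddings-azure-openai packages):

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.llms.azure_openai import AzureOpenAI

# Same placeholder credentials as above.
api_key = "3c9xxxyyyyzzzzzssssssdb9"
azure_endpoint = "https://<resource_name>.openai.azure.com/"
api_version = "2023-07-01-preview"

# Core modules configure models globally via Settings instead of a ServiceContext.
Settings.llm = AzureOpenAI(
    model="gpt-4",
    deployment_name="gpt4",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)
Settings.embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="embeding1",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)

documents = SimpleDirectoryReader(input_files=["./data/s1.txt"]).load_data()
index = VectorStoreIndex.from_documents(documents)  # no service_context needed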