I am trying to use the preview "bring your own data" feature of Azure OpenAI Service. I have a large blob storage container with hundreds of thousands of documents (.pdf, .docx, .xls, etc.), and I would like to be able to query them with some filtering behind the scenes. For example, "Provide me with a summary for price docs" should return a summary filtered by a custom field that I've applied through code. I'm trying to follow the RAG pattern, but here are some issues:
What is the best approach here, and are there any foreseeable obstacles to connecting this Cognitive Search index to the OpenAI Service later on?
Pulling data into Cognitive Search (with an indexer that reads from your blob storage) and pushing data to Cognitive Search yourself (through the REST API or an SDK) both result in the same thing: an index of JSON documents. The only difference is how you populate that index.
So there is no "better option" from my point of view for what you are trying to achieve.
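With the push model, populating the index is just a matter of sending JSON, which is also where you can attach a custom field like the one you set in code. The sketch below is a minimal illustration, not a definitive implementation: it builds the request body for the Search REST API's `POST /indexes/{index}/docs/index` call. The field names `id`, `content` and `doc_type`, and the helpers `make_key` and `build_push_payload`, are hypothetical and must be adapted to your own index schema. Blob paths are URL-safe base64-encoded because document keys may only contain letters, digits, `_`, `-` and `=`.

```python
import base64
import json


def make_key(blob_path: str) -> str:
    # URL-safe base64 only emits letters, digits, '-', '_' and '=' padding,
    # all of which are valid characters in a search document key.
    return base64.urlsafe_b64encode(blob_path.encode("utf-8")).decode("ascii")


def build_push_payload(docs: list[dict]) -> dict:
    """Body for POST /indexes/{index}/docs/index (hypothetical field names)."""
    return {
        "value": [
            {
                # mergeOrUpload inserts new documents and updates existing ones
                "@search.action": "mergeOrUpload",
                "id": make_key(d["path"]),
                "content": d["content"],
                "doc_type": d["doc_type"],  # custom field to filter on later
            }
            for d in docs
        ]
    }


payload = build_push_payload(
    [{"path": "pricing/2023/price-list.pdf", "content": "Extracted text...", "doc_type": "price"}]
)
print(json.dumps(payload, indent=2))
```

A filtered query such as `doc_type eq 'price'` then only works if `doc_type` is marked filterable in the index definition. With the pull model you would get the same shape of index, with the indexer producing these JSON documents for you.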
You will also struggle to generate summaries if your source documents are split (chunked) into several search items (aka documents): a search only returns the top-ranked chunks, so you might not retrieve all the content you need to generate the summary.
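One possible workaround, sketched here under assumptions rather than as a tested recipe: store hypothetical `parent_id` and `chunk_no` fields on every chunk, find the relevant source documents with a normal (filtered) search, then fetch all chunks of each one with an OData filter and stitch them back together before asking the model for a summary. The helper names and field names are illustrative only; `parent_id` must be filterable and `chunk_no` sortable for the query to work, and very large documents may need paging beyond `top`.

```python
from collections import defaultdict


def odata_quote(value: str) -> str:
    # OData string literals are single-quoted; embedded quotes are doubled.
    return "'" + value.replace("'", "''") + "'"


def chunks_for_parent_query(parent_id: str, top: int = 1000) -> dict:
    """Search request body that fetches every chunk of one source document."""
    return {
        "search": "*",
        "filter": f"parent_id eq {odata_quote(parent_id)}",
        "select": "parent_id,chunk_no,content",
        "orderby": "chunk_no asc",
        "top": top,
    }


def reassemble(hits: list[dict]) -> dict:
    """Group chunk hits by parent document and join them in chunk order."""
    by_parent = defaultdict(list)
    for hit in hits:
        by_parent[hit["parent_id"]].append(hit)
    return {
        pid: "\n".join(c["content"] for c in sorted(chunks, key=lambda c: c["chunk_no"]))
        for pid, chunks in by_parent.items()
    }


hits = [
    {"parent_id": "a", "chunk_no": 1, "content": "world"},
    {"parent_id": "a", "chunk_no": 0, "content": "hello"},
]
print(reassemble(hits))
```

The trade-off is that a full document may no longer fit in the model's context window, so long documents might still need a map-reduce style summary (summarize chunks, then summarize the summaries).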