I'm trying to find out whether, and how, I can index binary data (mostly MS Office documents and PDFs) that resides on non-Azure data sources rather than in Azure Blob Storage.
The closest example I found copies the files to an Azure blob container and then adds a skillset to index the docs from there.
I would like to bypass the Azure blob container, and push the doc metadata as well as the binary content directly.
Any advice or example I can look at?
Thanks
You can define a skillset that combines custom and built-in skills when you push data to the index. The built-in Document Extraction skill does what you want: it extracts content from a file within the enrichment pipeline. See:
https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-document-extraction
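As a rough illustration of what that skill expects, here is a minimal Python sketch that builds the base64 file reference the skill reads as its `file_data` input, plus a skill definition following the shape shown on the linked page. The helper names `make_file_reference` and `make_document_extraction_skill` and the target name `extracted_content` are made up for this example, and how the `file_data` value reaches the pipeline without a blob container (for instance via a custom skill) is an assumption you would need to verify for your setup.

```python
import base64
import json


def make_file_reference(raw_bytes: bytes) -> dict:
    """Wrap raw file bytes in the file reference shape the skill reads.

    Hypothetical helper: the {"$type": "file", "data": ...} shape follows
    the skill documentation, the function itself is only illustrative.
    """
    return {
        "$type": "file",
        "data": base64.b64encode(raw_bytes).decode("ascii"),
    }


def make_document_extraction_skill(source: str = "/document/file_data") -> dict:
    """Build a Document Extraction skill definition as a plain dict.

    Hypothetical helper: the values below mirror the documented sample,
    adjust them to your own index and skillset.
    """
    return {
        "@odata.type": "#Microsoft.Skills.Util.DocumentExtractionSkill",
        "parsingMode": "default",
        "dataToExtract": "contentAndMetadata",
        "configuration": {"imageAction": "none"},
        "context": "/document",
        "inputs": [{"name": "file_data", "source": source}],
        # "extracted_content" is an assumed target name for this sketch.
        "outputs": [{"name": "content", "targetName": "extracted_content"}],
    }


if __name__ == "__main__":
    # Stand-in bytes; in practice you would read the PDF or Office file
    # from your own non-Azure source.
    reference = make_file_reference(b"%PDF-1.7 example bytes")
    print(json.dumps(reference, indent=2))
    print(json.dumps(make_document_extraction_skill(), indent=2))
```

The dict from `make_document_extraction_skill()` would go into the `skills` array of a skillset definition. Creating the skillset and wiring it to an indexer is not shown here.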