I am working on a custom platform for managing and running LLM applications that use RAG over a user-provided document repository. While planning and designing the solution, I came across a few open-source frameworks, such as KFServing, Deep-Java-Library, MLFlow, and a few more, that are recommended for use alongside ML pipeline orchestration (Kubeflow) and data pipelines. I would like to understand the principles for choosing a framework that can run models with scalable performance, especially within an LLMOps stack, for a variety of use cases such as chat agents, content generation (e.g., emails), and code generation. Any pointers on how to choose a framework when designing a platform that can handle these different scenarios in Gen-AI application development and hosting would be appreciated.
Here are some bullet points: