Tags: azure-databricks, langchain, mlflow

Wrapping an LLM deployed in Azure Databricks


I have deployed an LLM on Azure Databricks. I can access it via the Databricks serving API:

https://adb-17272728282828282.1.azuredatabricks.net/serving-endpoints/my-code-llama/invocations

I am aware that LangChain supports Databricks directly, but is it possible to, say, wrap the Databricks endpoint in an MLflow/OpenAI wrapper or something, so that I can use it in LangChain like this:

llm = ChatOpenAI(
    openai_api_base="http://my-url/API",
    openai_api_key="7282",
    model_name="my-code-llama",
    max_tokens=1000,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)

I'm trying to do this because there are a lot of limitations when I use the LangChain Databricks wrapper directly. I am quite new to this, so some support would be really great!


Solution

  • Below are the requirements for wrapping a Databricks model serving endpoint in LangChain:

    • A registered LLM deployed to a Databricks model serving endpoint.
    • CAN QUERY permission on the endpoint (you can sanity-check this with a direct REST call, as sketched below).
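
    A quick way to confirm both requirements is to call the invocations API directly with a personal access token. This is a minimal sketch, not from the original post: the host, token, and endpoint name are placeholders, and the exact JSON payload depends on your model's signature.

    import requests

    # Placeholder workspace URL and token -- substitute your own values.
    host = "https://adb-xxxx.azuredatabricks.net"
    token = "dapi-xxxxx"

    response = requests.post(
        f"{host}/serving-endpoints/my-code-llama/invocations",
        headers={"Authorization": f"Bearer {token}"},
        json={"prompt": "How are you?", "max_tokens": 100},
    )
    print(response.json())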

    Install the latest langchain library on the cluster and restart the cluster.

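    If you would rather install from a notebook than through the cluster-library UI, a notebook-scoped install like the following should also work (an assumption on my part, not from the original post):

    # Run in a Databricks notebook cell; the restart picks up the new versions.
    %pip install -U langchain langchain-openai
    dbutils.library.restartPython()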

    Use the code below.

    from langchain.llms import Databricks

    # Inside a Databricks notebook, the workspace host and API token are
    # inferred from the runtime, so only the endpoint name is needed.
    llm = Databricks(endpoint_name="databricks-mpt-7b-instruct")
    llm("How are you?")
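
    The snippet above relies on running inside the workspace. If you are calling the endpoint from outside Databricks, the same wrapper can take the host and token explicitly; this is a sketch with placeholder values, and parameter support may vary by langchain version:

    from langchain.llms import Databricks

    llm = Databricks(
        host="adb-xxxx.azuredatabricks.net",  # placeholder workspace host
        api_token="dapi-xxxxx",               # placeholder personal access token
        endpoint_name="databricks-mpt-7b-instruct",
        model_kwargs={"max_tokens": 1000},    # forwarded to the serving endpoint
    )
    llm("How are you?")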
    


    If you want to use this endpoint through LangChain's OpenAI wrappers instead, try the code below.

    Install langchain-openai as shown in the above steps.

    Note: Make sure to use a model that matches the task type you need. Below, I used the model databricks-llama-2-70b-chat for the chat task; a completions-task variant is sketched after the example.

    from langchain_openai import ChatOpenAI
    from langchain.callbacks import StreamingStdOutCallbackHandler

    chat_llm = ChatOpenAI(
        openai_api_base="https://<xxxxxx>.azuredatabricks.net/serving-endpoints",
        openai_api_key="dapi-xxxxx",
        model_name="databricks-llama-2-70b-chat",
        max_tokens=1000,
        streaming=True,  # stream tokens as they arrive
        callbacks=[StreamingStdOutCallbackHandler()],
    )

    chat_llm.invoke("What is mlflow?")
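
    If you have a completions-task endpoint instead, the (non-chat) OpenAI class pairs with it the same way. This sketch reuses the instruct model from the first example; swap in whichever completions model you have deployed:

    from langchain_openai import OpenAI

    completion_llm = OpenAI(
        openai_api_base="https://<xxxxxx>.azuredatabricks.net/serving-endpoints",
        openai_api_key="dapi-xxxxx",
        model_name="databricks-mpt-7b-instruct",  # a completions-task model
        max_tokens=1000,
    )
    completion_llm.invoke("What is mlflow?")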
    


    If you want to serve this behind a custom endpoint of your own, use LangServe and deploy it in the cloud; a rough sketch follows.
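
    As an illustration (not from the original post), a minimal LangServe app exposing the model above could look like this, assuming langserve, fastapi, and uvicorn are installed; the route path is arbitrary:

    from fastapi import FastAPI
    from langchain_openai import ChatOpenAI
    from langserve import add_routes

    app = FastAPI(title="Databricks LLM server")

    # Same ChatOpenAI wrapper as above, pointed at the Databricks endpoint.
    chat_llm = ChatOpenAI(
        openai_api_base="https://<xxxxxx>.azuredatabricks.net/serving-endpoints",
        openai_api_key="dapi-xxxxx",
        model_name="databricks-llama-2-70b-chat",
    )

    # Expose the model at /my-code-llama (the path name is arbitrary).
    add_routes(app, chat_llm, path="/my-code-llama")

    if __name__ == "__main__":
        import uvicorn

        uvicorn.run(app, host="0.0.0.0", port=8000)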

    Refer to this documentation for more information.