Tags: python, snowflake-cloud-data-platform, streamlit

Passing list of objects and dictionary to snowflake from streamlit


I'm developing a Streamlit app that builds an AI chatbot on top of Snowflake's Cortex 'complete' function. When a user starts the app, I need to pass a sequence of initial messages into the function. Here's an example:

messages = [
    {'system': 'act like a database professional'},
    {'user': 'context: I have a database containing sales data. I need you to answer user questions based on the input data between <data> </data> tags'},
    {'user': '''<data>
                actual data ...
                </data>'''},
    {'user': 'provide the answer in markdown format'}
]

These messages are hard-coded in the app.

User input and assistant messages will be appended to the messages list like this:

messages.append({'role': 'user', 'content': question})

The option parameter is set as follows:

option = {
    'temperature': 0,
    'max_tokens': 128, 
    'guardrails': True
}

Here's how I call the complete function:

cmd = """
    select snowflake.cortex.complete(?, ?, ?) as response
"""

df_response = session.sql(cmd, params=[st.session_state.model_name, messages, option]).collect()

Currently, I'm consistently receiving this error message:

SnowparkSQLException: (1304): 252011: Python data type [dict] cannot be automatically mapped to Snowflake data type. Specify the snowflake data type explicitly.

I've completed this quickstart, but its solution summarizes all past conversations into a single prompt. I'd prefer to pass the conversation as an array of objects.

https://quickstarts.snowflake.com/guide/ask_questions_to_your_own_documents_with_snowflake_cortex_search/index.html?index=..%2F..index#0

I've attempted several solutions, including using Snowpark types such as StructType, StructField, StringType, ArrayType, MapType, and the UDF library, but without success.

Can anyone offer assistance?


Solution

  • As snowflake.cortex is not available in my region (the package cannot be imported), and I prefer the 'passing direct SQL to Snowflake' approach so the app can run outside the Snowflake infrastructure, I made the following changes to my code:

    1. Create the initial messages list of conversations in the Python app. If you want to keep track of the full conversation, the Snowflake complete function requires the input to follow the sequence 'system', 'user', 'assistant', 'user', 'assistant', and so on. You cannot have two 'user' elements in a row; if you need multiple user inputs, insert a dummy assistant response between them.
    messages = [
    {'role': 'system', 'content': 'act like a database professional'},
    {'role': 'user', 'content': """context: I have a database containing sales data. I need you to answer user questions based on the input data between <data> </data> tags.
    <data>
     actual data ...
    </data>"""},
    {'role': 'assistant', 'content': ''}, 
    {'role': 'user', 'content': 'question 2 ....'}, 
    ]
    
    2. Convert the list to a JSON string and adjust the quoting: escape any existing single quotes, then replace the JSON double quotes with single quotes so the literal can be embedded directly in the SQL text.
    prompts_str = (
            json.dumps(messages, indent=3)
            .replace('\'', '\\\'')
            .replace('"', '\'')
        )
    
    OPTIONS = (
            json.dumps({
                        'temperature': 0,
                        'max_tokens': 128000, 
                        'guardrails' : True
                    }, indent=3)
            .replace('\'', '\\\'')
            .replace('"', '\'')
        )
    
    3. Build the SQL and send the request
    sql = f"""
                select snowflake.cortex.complete('llama3.1-70b', {prompts_str}, {OPTIONS}):"choices"[0]:"messages"::string as response
              """
    
    # read the answer (the trailing replace strips single quotes from the returned text)
    response = session.sql(sql).collect()[0].as_dict()["RESPONSE"].replace("'", "")
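
To make the ordering rule from step 1 concrete, here is a small standalone check (a hypothetical helper, not part of any Snowflake API): after an optional leading 'system' message, roles must alternate 'user', 'assistant', 'user', and so on.

```python
def roles_alternate(messages):
    """Check the role sequence: optional 'system' first, then
    strictly alternating 'user' / 'assistant', starting with 'user'."""
    roles = [m['role'] for m in messages]
    if roles and roles[0] == 'system':
        roles = roles[1:]
    return all(r == ('user', 'assistant')[i % 2] for i, r in enumerate(roles))

# A valid conversation: system, user, assistant, user
assert roles_alternate([
    {'role': 'system', 'content': 'act like a database professional'},
    {'role': 'user', 'content': 'question 1'},
    {'role': 'assistant', 'content': ''},
    {'role': 'user', 'content': 'question 2'},
])

# Two consecutive 'user' messages violate the rule
assert not roles_alternate([
    {'role': 'user', 'content': 'a'},
    {'role': 'user', 'content': 'b'},
])
```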
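
The quote handling in step 2 can be illustrated offline. Assuming a one-message list (the content here is a made-up example), escaping existing single quotes first and then converting the JSON double quotes yields a literal that Snowflake's SQL parser accepts:

```python
import json

messages = [{'role': 'user', 'content': "what's the total revenue?"}]

# Escape pre-existing single quotes first, then turn the JSON double
# quotes into single quotes for embedding in SQL text.
prompts_str = (
    json.dumps(messages)
    .replace('\'', '\\\'')
    .replace('"', '\'')
)

print(prompts_str)
# → [{'role': 'user', 'content': 'what\'s the total revenue?'}]
```

The order of the two replace calls matters: converting double quotes first would leave the original apostrophe in "what's" unescaped and break the SQL string.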
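
Putting steps 2 and 3 together, the final SQL string can be assembled and inspected without a Snowflake session (the model name, conversation, and helper name below are placeholders, not from the original post):

```python
import json

# Placeholder conversation and options for illustration
messages = [
    {'role': 'system', 'content': 'act like a database professional'},
    {'role': 'user', 'content': 'what is the total revenue?'},
]
options = {'temperature': 0, 'max_tokens': 128000, 'guardrails': True}

def to_sql_literal(obj):
    """JSON-encode, escape single quotes, then swap double for single quotes."""
    return json.dumps(obj).replace('\'', '\\\'').replace('"', '\'')

prompts_str = to_sql_literal(messages)
options_str = to_sql_literal(options)

sql = (
    f"select snowflake.cortex.complete('llama3.1-70b', {prompts_str}, {options_str})"
    ':"choices"[0]:"messages"::string as response'
)
print(sql)
```

Because the result is plain SQL text, it can be sent through any Snowflake connection, which is what allows the app to run outside the Snowflake infrastructure.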