Search code examples
python-3.xsqlalchemylangchainllama-index

Update field metadata using llama index and sqlalchemy


How do you set the metadata on a query engine? The data field is a JSON field with specific fields that I want the agent to use to improve the query.

metadata_dict = {
    "page_call_phone_clicks_logged_in_unique": "Number of people who logged into Facebook and clicked the Call Now button. Ex: 2"
}

What do I do with snapshots_table to set it on the query_engine?

    `
class SqlQueryEngine:
    def query_engine(self):
        engine = create_engine(pg_uri)
        metadata_obj = MetaData()

        # Convert the metadata dictionary to a JSON string
        metadata_json = json.dumps(metadata_dict)

        # create snapshot SQL table
        table_name = "snapshots"

        snapshots_table = Table(
            table_name,
            metadata_obj,
            Column("data", JSON, comment=metadata_json),
        )

        metadata_obj.reflect(engine)

        sql_database = SQLDatabase(engine)

        nodes = SQLTableNodeMapping(sql_database)

        schema = []
        for table_name in metadata_obj.tables.keys():
            schema.append(SQLTableSchema(table_name=table_name))

        # We dump the table schema information into a vector index. The vector index is stored
        # within the context builder for future use.
        index = ObjectIndex.from_objects(schema, nodes, VectorStoreIndex)
        llm = OpenAI(temperature=0.9, model_name="gpt-3.5-turbo")
        llm_predictor = LLMPredictor(llm=llm)
        service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
        # We construct a SQLTableRetrieverQueryEngine.
        # Note that we pass in the ObjectRetriever so that we can dynamically retrieve the table during query-time.
        # ObjectRetriever: A retriever that retrieves a set of query engine tools.
        query_engine = SQLTableRetrieverQueryEngine(
            sql_database,
            index.as_retriever(similarity_top_k=1),
            service_context=service_context,
        )

        return query_engine

`


Solution

  • This seems to be working for now.

        for table_name in metadata_obj.tables.keys():
            schema.append(
                SQLTableSchema(
                    table_name=table_name,
                    context_str=f"(note) you will need to typecast the values when performing a sum on the data field. The data field has the structure: {metadata_json}",
                )
            )