To start, I’d like to explain what I aim to achieve. My goal is to create an AI bot that will act as a hotel assistant, able to provide users with any hotel-related information they request. It should answer based on user input, with the capability to check room availability and pricing for specific dates.
I’ve already implemented a room availability feature. This involves a Kernel function that takes checkInDate, checkOutDate, and a list of room objects as parameters (for instance, two rooms, one for 1 adult and another for 2 adults and a child of 6 years).
This part works well. For example, when I ask, "Is there a room available for 2 adults from November 7th to 14th?" it provides accurate information based on API connectivity through the plugin.
The issue arises with the Kernel function’s output, which is currently large and complex due to the detailed schema, containing extensive package and room data. Processing this consumes a lot of tokens, and I’m encountering a 429 error:
Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 50 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.
To reduce the token load, I thought about splitting the response into two parts: (a) dynamic data (for example, prices) and (b) static or infrequently changing data (package descriptions, room descriptions, etc.).
This approach would involve combining the Kernel function with a vector database to store descriptions of rooms and packages, as these don’t change often.
The second goal is to enable the bot to "learn" about the hotel to answer questions about the hotel (like check-in time, pool availability, parking options, etc.).
I could use a fine-tuned model for this, but a vector database that’s periodically updated might also work well. However, since I’ve never worked with a vector database before, I’m unfamiliar with how to load data into it or integrate it with my current use case.
So, my main question is: how can I effectively combine the Kernel function with this vector database?
What I currently have is taken from https://github.com/PieEatingNinjas/copilot-semantickernel/tree/demo and adjusted to my use case (getting information about hotel rates):
    using Microsoft.SemanticKernel;
    using Microsoft.SemanticKernel.ChatCompletion;
    using Microsoft.SemanticKernel.Connectors.OpenAI;

    var builder = Kernel.CreateBuilder();
    builder.Services.AddAzureOpenAIChatCompletion(
        "gpt-4o-mini",
        Secrets.AzureOpenAiEndpoint,
        Secrets.AzureOpenAiApiKey
    );
    builder.Plugins.AddFromType<AvailabilityPlugin>();
    builder.Plugins.AddFromType<DateHelper>();
    Kernel kernel = builder.Build();
    IChatCompletionService chatCompletionService =
        kernel.GetRequiredService<IChatCompletionService>();
    var systemMessage = HotelBotCore.MainMessage;
    /*
    Here the message is something like: "You are a friendly bot that serves information about a hotel. People may ask about hotel rates. To check those, you need the check-in date, the check-out date, and information about the rooms and the people in those rooms. When presenting those rates, show the package name, room type name, price and meals [and so on, and so on]"
    */
    var chatMessages = new ChatHistory(systemMessage);
    Console.WriteLine("--- Hotel info ---");
    Console.WriteLine("Bot > How may I help you?");
    // Start the conversation
    while (true)
    {
        // Get user input
        Console.Write("User > ");
        chatMessages.AddUserMessage(Console.ReadLine()!);
        // Get the chat completions; plugin functions are invoked automatically
        var result =
            chatCompletionService.GetStreamingChatMessageContentsAsync(
                chatMessages,
                executionSettings: new OpenAIPromptExecutionSettings()
                {
                    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions,
                    Temperature = 0.2
                },
                kernel: kernel);
        // Stream the results
        string fullMessage = "";
        Console.Write("Bot > ");
        await foreach (var content in result)
        {
            Console.Write(content.Content);
            fullMessage += content.Content;
        }
        Console.WriteLine();
        // Add the message from the agent to the chat history
        chatMessages.AddAssistantMessage(fullMessage);
    }
Just to add a bit more context on the data: let's imagine that the first version of the kernel function returned data in this form (the original has many more fields, of course):
    {
        "PackageId": 1,
        "PackageName": "Package 1",
        "PackageDescription": "Some long description",
        "PackageRooms": [
            {
                "RoomId": 1,
                "RoomDescription": "Long description",
                "RoomPrice": 123
            },
            {
                "RoomId": 2,
                "RoomDescription": "Another long description",
                "RoomPrice": 456
            }
        ]
    }
I changed the model to be simpler:
    {
        "PackageId": 1,
        "PackageName": "Package 1",
        "PackageRooms": [
            {
                "RoomId": 1,
                "RoomPrice": 123
            },
            {
                "RoomId": 2,
                "RoomPrice": 456
            }
        ]
    }
Now I would expect the AI bot to fetch the data about those rooms from the vector database when presenting the results. Is that possible? If not, I would appreciate advice on how to overcome the token issue I am observing.
But even if the vector database is not the answer to the first issue described above, it is probably the answer to the second one (providing additional information about the hotel itself when asked). For example, if someone asks:
At what time do I need to check out of my room?
This information would be taken from a vector database that I would prepare somehow (I still need advice on what my options are). For example, I saw Vector Stores in Azure OpenAI Studio, where you can upload PDFs, but I have no clue how to connect that to my code.
You can use Microsoft Kernel Memory to connect your code to a vector database. Here is an article about Kernel Memory.
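As a rough sketch of how that could look (not a definitive implementation): the snippet below builds an in-process Kernel Memory instance and wraps it in a small plugin, so the model can call it the same way it calls your existing `AvailabilityPlugin`. `HotelInfoPlugin`, `HotelMemory`, `AskHotelInfoAsync` and the parameter names are hypothetical names chosen for illustration. It assumes the Kernel Memory and Semantic Kernel NuGet packages (for example `Microsoft.KernelMemory.Core`) and an embedding deployment in your Azure OpenAI resource. The type and method names are as I remember them from the Kernel Memory README, so check them against the version you install.

```csharp
using System.ComponentModel;
using System.Threading.Tasks;
using Microsoft.KernelMemory;
using Microsoft.SemanticKernel;

// Hypothetical plugin: exposes the memory as a function the model can call.
public sealed class HotelInfoPlugin
{
    private readonly IKernelMemory _memory;

    public HotelInfoPlugin(IKernelMemory memory) => _memory = memory;

    [KernelFunction]
    [Description("Answers general questions about the hotel, its rooms and its packages.")]
    public async Task<string> AskHotelInfoAsync(
        [Description("The guest's question")] string question)
    {
        // Kernel Memory retrieves the relevant chunks and generates a short answer,
        // so only that answer (not whole documents) goes back into the chat.
        var answer = await _memory.AskAsync(question);
        return answer.Result;
    }
}

// Hypothetical factory: builds a serverless (in-process) memory on Azure OpenAI.
public static class HotelMemory
{
    public static IKernelMemory Build(
        string endpoint, string apiKey, string chatDeployment, string embeddingDeployment)
    {
        var chatConfig = new AzureOpenAIConfig
        {
            Auth = AzureOpenAIConfig.AuthTypes.APIKey,
            APIType = AzureOpenAIConfig.APITypes.ChatCompletion,
            Endpoint = endpoint,
            APIKey = apiKey,
            Deployment = chatDeployment
        };
        var embeddingConfig = new AzureOpenAIConfig
        {
            Auth = AzureOpenAIConfig.AuthTypes.APIKey,
            APIType = AzureOpenAIConfig.APITypes.EmbeddingGeneration,
            Endpoint = endpoint,
            APIKey = apiKey,
            Deployment = embeddingDeployment
        };

        return new KernelMemoryBuilder()
            .WithAzureOpenAITextGeneration(chatConfig)
            .WithAzureOpenAITextEmbeddingGeneration(embeddingConfig)
            .Build<MemoryServerless>();
    }
}
```

With this in place you would load your documents once, for example `await memory.ImportDocumentAsync("hotel-info.pdf", documentId: "hotel-info");` (the file name is a placeholder), and register the plugin next to the existing ones with `builder.Plugins.AddFromObject(new HotelInfoPlugin(memory));`. Room and package descriptions could be imported the same way, so that the availability function only has to return IDs and prices. The default storage is meant for experiments. For production you would configure a persistent vector store.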
However, for your use case I believe the reason you are hitting the limits is that the function call outputs are kept in the chat history. Please check whether they are being added to it.
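To check this, here is a minimal sketch, assuming the `Microsoft.SemanticKernel` package. `ChatHistoryDiagnostics` and its methods are hypothetical helper names, not part of the library. It counts the tool-result messages in a `ChatHistory`, estimates how much text they hold, and creates a throwaway copy of the history for a single turn.

```csharp
using System.Linq;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical helpers for inspecting what accumulates in the chat history.
public static class ChatHistoryDiagnostics
{
    // Number of messages with the "tool" role, i.e. function call results.
    public static int CountToolMessages(ChatHistory history) =>
        history.Count(m => m.Role == AuthorRole.Tool);

    // Total characters of tool output in the history (a rough proxy for tokens).
    public static int ToolContentLength(ChatHistory history) =>
        history.Where(m => m.Role == AuthorRole.Tool)
               .Sum(m => m.Content?.Length ?? 0);

    // Copy of the history for one turn; messages added to the copy during
    // automatic function invocation do not end up in the original.
    public static ChatHistory CopyForTurn(ChatHistory history) =>
        new ChatHistory(history);
}
```

In your loop you could print `ChatHistoryDiagnostics.CountToolMessages(chatMessages)` after each turn. If the number keeps growing, one option is to pass `ChatHistoryDiagnostics.CopyForTurn(chatMessages)` to `GetStreamingChatMessageContentsAsync` instead of `chatMessages`, while still adding only the user input and `fullMessage` to the original. The trade-off is that the model no longer sees the raw function output from earlier turns, only its own previous answers.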