Tags: large-language-model, amazon-bedrock

How to pass the max_token_to_sample parameter when using boto3 to access an AWS Bedrock model with a knowledge base


I have this piece of code working to access AWS Bedrock models with a knowledge base:

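    import boto3

    # Function name is illustrative; the original snippet showed only the body
    def query_knowledge_base(input_data, model_arn):
        # Session arguments (profile, credentials, ...) were elided in the original post
        aws_session = boto3.Session()
        bedrock_agent_client = aws_session.client(service_name="bedrock-agent-runtime", region_name="us-west-2")
        response = bedrock_agent_client.retrieve_and_generate(
            input={"text": input_data},
            retrieveAndGenerateConfiguration={
                "type": "KNOWLEDGE_BASE",
                # config is the application's own settings object
                "knowledgeBaseConfiguration": {"knowledgeBaseId": config.bedrock.kb_id, "modelArn": model_arn},
            },
        )
        return response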

However, it uses the default max_token_to_sample value, which is rather small. The boto3 client's retrieve_and_generate function does not seem to have a parameter or relevant config to specify it. Does anybody know how I can pass in this parameter? Thanks!


Solution

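  • It's not 100% clear what you mean by max_token_to_sample, but I think you're referring to the inference parameters that cap the length of the generated response. If so, it's textInferenceConfig you're looking for, nested under knowledgeBaseConfiguration > generationConfiguration > inferenceConfig.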

    Beware: a higher maxTokens allows longer responses, and more generated tokens means more cost.

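    import boto3

    # Same service and region as in the question
    bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime", region_name="us-west-2")

    # prompt, kbId and model_arn are assumed to be defined elsewhere
    response = bedrock_agent_runtime_client.retrieve_and_generate(
        input={
            "text": prompt
        },
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                # Inference parameters for the generation step
                "generationConfiguration": {
                    "inferenceConfig": {
                        # Upper bound on the number of tokens the model generates
                        "textInferenceConfig": {"maxTokens": 123}
                    }
                },
                "knowledgeBaseId": kbId,
                "modelArn": model_arn,
            }
        }
    )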

    Reference: AgentsforBedrockRuntime / Client / retrieve_and_generate
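If you call this from several places, it can help to build the nested configuration in one spot. The following is a minimal sketch, not an official helper: build_kb_rag_config is a hypothetical name, MY_KB_ID and MY_MODEL_ARN are placeholders, and it only checks that maxTokens is a positive integer (the actual allowed range depends on the model, which this does not check). temperature is another field of textInferenceConfig and is only set when given.

```python
from typing import Any, Dict, Optional


def build_kb_rag_config(
    kb_id: str,
    model_arn: str,
    max_tokens: int,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Build a retrieveAndGenerateConfiguration dict (hypothetical helper).

    Nesting follows the answer above:
    knowledgeBaseConfiguration -> generationConfiguration
    -> inferenceConfig -> textInferenceConfig.
    """
    # Reject obviously invalid limits before making a (billed) API call;
    # bool is excluded because it is a subclass of int in Python
    if not isinstance(max_tokens, int) or isinstance(max_tokens, bool) or max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")

    text_inference: Dict[str, Any] = {"maxTokens": max_tokens}
    if temperature is not None:
        # Only include temperature when the caller asks for it
        text_inference["temperature"] = temperature

    return {
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": kb_id,
            "modelArn": model_arn,
            "generationConfiguration": {
                "inferenceConfig": {"textInferenceConfig": text_inference}
            },
        },
    }


if __name__ == "__main__":
    # Placeholder IDs for illustration only; no AWS call is made here
    cfg = build_kb_rag_config("MY_KB_ID", "MY_MODEL_ARN", max_tokens=2048)
    print(cfg["knowledgeBaseConfiguration"]["generationConfiguration"])
```

The result would be passed as retrieveAndGenerateConfiguration=build_kb_rag_config(kbId, model_arn, max_tokens=2048) in the retrieve_and_generate call above.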