I have this piece of code working to access AWS Bedrock models with a knowledge base:
import boto3

def query_knowledge_base(input_data, config, model_arn):
    # Credentials are resolved via the default provider chain
    aws_session = boto3.Session()
    bedrock_agent_client = aws_session.client(
        service_name="bedrock-agent-runtime", region_name="us-west-2"
    )
    response = bedrock_agent_client.retrieve_and_generate(
        input={"text": input_data},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": config.bedrock.kb_id,
                "modelArn": model_arn,
            },
        },
    )
    return response
However, it uses the default max_token_to_sample parameter, which is rather small. The boto3 client's retrieve_and_generate function does not seem to have a parameter or relevant config to specify it. Does anybody know how I can pass in this parameter? Thanks!
It's not 100% clear what you mean by max_token_to_sample, but I think you're referring to inference settings. If so, textInferenceConfig is what you're looking for.
Beware: more tokens == more cost.
response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={"text": prompt},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "generationConfiguration": {
                "inferenceConfig": {
                    "textInferenceConfig": {"maxTokens": 123}
                }
            },
            "knowledgeBaseId": kbId,
            "modelArn": model_arn,
        },
    },
)
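For completeness: per the same boto3 docs, textInferenceConfig also accepts temperature, topP, and stopSequences alongside maxTokens, and the generated answer comes back under response["output"]["text"]. Here is a minimal sketch reusing the prompt, kbId, and model_arn names from above; the specific values are illustrative, not recommendations:

response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={"text": prompt},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "generationConfiguration": {
                "inferenceConfig": {
                    "textInferenceConfig": {
                        "maxTokens": 2048,        # raise the completion cap (illustrative value)
                        "temperature": 0.2,       # lower == more deterministic output
                        "topP": 0.9,              # nucleus sampling cutoff
                        "stopSequences": ["\n\nHuman:"],  # optional early-stop marker (example only)
                    }
                }
            },
            "knowledgeBaseId": kbId,
            "modelArn": model_arn,
        },
    },
)
print(response["output"]["text"])  # the generated answer text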
Reference: AgentsforBedrockRuntime / Client / retrieve_and_generate