Robert Helewka 4b954ed842 docs: add Claude Haiku 4.5 model card documentation

Add comprehensive model card for Anthropic's Claude Haiku 4.5 on AWS
Bedrock, including model details, capabilities, pricing, programmatic
access examples, and regional availability information.

2026-05-12 06:29:46 -04:00

MiniMax M2.5

MiniMax — MiniMax M2.5

Model Details

MiniMax M2.5 is an agent-native frontier model trained explicitly to reason efficiently, decompose tasks optimally, and complete complex workflows under real-world time and cost constraints. It achieves task completion speeds comparable to or faster than leading proprietary frontier models by combining high inference throughput with reinforcement learning focused on token-efficient reasoning and better decision-making in agentic scaffolds. For more information about model development and performance, see the model/service card.

Model launch date: Feb 12, 2026
Model EOL date: N/A
End User License Agreements and Terms of Use: View
Model lifecycle: Active
Context window: 196K tokens
Max output tokens: 8K

Input Modalities	Output Modalities	APIs supported	Endpoints supported
Audio	Embedding	Responses	bedrock-runtime
Image	Image	Chat Completions	bedrock-mantle
Speech	Speech	Invoke
Text	Text	Converse
Video	Video

Note
Whenever possible, we recommend you use the bedrock-mantle endpoint.

Capabilities and Features

Bedrock Features

Features supported using bedrock-mantle endpoint

Supported	Not Supported
See the AWS documentation website for more details	See the AWS documentation website for more details

Features supported using bedrock-runtime endpoint

Supported	Not Supported
See the AWS documentation website for more details	See the AWS documentation website for more details

Pricing

For pricing, please refer to the Amazon Bedrock Pricing page.

Programmatic Access

Use the following model IDs and endpoint URLs to access this model programmatically. For more information about the available APIs and endpoints, see APIs supported and Endpoints supported.

Endpoint	Model ID	In-Region endpoint URL	Geo inference ID	Global inference ID
bedrock-runtime	minimax.minimax-m2.5	https://bedrock-runtime.{region}.amazonaws.com	Not supported	Not supported
bedrock-mantle	minimax.minimax-m2.5	https://bedrock-mantle.{region}.api.aws/v1	Not supported	Not supported

For example, if the Region is us-east-1 (N. Virginia), the bedrock-runtime endpoint URL is "https://bedrock-runtime.us-east-1.amazonaws.com" and the bedrock-mantle endpoint URL is "https://bedrock-mantle.us-east-1.api.aws/v1".
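
The Region substitution described above can be sketched as a small helper. This is only an illustration: the name endpoint_urls is hypothetical, and the URL templates are copied from the table above rather than taken from any SDK.

```python
# URL templates copied from the Programmatic Access table; {region} is the AWS Region code.
ENDPOINT_TEMPLATES = {
    "bedrock-runtime": "https://bedrock-runtime.{region}.amazonaws.com",
    "bedrock-mantle": "https://bedrock-mantle.{region}.api.aws/v1",
}


def endpoint_urls(region: str) -> dict[str, str]:
    """Return the In-Region URL for each endpoint type (hypothetical helper)."""
    return {
        name: template.format(region=region)
        for name, template in ENDPOINT_TEMPLATES.items()
    }


if __name__ == "__main__":
    for name, url in endpoint_urls("us-east-1").items():
        print(f"{name}: {url}")
```

The bedrock-mantle URL is the value that Step 4 of the sample code below places in OPENAI_BASE_URL.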

Service Tiers

Amazon Bedrock offers multiple service tiers to match your workload requirements. Standard provides pay-per-token access with no commitment. Priority offers higher throughput with a time-based commitment. Flex provides lower-cost access for flexible, non-time-sensitive workloads. Reserved provides dedicated throughput with a term commitment for predictable workloads. For more information, see service tiers.

Standard	Priority	Flex	Reserved

Regional Availability

Regional availability at a glance

Bedrock offers three inference options: In-Region keeps requests within a single Region for strict compliance, Geo Cross-Region routes across Regions within a geography (US, EU, etc.) for higher throughput while respecting data residency, and Global Cross-Region routes anywhere worldwide for maximum throughput when there are no residency constraints. Refer to the Regional availability page for more details.
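
The choice between the three options can be read as a simple decision rule. The sketch below assumes two yes/no residency questions; the function and parameter names are hypothetical. The Programmatic Access table lists the Geo and Global inference IDs for this model as "Not supported", so only the first branch applies to MiniMax M2.5 as documented here.

```python
def choose_inference_option(single_region_required: bool, geography_bound: bool) -> str:
    """Map residency constraints to an inference option (hypothetical helper)."""
    if single_region_required:
        # Strict compliance: requests stay within one Region.
        return "In-Region"
    if geography_bound:
        # Data residency within a geography such as US or EU.
        return "Geo Cross-Region"
    # No residency constraints: requests may be routed anywhere.
    return "Global Cross-Region"


if __name__ == "__main__":
    print(choose_inference_option(single_region_required=True, geography_bound=True))
```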

Region	In-Region	Geo	Global
us-east-1 (N. Virginia)
us-east-2 (Ohio)
us-west-2 (Oregon)
eu-central-1 (Frankfurt)
eu-north-1 (Stockholm)
eu-south-1 (Milan)
eu-west-1 (Ireland)
eu-west-2 (London)
ap-northeast-1 (Tokyo)
ap-south-1 (Mumbai)
ap-southeast-2 (Sydney)
ap-southeast-3 (Jakarta)
sa-east-1 (São Paulo)
ap-southeast-4 (Melbourne)

Quotas and Limits

Your AWS account has default quotas to maintain the performance of the service and to ensure appropriate usage of Amazon Bedrock. The default quotas assigned to an account might be updated depending on regional factors, payment history, fraudulent usage, or approval of a quota increase request. For more details, refer to the Quotas for Amazon Bedrock documentation and see the limits for the model.
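
Requests that exceed a quota are typically rejected with a throttling error, so client code often retries with exponential backoff. The following is a minimal, library-independent sketch, not an official pattern from this page: call_with_backoff, is_throttled, and the default values are all assumptions chosen for illustration.

```python
import random
import time


def call_with_backoff(request, is_throttled, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request() and retry with exponential backoff while is_throttled(exc) is true."""
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception as exc:
            # Re-raise anything that is not a throttling error, and the final failure.
            if not is_throttled(exc) or attempt == max_attempts - 1:
                raise
            # Wait base_delay * 2**attempt seconds, plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

With boto3, request could wrap one of the client calls from the Sample Code section. A plausible is_throttled check is whether the exception is a botocore ClientError whose exc.response["Error"]["Code"] equals "ThrottlingException".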

Sample Code

Step 1 - AWS Account: If you have an AWS account already, skip this step. If you are new to AWS, sign up for an AWS account.

Step 2 - API key: Go to the Amazon Bedrock console and generate a long-term API key.

Step 3 - Get the SDK: This guide requires Python to be installed. Install the packages for the API you plan to use.

[ Chat Completions API ]

pip install boto3 openai

[ Invoke/Converse API ]

pip install boto3

Step 4 - Set environment variables: Configure your environment to use the API key for authentication.

[ Chat Completions API ]

# bash/zsh syntax; export makes the variables visible to the Python process
export OPENAI_API_KEY="<provide your Bedrock API key>"
export OPENAI_BASE_URL="https://bedrock-mantle.<your-region>.api.aws/v1"

[ Invoke/Converse API ]

# bash/zsh syntax; export makes the variable visible to the Python process
export AWS_BEARER_TOKEN_BEDROCK="<provide your Bedrock API key>"
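
Before running a request, it can help to confirm that the variables for the chosen API are actually set. The check below is a small sketch; the helper name missing_env and the keys "chat-completions" and "invoke-converse" are hypothetical labels for the two groups shown in this step.

```python
import os

# Variables named in Step 4, grouped by the API they apply to.
REQUIRED_ENV = {
    "chat-completions": ["OPENAI_API_KEY", "OPENAI_BASE_URL"],
    "invoke-converse": ["AWS_BEARER_TOKEN_BEDROCK"],
}


def missing_env(api, environ=None):
    """Return the variables required for `api` that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV[api] if not environ.get(name)]


if __name__ == "__main__":
    for api in REQUIRED_ENV:
        print(api, "missing:", missing_env(api))
```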

Step 5 - Run your first inference request: Save the code for the API you are using as bedrock-first-request.py, then run python bedrock-first-request.py.

[ Chat Completions API ]

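from openai import OpenAI

# The client reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment (Step 4).
client = OpenAI()

response = client.chat.completions.create(
    model="minimax.minimax-m2.5",
    messages=[{"role": "user", "content": "Can you explain the features of Amazon Bedrock?"}],
)
# Print the full response object, including metadata.
print(response)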
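
Printing the whole response shows metadata along with the answer. To pull out only the generated text, the OpenAI SDK exposes it at response.choices[0].message.content. The sketch below wraps that access in a hypothetical helper and uses a hand-built stand-in object in place of a real response, so it runs without network access.

```python
from types import SimpleNamespace


def first_choice_text(response):
    """Return the assistant text of the first choice in a Chat Completions response."""
    return response.choices[0].message.content


# Stand-in with the same attribute layout as the SDK response object (illustration only).
fake = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Hello"))]
)
print(first_choice_text(fake))
```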

[ Invoke API ]

import json
import boto3

# With AWS_BEARER_TOKEN_BEDROCK set (Step 4), boto3 uses the API key for authentication.
client = boto3.client('bedrock-runtime', region_name='us-east-1')
response = client.invoke_model(
    modelId='minimax.minimax-m2.5',
    body=json.dumps({
        'messages': [{'role': 'user', 'content': 'Can you explain the features of Amazon Bedrock?'}],
        'max_tokens': 1024
    })
)
# The body is a stream; read it and decode the JSON payload.
print(json.loads(response['body'].read()))

[ Converse API ]

import boto3

client = boto3.client('bedrock-runtime', region_name='us-east-1')
response = client.converse(
    modelId='minimax.minimax-m2.5',
    messages=[
        {
            'role': 'user',
            'content': [{'text': 'Can you explain the features of Amazon Bedrock?'}]
        }
    ]
)
print(response)
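
The Converse response is a nested dictionary: the generated message sits under output.message.content as a list of content blocks, and token counts sit under usage. The sketch below joins the text blocks and skips any other block types. The helper name converse_text is hypothetical, and the dictionary is a hand-written stand-in rather than real model output.

```python
def converse_text(response):
    """Join the text blocks of a Converse response, skipping non-text blocks."""
    blocks = response["output"]["message"]["content"]
    return "".join(block["text"] for block in blocks if "text" in block)


# Hand-written stand-in shaped like a Converse response (illustration only).
fake = {
    "output": {"message": {"role": "assistant", "content": [{"text": "Hello"}]}},
    "usage": {"inputTokens": 12, "outputTokens": 3, "totalTokens": 15},
}
print(converse_text(fake))
print(fake["usage"]["totalTokens"])
```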
