GLM-4 and MiniMax M2.5 Pricing (2026): API Costs, Models, and Comparison
Two of the most cost-effective AI model families right now come from Chinese labs: Zhipu AI’s GLM series and MiniMax’s M2.5. Both offer frontier-level performance at a fraction of what you would pay for Claude or GPT. Here is what they actually cost as of February 2026.
If you are comparing AI model costs more broadly, our Claude AI pricing guide covers Anthropic’s full lineup including the new Opus 4.6.
GLM-4 series pricing
Zhipu AI (the company behind the GLM models) sells API access through its Z.AI platform. The “GLM-4” family has expanded significantly; the latest models are GLM-5, GLM-4.7, and several GLM-4.5 variants.
All prices below are per 1 million tokens, in USD.
| Model | Input | Cached input | Output | Context window |
|---|---|---|---|---|
| GLM-5 | $1.00 | $0.20 | $3.20 | 128K |
| GLM-5-Code | $1.20 | $0.30 | $5.00 | 128K |
| GLM-4.7 | $0.60 | $0.11 | $2.20 | 200K |
| GLM-4.5 | $0.60 | $0.11 | $2.20 | 128K |
| GLM-4.5-X | $2.20 | $0.45 | $8.90 | 128K |
| GLM-4.5-Air | $0.20 | $0.03 | $1.10 | 128K |
| GLM-4.7-Flash | Free | Free | Free | 128K |
| GLM-4.5-Flash | Free | Free | Free | 128K |
The standout here: GLM-4.7-Flash and GLM-4.5-Flash are completely free, with no catches and no daily rate-limit quota. Zhipu offers them as loss leaders to build adoption.
GLM-4.7 is the flagship open-source model. It scores 73.8% on SWE-bench Verified and 84.9 on LiveCodeBench V6, putting it ahead of Claude Sonnet 4.5 on coding benchmarks. The 200K context window and 128K output capacity make it competitive with models costing 5 to 10 times more.
Cached input pricing deserves attention. If you are building applications that send the same system prompts or reference documents repeatedly, GLM-4.7’s cached input rate of $0.11/MTok is roughly 80% cheaper than the standard input price.
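To see how much caching matters in practice, here is a rough cost sketch using GLM-4.7's listed rates. The request sizes (an 8K-token reused system prompt, 500 output tokens, 1,000 requests) are illustrative assumptions, not figures from Zhipu's docs.

```python
# GLM-4.7 rates from the table above, in $ per 1M tokens.
INPUT_PRICE = 0.60    # standard input
CACHED_PRICE = 0.11   # cached input
OUTPUT_PRICE = 2.20   # output

def request_cost(input_toks, output_toks, cached_toks=0):
    """USD cost of one request; cached_toks is the cached portion of input."""
    fresh = input_toks - cached_toks
    return (fresh * INPUT_PRICE
            + cached_toks * CACHED_PRICE
            + output_toks * OUTPUT_PRICE) / 1_000_000

# Hypothetical workload: 8K-token system prompt reused across 1,000 requests,
# 500 extra input tokens and 500 output tokens each.
no_cache = 1000 * request_cost(8_500, 500)
with_cache = 1000 * request_cost(8_500, 500, cached_toks=8_000)
print(f"without caching: ${no_cache:.2f}")   # $6.20
print(f"with caching:    ${with_cache:.2f}") # $2.28
```

For this prompt-heavy workload, caching cuts the bill by almost two thirds, which is why the cached rate matters more than the headline input price for agent-style applications.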
MiniMax M2.5 pricing
MiniMax released M2.5 on February 12, 2026, and it immediately turned heads. The model comes in two versions: Standard and Lightning.
| Model | Input | Output | Context window | Speed |
|---|---|---|---|---|
| M2.5 Standard | $0.30/MTok | $1.20/MTok | 1M tokens | 50 tok/s |
| M2.5 Lightning | $0.30/MTok | $2.40/MTok | 200K tokens | 100 tok/s |
At $0.30 input / $1.20 output per million tokens, M2.5 Standard costs roughly 1/20th the price of Claude Opus 4.6 ($5/$25). That is not a typo.
M2.5 scores 80.2% on SWE-Bench Verified, which puts it at or near the top of public benchmarks as of mid-February 2026. It also hits 51.3% on Multi-SWE-Bench and 76.3% on BrowseComp with context management.
The Lightning version doubles the output price but also doubles throughput to 100 tokens per second. MiniMax claims you can run the model continuously for an hour at 100 tok/s for about $1.
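That hourly claim is easy to sanity-check on the output side, using the Lightning rate from the table above (this ignores input-token costs, which is presumably why MiniMax rounds to "about $1"):

```python
# One hour of continuous generation at Lightning's claimed throughput.
tokens_per_hour = 100 * 3600              # 100 tok/s -> 360,000 tokens
output_cost = tokens_per_hour * 2.40 / 1_000_000  # Lightning output rate
print(f"output cost per hour: ${output_cost:.2f}")  # $0.86
```

So the output side alone comes to about $0.86 per hour; input tokens make up the rest.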
M2.5 is open-source and available on Hugging Face, so you can also self-host it if you have the hardware.
Comparison with Claude and GPT
How do these models compare to the big Western providers? Here is a side-by-side on price and specs.
| Model | Input/MTok | Output/MTok | Context | SWE-bench |
|---|---|---|---|---|
| MiniMax M2.5 | $0.30 | $1.20 | 1M | 80.2% |
| GLM-4.7 | $0.60 | $2.20 | 200K | 73.8% |
| GLM-5 | $1.00 | $3.20 | 128K | N/A |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200K | ~70% |
| Claude Opus 4.6 | $5.00 | $25.00 | 200K | ~79% |
| GPT-4o | $2.50 | $10.00 | 128K | ~65% |
The price gap is striking. For the same $100 budget:
- MiniMax M2.5: ~83 million output tokens
- GLM-4.7: ~45 million output tokens
- Claude Sonnet 4.5: ~6.7 million output tokens
- Claude Opus 4.6: ~4 million output tokens
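The budget figures above follow directly from the output prices in the comparison table; a minimal calculation:

```python
# Output tokens a $100 budget buys, using each model's output rate ($/MTok)
# from the comparison table above.
output_price = {
    "MiniMax M2.5": 1.20,
    "GLM-4.7": 2.20,
    "Claude Sonnet 4.5": 15.00,
    "Claude Opus 4.6": 25.00,
}

budget = 100.0
for model, price in output_price.items():
    millions = budget / price
    print(f"{model}: ~{millions:.1f}M output tokens")
```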
That is a 20x cost advantage for M2.5 over Opus 4.6 at comparable benchmark performance.
Which model should you pick?
Pick GLM-4.7-Flash if you want a free tier for prototyping or low-volume usage. It is genuinely free with no hidden quotas, which makes it perfect for testing ideas before committing to a paid model.
Pick GLM-4.7 or GLM-5 if you need a strong general-purpose model at moderate cost. The $0.60/$2.20 pricing sits well below Western alternatives, and the 200K context window handles long documents.
Pick MiniMax M2.5 Standard if cost is your primary concern and you need frontier performance. At $0.30/$1.20, it is the cheapest model in its performance tier by a wide margin. The 1M token context window is also the largest available at this price point.
Pick MiniMax M2.5 Lightning if you need speed for real-time applications. The 100 tok/s throughput at $0.30/$2.40 is competitive for interactive coding agents and live chat applications.
Stick with Claude or GPT if you need maximum reliability, established ecosystem support, or specific features like Claude’s computer use or GPT’s image generation. The premium pricing comes with mature tooling, extensive documentation, and proven production stability.
Things to watch out for
Rate limits vary. Both Zhipu and MiniMax may apply different rate limits than what you get with OpenAI or Anthropic. Check the platform docs before planning high-volume deployments.
Latency from outside Asia. Both services run primarily from Chinese data centers. If your users are in North America or Europe, expect higher latency than with US-based providers. MiniMax offers some international endpoints, and GLM models are available through third-party providers like Fireworks and Novita.
Benchmark scores do not tell the whole story. M2.5’s 80.2% SWE-bench score is impressive, but real-world coding performance depends on your specific use case. Test with your actual workloads before committing.
Vision and multimodal costs differ. GLM-4.6V (the vision model) costs $0.30/$0.90 per MTok. MiniMax M2.5 currently supports text only. If you need image understanding, factor that into your comparison.
Using AI with your recordings
If you are processing meeting recordings or video content, per-token API costs can add up fast. A one-hour meeting transcript runs 10,000 to 15,000 tokens. Analyzing 100 meetings per month at Claude Opus rates ($5/$25 MTok) costs around $5 to $10 in API fees.
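A quick sketch of that monthly estimate, with assumed transcript and summary lengths (the 12,500-token transcript and 1,000-token summary are illustrative, not measured figures):

```python
# Rough monthly cost to summarize meeting transcripts: input-heavy workload.
meetings = 100
input_toks = 12_500   # ~1-hour transcript (assumed midpoint of 10K-15K)
output_toks = 1_000   # assumed summary length

def monthly_cost(in_price, out_price):
    """USD per month at the given $/MTok rates."""
    per_meeting = (input_toks * in_price + output_toks * out_price) / 1_000_000
    return meetings * per_meeting

print(f"Claude Opus 4.6: ${monthly_cost(5.00, 25.00):.2f}")   # $8.75
print(f"MiniMax M2.5:    ${monthly_cost(0.30, 1.20):.2f}")    # $0.50
```

The same workload on M2.5 costs about fifty cents, which is the kind of gap that makes per-token pricing worth checking before you default to a premium model.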
With ScreenApp, AI analysis is built into the platform. You get automatic transcription, AI summaries, and multi-language support without managing API keys or worrying about token budgets.
FAQ
How much does GLM-4 cost?
The GLM-4 family ranges from free (GLM-4.7-Flash, GLM-4.5-Flash) to $2.20 per million output tokens (GLM-4.7, GLM-4.5). The premium GLM-4.5-X model costs $8.90 per million output tokens. All pricing is through Zhipu AI’s Z.AI platform.
What is MiniMax M2.5 pricing?
MiniMax M2.5 Standard costs $0.30 per million input tokens and $1.20 per million output tokens. The Lightning version costs $0.30 input and $2.40 output per million tokens, with double the speed.
Is GLM-4 cheaper than ChatGPT?
Yes, significantly. GLM-4.7 costs $0.60/$2.20 per million tokens compared to GPT-4o at $2.50/$10.00. The Flash variants are completely free. Even GLM-5, the most expensive option, undercuts GPT-4o on both input and output pricing.
Is MiniMax M2.5 as good as Claude Opus?
On coding benchmarks, M2.5 scores 80.2% on SWE-Bench Verified versus Opus 4.6’s approximately 79%. Performance on other tasks varies. M2.5 costs roughly 1/20th of Opus 4.6, making it worth testing for your specific use case.
Can I self-host GLM-4 or MiniMax M2.5?
Both model families are open-source. GLM-4.7 is available on Hugging Face from Zhipu AI (zai-org), and MiniMax M2.5 is available from MiniMaxAI on Hugging Face. Self-hosting requires significant GPU resources but eliminates per-token costs entirely.
What context window does MiniMax M2.5 support?
M2.5 Standard supports up to 1 million tokens of context. The Lightning version supports 200K tokens. Both can output up to 131K tokens per response.