13 New Models. 2 New Labs. A Smarter Auto Mode on Every Tier.
When we launched Eco Mode last month, ARMES connected you to 35 models from 13 AI labs. Today, the catalog has grown to 48+ models from 15 AI labs — and Auto Mode just got significantly smarter on every tier.
This is the Spring 2026 update. Two new labs joined the platform. Thirteen new models shipped — including newly released frontier flagships from Anthropic, OpenAI, and Moonshot. And every tier's Auto Mode has been rebuilt around the new lineup, with a brand-new routing branch designed specifically for low-hallucination web research.
The best part: there's no setting to flip. Open ARMES, ask a question, and the new routing kicks in.
Here's what changed.
At a Glance
- 13 new models added across 8 labs
- 2 brand-new labs — Alibaba (Qwen) and Xiaomi (MiMo)
- 4 refreshed Auto Modes — Free, Eco, Pro, and Ultra all updated
- A new Research and Synthesis routing branch in Eco, Pro, and Ultra that pairs Moonshot's Kimi K2.6 with the ARMES web search tool for low-hallucination fact-finding
- Frontier-class flagships newly available: Claude Opus 4.7, GPT-5.5, DeepSeek V4 Pro, Qwen 3.6 Max Preview, Kimi K2.6, MiMo V2.5 Pro
- Same privacy guarantee on every model — every request still routes through Zero Data Retention infrastructure. Providers process and immediately forget.
Two New AI Labs
Every lab in ARMES brings a distinct strength. Two new ones joined the roster this update.
Alibaba (Qwen) — The Polyglots
Alibaba's Qwen team builds one of the most capable multilingual model families in production. The Qwen 3.6 line spans a proprietary flagship (Max Preview) and an open-weights MoE counterpart (Flash) — both engineered for cross-lingual mastery, agentic coding, and multimodal reasoning.
Why "The Polyglots": Qwen's superpower is doing world-class work across languages — including code-switched contexts where most models stumble. Qwen 3.6 Max Preview claims the #1 spot on six coding benchmarks simultaneously. Qwen 3.6 Flash brings frontier-class MoE coding to consumer-deployable hardware under Apache-2.0.
Both Qwen models are available for manual selection on Pro and Ultra.
Xiaomi (MiMo) — The Omnimodals
Xiaomi's MiMo series is the company's entry into the global frontier-LLM race. The V2.5 generation pairs a 1T-parameter reasoning model (V2.5 Pro) with a high-throughput sibling (base V2.5), both shipping with 1M-token context windows. Under the hood, the architecture is fully omnimodal — text, image, audio, and video unified in a single model.
Why "The Omnimodals": No other lab in the ARMES catalog ships frontier models built natively as omnimodal. Inside chat, MiMo V2.5 Pro shines for agentic coding (SWE-Bench Pro 57.2%), long-horizon tool use (1,000+ tool calls in a single workflow), and token economy — 42% fewer tokens than Kimi K2.6 for equivalent performance. For long, multi-turn coding sessions, that's a real cost advantage.
Both MiMo models are available for manual selection on Pro and Ultra.
The New Frontier Flagships
Three names you should know.
Claude Opus 4.7 (Ultra)
Anthropic's newest flagship — the successor to Opus 4.6 at the same price point. Higher instruction fidelity, sharper creative quality, and more reliable conceptual depth. 1M-token context window.
Best at: Creative writing where voice and craft matter, complex UI/UX coding, architectural decisions, production debugging, strategic advising, and continuations of complex prior work. The flagship choice when stakes are high and quality cannot be compromised.
For Ultra users: Opus 4.7 is the new anchor of Ultra Mode's Auto routing. It replaces Opus 4.6 in the routing pool — same price, newer capability. Every "continue," "yes, proceed," high-stakes creative project, or open-ended strategic conversation now flows through the latest Anthropic flagship.
GPT-5.5 (Ultra)
OpenAI's smartest model to date. Co-designed for NVIDIA GB200/GB300 NVL72 systems with a 1M-token context window, configurable thinking modes, and targeted safeguards for cybersecurity and biology workloads.
Best at: Peak agentic coding (Terminal-Bench 2.0: 82.7%), professional knowledge work (GDPval: 84.9%), graduate science (GPQA Diamond: 93.6%), and frontier mathematics (FrontierMath Pro: 39.6%). Delivers a 23% improvement in factual correctness over GPT-5.4.
For Ultra users: Auto Mode now reserves GPT-5.5 for PhD-level mathematics, formal proofs, advanced science, and deep security audits — work where peak correctness matters more than speed or cost.
Kimi K2.6 (Eco, Pro, Ultra)
Moonshot's latest. Same 1T-parameter MoE backbone as K2.5 — but with native INT4 quantization-aware training, native multimodal input, an expanded agent-swarm runtime that orchestrates 300 sub-agents across 4,000 coordinated steps, and the headline number: a hallucination rate that dropped from 65% to 39% generation-over-generation. That's the biggest single-generation reliability gain Moonshot has shipped.
Best at: Web-driven research, multi-source synthesis, low-hallucination factual reasoning, multilingual analysis, and complex conceptual work. K2.6 is #1 on HLE-Full with Tools at 54.0.
For ARMES users: K2.6 is the centerpiece of the new Auto Mode strategy. More on that below.
Other New Models Worth Knowing
A quick rundown of the rest:
- GPT-5.4 Mini & Nano (Pro + Ultra Manual) — OpenAI's cost-tier siblings. Mini brings GPT-5.4's reasoning at a budget footprint. Nano is the lightweight, lowest-latency option in the GPT-5.4 line.
- DeepSeek V4 Pro & V4 Flash (Pro + Ultra Manual) — The latest DeepSeek MoE generation. V4 Pro is the #2 highest-performing open-weights reasoning model in the world. V4 Flash is now the cheapest model in the entire ARMES catalog at $0.14 / $0.28 per million tokens.
- Z.ai GLM-5.1 (Pro + Ultra Manual) — The next evolution of GLM. Refined agentic coding, low hallucination, and stronger deep-planning patterns.
- MiniMax M2.7 (Free + Eco Auto, Pro + Ultra Manual) — MiniMax's first self-evolving model. Replaces M2.5 in Free and Eco Auto Mode, delivering near-frontier coding quality at budget-tier pricing.
That's 13 new models, every one available for manual selection on the tiers listed above. Some also got picked up by Auto Mode — that's where things get interesting.
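To make the budget-tier claim concrete: at V4 Flash's posted $0.14 / $0.28 per million input/output tokens, per-request cost is simple arithmetic. A quick back-of-the-envelope sketch (the token counts below are illustrative, not measured):

```python
# Back-of-the-envelope cost for DeepSeek V4 Flash at the posted rates.
# Token counts are illustrative examples, not real measurements.
IN_PER_M, OUT_PER_M = 0.14, 0.28  # USD per million input / output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at V4 Flash's posted pricing."""
    return input_tokens / 1e6 * IN_PER_M + output_tokens / 1e6 * OUT_PER_M

# A 2,000-token prompt with a 1,000-token reply:
print(f"${request_cost(2_000, 1_000):.5f}")  # → $0.00056
```

At those rates, even a long multi-turn session stays in fractions of a cent, which is what "cheapest model in the catalog" means in practice.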
Auto Mode: Refreshed on Every Tier
Auto Mode is the brain that picks the right model for every message. Every tier's Auto Mode has been rebuilt around the new lineup.
Free — MiniMax M2.5 → M2.7, plus refreshed router language
Pool: DeepSeek V3.2 · Llama 4 Maverick · MiniMax M2.7 · Gemini 3.1 Flash Lite
The Free pool was already strong. One swap: MiniMax M2.7 took over from M2.5, lifting the coding ceiling without touching the price. Beyond that, the routing logic, capability descriptions, and example queries were all tightened. Same intelligence, cleaner decisions.
Eco — Kimi K2.5 → K2.6, plus a brand-new Research and Synthesis branch
Pool: DeepSeek V3.2 · Gemini 3 Flash · Llama 4 Maverick · MiniMax M2.7 · Kimi K2.6 (new)
K2.5 retired in favor of K2.6 — same 1T MoE backbone, but with a hallucination rate cut nearly in half. A new Research and Synthesis routing branch sends web-search-driven queries to K2.6, where its low-hallucination tool use produces the most reliable synthesis in the Eco pool. Translation and multilingual work also shifted to K2.6.
Pro — A genuinely new capability
Pool: Claude Sonnet 4.6 · Gemini 3 Flash · Gemini 3.1 Pro · GPT-5.4 · Kimi K2.6 (new)
Sonnet 4.6 is still the trust anchor — the default for any query that doesn't clearly benefit from a specialist's strength. But queries like "research the competitive landscape and tell me what stands out" or "look up the latest on X" now go to K2.6 with the web search tool. With its 39% hallucination rate, K2.6 synthesizes search results more faithfully than Sonnet does. Translation and multilingual work also moved to K2.6.
Ultra — Two flagship anchors swapped, K2.6 added
Pool: Claude Opus 4.7 (new) · Claude Sonnet 4.6 · Claude Haiku 4.5 · Gemini 3.1 Pro · GPT-5.5 (new) · Kimi K2.6 (new)
Two flagship swaps and one specialist add: Opus 4.6 → Opus 4.7 (same price, newer model), GPT-5.4 → GPT-5.5 (peak intelligence with 23% better factual correctness), and K2.6 added as the dedicated research-and-synthesis specialist. The same Research and Synthesis branch that lives in Eco and Pro now lives in Ultra.
The pool grew from 5 to 6 to give Ultra users the same low-hallucination research experience without compromising the flagship anchor for everything else. Anthropic anchors everyday work. GPT-5.5 is the heavy weapon for hard math, science, and security. Gemini 3.1 Pro is the abstract-reasoning and large-context champion. K2.6 is the go-to for web-grounded research and multilingual translation. Haiku 4.5 handles trivial, speed-sensitive queries.
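Taken together, the four refreshed pools can be pictured as a simple tier-to-pool mapping. This is an illustrative sketch only, not ARMES source code, and the model identifiers are shorthand rather than real API model names:

```python
# Illustrative only: the four Auto Mode pools described above, written as a
# tier -> models mapping. Identifiers are shorthand, not real API model names.
AUTO_MODE_POOLS = {
    "free":  ["deepseek-v3.2", "llama-4-maverick", "minimax-m2.7",
              "gemini-3.1-flash-lite"],
    "eco":   ["deepseek-v3.2", "gemini-3-flash", "llama-4-maverick",
              "minimax-m2.7", "kimi-k2.6"],
    "pro":   ["claude-sonnet-4.6", "gemini-3-flash", "gemini-3.1-pro",
              "gpt-5.4", "kimi-k2.6"],
    "ultra": ["claude-opus-4.7", "claude-sonnet-4.6", "claude-haiku-4.5",
              "gemini-3.1-pro", "gpt-5.5", "kimi-k2.6"],
}
```

The shape of the update is visible at a glance: Ultra grew from 5 models to 6, and K2.6 now appears in every paid tier's pool.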
The New Research and Synthesis Branch
This is the headline upgrade. In Eco, Pro, and Ultra, when you ask Auto Mode to look something up and synthesize what it finds, the router now sends that work to Kimi K2.6 with the web search tool.
Why this matters: when most models look something up, they confabulate around the search results, blending what's actually on the page with what they think should be there. K2.6's 39% hallucination rate is the lowest of any model in any of our Auto Mode pools, and its tool use is among the strongest in the industry. Pair that with live web search, and you get fast, faithful, cited synthesis.
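Conceptually, the branch behaves like a guard in front of the default router. The sketch below is hypothetical, not ARMES internals: `needs_web_research` is a crude keyword stand-in for whatever classifier the real router uses, and the model names are shorthand.

```python
# Hypothetical sketch of the Research and Synthesis branch (not ARMES source).
RESEARCH_CUES = ("look up", "research", "latest on", "current state")

def needs_web_research(query: str) -> bool:
    """Crude stand-in: does the query ask for web-grounded fact-finding?"""
    q = query.lower()
    return any(cue in q for cue in RESEARCH_CUES)

def route(query: str, tier: str, default_model: str) -> tuple[str, bool]:
    """Return (model, use_web_search_tool) for a given tier."""
    if tier in {"eco", "pro", "ultra"} and needs_web_research(query):
        return ("kimi-k2.6", True)  # low-hallucination synthesis branch
    return (default_model, False)

print(route("Look up the latest on X and tell me what stands out",
            "pro", "claude-sonnet-4.6"))  # → ('kimi-k2.6', True)
```

Everything that doesn't trip the research branch falls through to each tier's normal routing, which is why the rest of the pools behave exactly as before.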
Try it. Ask: "Look up the latest on X and tell me what stands out." Or: "Research [a competitor] and summarize their positioning." Or: "What's the current state of [a fast-moving topic] — and what changed this week?"
In Eco, Pro, and Ultra, you now get a first-class workflow for that kind of research. Web-grounded. Low-hallucination. Fast.
What This Costs You
Nothing extra. Tier pricing is unchanged.
- Free — $0/month forever. Same models, sharper routing.
- Pro — $19/month launch price ($25/month regular). Auto Mode now includes K2.6 for research and translation. Same plan, smarter routing.
- Ultra — $200/month. New flagship anchors (Opus 4.7, GPT-5.5) at the same tier price. Plus the new Research and Synthesis branch.
Opus 4.7 launched at the same price as Opus 4.6. Adding K2.6 in Pro and Ultra Auto Mode actually pulls the average per-response cost down on research queries, since K2.6 is cheaper than Sonnet. We're routing your existing budget to better models — not raising the price.
Same Privacy. Every Lab. Every Tier.
Every model in this update — every new flagship, every new lab, every Auto Mode pool — runs through the same Zero Data Retention infrastructure as the rest of ARMES. AI providers process your message in memory and immediately forget it. Nothing is stored, profiled, or used for training.
No ads. No data retention. No training on your conversations.
That's the line, and it's the same line for Anthropic, OpenAI, Google, Moonshot, Alibaba, Xiaomi, DeepSeek, Z.ai, MiniMax, Meta, and every other lab in the catalog. Two new labs joined this update. The privacy guarantee didn't change.
How to Use Any of This
You don't have to do anything. Open ARMES, pick Auto Mode (or Eco Mode), and ask a question. The new routing kicks in automatically.
If you want to drive manually, every new model is available in the model selector on its tier — Qwen 3.6 Max Preview, Qwen 3.6 Flash, MiMo V2.5 Pro, MiMo V2.5, DeepSeek V4 Pro, DeepSeek V4 Flash, GLM-5.1, GPT-5.4 Mini, GPT-5.4 Nano, plus Opus 4.7, GPT-5.5, and Kimi K2.6 in Auto. Switch labs mid-conversation. Your context carries forward.
The Bigger Picture
ARMES is now 48+ models from 15 AI labs — the broadest private AI roster in any chat platform we know of. Frontier flagships and open-weights specialists. Western labs and Eastern labs. Coding specialists, reasoning specialists, research specialists, and multilingual specialists. All routed through Zero Data Retention.
The frontier isn't one model anymore. It's an ecosystem. Auto Mode just got smarter at pulling from that ecosystem on your behalf — and a single, brand-new routing branch gives every paid tier a faithful, web-grounded research experience that genuinely competes with anything else on the market.
Open ARMES. Ask something interesting.
Joseph
Founder, ARMES