Kimi K2: Open Agentic Intelligence — Moonshot AI's Latest Breakthrough

Moonshot AI has released Kimi K2, a Mixture-of-Experts large language model with 1.04 trillion total parameters and 32.6 billion activated parameters. It targets agentic intelligence: the ability of AI systems to autonomously perceive, plan, reason, and act within complex, dynamic environments. The model is described in the paper Kimi K2: Open Agentic Intelligence (arXiv:2507.20534).

Key Highlights

The Shift to Agentic Intelligence

Traditional LLMs learn from static, human-generated data. Agentic intelligence marks a paradigm shift: models learn through interaction, acquire skills beyond their training distribution, and adapt through experience. This lets AI agents move past the limits of static data and potentially surpass human capabilities through exploration and exploitation.

Achieving this requires advances in both pre-training (broad general-purpose priors with high token efficiency) and post-training (scalable synthesis of agentic trajectories and reinforcement learning).

MuonClip: Stable Training at Scale

A core technical contribution is MuonClip, a novel optimizer that addresses a critical challenge when scaling the token-efficient Muon algorithm: training instability due to exploding attention logits.

The Problem

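As Muon scales, attention logits can rapidly exceed magnitudes of 1000, leading to loss spikes and occasional divergence. Existing mitigations fall short: logit soft-capping clips the logits only after the query-key dot products have already grown large, and QK-Norm does not apply to Multi-head Latent Attention (MLA), where the key matrices are not fully materialized during inference.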

The Solution: QK-Clip

QK-Clip rescales query and key projection weights whenever the maximum attention logit exceeds a threshold τ.

With τ=100, Kimi K2 trained on 15.5T tokens with no observable loss spikes, validating MuonClip’s effectiveness at scale.
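The rescaling rule described above can be illustrated with a minimal sketch. It assumes a single head of standard scaled dot-product attention rather than MLA, and it splits the shrink factor evenly between the query and key weights; the helper names (`matmul`, `max_logit`, `qk_clip`) are hypothetical and are not taken from the paper or any library.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def max_logit(x, w_q, w_k):
    """Largest scaled dot-product logit for one head over a batch of tokens."""
    q, k = matmul(x, w_q), matmul(x, w_k)
    d_head = len(w_q[0])
    best = max(sum(a * b for a, b in zip(qi, kj)) for qi in q for kj in k)
    return best / math.sqrt(d_head)

def qk_clip(x, w_q, w_k, tau=100.0):
    """Return (w_q, w_k), rescaled only if the max logit exceeds tau."""
    s_max = max_logit(x, w_q, w_k)
    if s_max <= tau:
        return w_q, w_k  # below threshold: weights are left untouched
    gamma = tau / s_max
    # Assumption: split the shrink factor evenly, sqrt(gamma) on each side,
    # so the product q.k shrinks by exactly gamma.
    scale = math.sqrt(gamma)
    return ([[v * scale for v in row] for row in w_q],
            [[v * scale for v in row] for row in w_k])
```

Because the logit is bilinear in the two weight matrices, scaling each by sqrt(γ) scales every logit by γ = τ / S_max, which brings the largest logit back to τ without touching heads that are already below the threshold.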

Pre-Training: Token Efficiency and Data Rephrasing

Kimi K2 was pre-trained on 15.5 trillion tokens across Web Text, Code, Mathematics, and Knowledge. Key innovations:

Synthetic Rephrasing

To improve token utility without overfitting, Moonshot introduced domain-specific rephrasing of the training data.

Experiments showed rephrased data consistently outperformed multi-epoch repetition on SimpleQA.

Model Architecture

Parameter              Kimi K2    DeepSeek-V3
Total Parameters       1.04T      671B
Activated Parameters   32.6B      37B
Experts                384        256
Attention Heads        64         128

Kimi K2 increases sparsity (384 experts, 8 active per token) for better performance while reducing attention heads to improve inference efficiency at long context lengths (e.g., 128K).
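To make the sparsity figures concrete, here is a small sketch of top-k expert selection using the numbers quoted above (384 experts, 8 active per token, 32.6B of 1.04T parameters activated). The random scores stand in for a learned router, and the function names are hypothetical; this is not the model's actual routing code.

```python
import random

NUM_EXPERTS = 384        # from the architecture comparison above
ACTIVE_PER_TOKEN = 8     # experts selected for each token

def route(scores, k=ACTIVE_PER_TOKEN):
    """Return the indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

def activated_fraction(active_params, total_params):
    """Share of the model's parameters that participate in one forward pass."""
    return active_params / total_params

random.seed(0)
# Placeholder router scores; a real router would compute these from the token.
scores = [random.random() for _ in range(NUM_EXPERTS)]
chosen = route(scores)

sparsity = NUM_EXPERTS / ACTIVE_PER_TOKEN           # experts per active expert
fraction = activated_fraction(32.6e9, 1.04e12)      # roughly 3% of parameters
```

With these figures, each token touches one expert in every 48, and only about 3% of the total parameters are active per token.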

Post-Training: Agentic Data Synthesis and RL

Large-Scale Agentic Data Synthesis

A critical capability is autonomous tool use—using unfamiliar tools, interacting with environments, and iteratively refining actions. Moonshot built a comprehensive pipeline:

  1. Tool spec generation: 3,000+ real MCP tools and 20,000+ synthetic tools via hierarchical domain evolution
  2. Agent and task generation: Diverse agents with different tool combinations and rubric-based tasks
  3. Trajectory generation: Multi-turn dialogues with user simulation and tool execution environments
  4. Quality filtering: LLM-based judges retain only trajectories meeting success criteria
  5. Hybrid approach: Real execution sandboxes for coding tasks to ground learning in authentic feedback
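The quality-filtering step (step 4) can be sketched as follows. Everything here is illustrative: the `Trajectory` structure, the `threshold` parameter, and the `keyword_judge` stand-in are assumptions, and a real pipeline would call an LLM judge where the stand-in does simple string matching.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    task: str
    turns: list    # dialogue turns and tool outputs, as plain strings
    rubric: list   # success criteria for the task, as plain strings

def filter_trajectories(trajectories, judge, threshold=1.0):
    """Keep trajectories whose share of satisfied rubric items meets threshold."""
    kept = []
    for t in trajectories:
        if not t.rubric:
            continue  # nothing to judge against, so the trajectory is dropped
        passed = sum(1 for criterion in t.rubric if judge(t, criterion))
        if passed / len(t.rubric) >= threshold:
            kept.append(t)
    return kept

def keyword_judge(trajectory, criterion):
    """Stand-in for an LLM judge: passes if the criterion text appears in a turn."""
    return any(criterion in turn for turn in trajectory.turns)
```

Passing the judge in as a function keeps the filter independent of how success is assessed, so the same loop works with a rubric-based LLM judge or with real execution results from a sandbox.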

Reinforcement Learning

K2 extends RL with a general framework that combines verifiable rewards with self-critique.
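The closing summary of this article characterizes the RL framework as verifiable rewards plus self-critique. As a rough illustration of the first idea only, a verifiable reward is one that a program can check against a known reference. The normalization rule and function name below are assumptions made for the sketch, not the paper's reward implementation.

```python
def verifiable_reward(model_answer: str, reference: str) -> float:
    """Return 1.0 when the normalized answers match exactly, else 0.0."""
    def normalize(text: str) -> str:
        # Assumption: ignore case and all whitespace when comparing answers.
        return "".join(text.lower().split())
    return 1.0 if normalize(model_answer) == normalize(reference) else 0.0
```

For example, `verifiable_reward(" 3/4 ", "3/4")` scores a match, while `verifiable_reward("0.7", "3/4")` does not, since the check is purely textual and does not evaluate the expressions.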

Benchmark Results

Kimi K2 achieves state-of-the-art results among open-source non-thinking models:

Benchmark            Kimi K2   Best Open Baseline
Tau2-Bench           66.1
ACEBench (En)        76.5
SWE-Bench Verified   65.8
LiveCodeBench v6     53.7      46.9 (DeepSeek-V3)
OJBench              27.1      24.0 (DeepSeek-V3)
AIME 2025            49.5
GPQA-Diamond         75.1

On the LMSYS Arena leaderboard (July 17, 2025), Kimi K2 ranked #1 among open-source models and #5 overall, based on over 3,000 user votes.

Implications and Open Source

Kimi K2 demonstrates that agentic intelligence can be achieved through:

  1. Stable, token-efficient pre-training (MuonClip, rephrasing)
  2. Scalable agentic data synthesis (simulation + real execution)
  3. General RL frameworks (verifiable rewards + self-critique)

By open-sourcing base and post-trained checkpoints, Moonshot enables the community to explore, refine, and deploy agentic intelligence at scale. The paper is available on arXiv as arXiv:2507.20534.


This article summarizes the technical report “Kimi K2: Open Agentic Intelligence” by the Kimi Team at Moonshot AI. All benchmark data and technical details are from the original paper.
