Open Source · Apache 2.0 · Released May 2026

The high-efficiency Flash model for real-world agents

196B sparse MoE · 11B active parameters per token · 256K context · native image & video understanding.

196B Params
11B Active
256K Context
400 tok/s
Advisor Mode

97% of Claude Opus 4.6 coding performance. 1/9th the cost.

Step 3.7 Flash drives the agentic loop end-to-end — calling tools, reading results, iterating — and escalates to a frontier advisor model only at the few inflection points where it falls short: planning and recovery from repeated failures. On SWE-Bench Verified with Advisor Mode enabled: $0.19 per task vs $1.76 for Claude Opus 4.6.

SWE-Bench Verified task cost:
Step 3.7 Flash: $0.19 / task
Claude Opus 4.6: $1.76 / task (baseline)
About 9× lower cost
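
The escalation pattern behind Advisor Mode can be sketched as a loop in which a cheap worker model handles every step and an advisor is consulted only for the initial plan and after repeated failures. This is an illustrative sketch, not StepFun's implementation: the names `run_with_advisor`, `worker`, `advisor`, `max_failures`, and `max_steps`, as well as the shapes of their inputs and outputs, are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures: both models are plain callables so the sketch
# runs without any API access.
Worker = Callable[[str, str], Tuple[bool, str]]  # (task, guidance) -> (ok, output)
Advisor = Callable[[str, List[str]], str]        # (task, failure_log) -> guidance


def run_with_advisor(task: str, worker: Worker, advisor: Advisor,
                     max_failures: int = 2, max_steps: int = 10) -> Tuple[str, int]:
    """Run the worker in a loop, calling the advisor once for the initial
    plan and again after every `max_failures` consecutive failed attempts.

    Returns (final_output, number_of_advisor_calls).
    """
    failures: List[str] = []
    guidance = advisor(task, failures)   # inflection point 1: planning
    advisor_calls = 1
    consecutive = 0

    for _ in range(max_steps):
        ok, output = worker(task, guidance)
        if ok:
            return output, advisor_calls
        failures.append(output)
        consecutive += 1
        if consecutive >= max_failures:  # inflection point 2: recovery
            guidance = advisor(task, failures)
            advisor_calls += 1
            consecutive = 0

    raise RuntimeError(f"task not solved within {max_steps} steps")
```

Because the advisor is called only at these two points, its share of total tokens stays small, which is the mechanism the per-task cost comparison relies on.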

Built for production agents

Native multimodal

Understands images, video, charts, and documents natively. No separate vision model needed inside your agent framework.

Agentic coding

SWE-Bench Pro 56.3 · Terminal-Bench 2.1 59.6. Traces multi-file repos, isolates bugs from raw issue reports, and generates patches that pass automated unit tests.

Reliable tool use & orchestration

#1 open model on ClawEval-1.1 (67.1). Multi-step workflows stay coherent across arbitrarily long runs. Less drift, fewer broken tool calls.

Adaptive reasoning

Three effort levels — low / medium / high — trade reasoning depth for speed. Set via reasoning_effort (OpenAI-compatible API) or output_config.effort (Anthropic-compatible API).
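
As a rough illustration of where the effort setting goes in each API flavor, the sketch below builds the two request bodies as plain dictionaries. The helper names are hypothetical, and the nested `{"output_config": {"effort": ...}}` shape is an assumption inferred from the dotted parameter name; check the StepFun API reference before relying on it.

```python
from typing import Any, Dict

VALID_EFFORTS = ("low", "medium", "high")


def _check_effort(effort: str) -> str:
    """Reject anything other than the three documented effort levels."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    return effort


def openai_style_body(prompt: str, effort: str = "medium") -> Dict[str, Any]:
    """Request body for the OpenAI-compatible chat completions endpoint."""
    return {
        "model": "step-3.7-flash",
        "reasoning_effort": _check_effort(effort),
        "messages": [{"role": "user", "content": prompt}],
    }


def anthropic_style_body(prompt: str, effort: str = "medium",
                         max_tokens: int = 1024) -> Dict[str, Any]:
    """Request body for the Anthropic-compatible messages endpoint.

    Assumption: output_config.effort is sent as a nested JSON object.
    """
    return {
        "model": "step-3.7-flash",
        "max_tokens": max_tokens,
        "output_config": {"effort": _check_effort(effort)},
        "messages": [{"role": "user", "content": prompt}],
    }
```

Validating the value client-side keeps a typo such as "hgh" from reaching the server, where the resulting behavior is not documented on this page.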

Works with your stack

Step 3.7 Flash integrates with all major coding and agent platforms out of the box — no workflow rewiring required.

Claude Code
Cline
Roo Code
KiloCode
OpenCode
Hermes Agent
OpenClaw
Also available as an NVIDIA NIM microservice.

Two API flavors, one model

OpenAI-compatible — python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.stepfun.com/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="step-3.7-flash",
    reasoning_effort="medium",   # low | medium | high
    messages=[{"role": "user", "content": "Explain MoE in one paragraph."}],
)
print(response.choices[0].message.content)
Anthropic-compatible — python
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.stepfun.com",
    api_key="YOUR_API_KEY",
)

message = client.messages.create(
    model="step-3.7-flash",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain MoE in one paragraph."}],
)
print(message.content[0].text)

Already using the OpenAI or Anthropic SDK? Swap the base_url, api_key, and model; no other code changes are needed.

Benchmarks

Measured against leading open and closed models. Step 3.7 Flash leads all open-weight models on ClawEval-1.1 and SimpleVQA with Search.

Benchmark              Step 3.7 Flash   DeepSeek V4 Flash   GPT 5.5   Claude Opus 4.7
SWE-Bench Pro          56.3             55.6                58.6      64.3
Terminal-Bench 2.1     59.6             62.0                82.7      69.4
ClawEval-1.1           67.1 ★           57.8                60.3      70.8
HLE w/ Tool            47.2             45.1                52.2      54.7
SimpleVQA w/ Search    79.2 ★           79.1
Toolathlon             49.5             52.8                60.2      65.4
★ = #1 open-weight model on that benchmark

Source: StepFun official release, May 29 2026. DeepSeek V4 Flash figures from StepFun internal testing. Kimi K2.6 / GPT 5.5 / Claude Opus 4.7 from official reported results.

Deploy in minutes

vLLM — bash
pip install vllm

vllm serve "stepfun-ai/Step-3.7-Flash"

# Call via OpenAI-compatible endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "stepfun-ai/Step-3.7-Flash",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
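
The same call can be made from Python using only the standard library. This is a minimal sketch that assumes the vLLM server started above is listening on localhost:8000 (the address used in the curl example); `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8000/v1",
                       model: str = "stepfun-ai/Step-3.7-Flash") -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Calling `chat("Hello")` requires the server to be running; `build_chat_request` can be inspected on its own without any network access.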

Minimum requirements

Local: ≥128 GB unified memory (Mac Studio M3 Ultra, NVIDIA DGX Station, AMD Ryzen AI Max+ 395).

Cloud / datacenter: H100 / A100 via vLLM or NVIDIA NIM microservice.

Quantized (GGUF): browse variants at huggingface.co/stepfun-ai/Step-3.7-Flash-GGUF
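
To see how the 128 GB figure relates to the parameter count, the back-of-the-envelope calculation below estimates the memory needed for the weights alone at a few precisions. It is a rough sketch: it ignores the KV cache, activations, and runtime overhead, and the bit widths are illustrative assumptions rather than a list of the quantization formats actually published.

```python
def weight_memory_gb(total_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 196e9  # total parameter count quoted on this page

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")

# 16-bit weights: ~392 GB
#  8-bit weights: ~196 GB
#  4-bit weights: ~98 GB
```

Under these assumptions, only roughly 4-bit weights fit within 128 GB with room left for context, which suggests the local requirement presumes a quantized variant.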

Pricing

Input (cache miss): $0.20 per 1M tokens

Input (cache hit): $0.04 per 1M tokens

Output: $1.15 per 1M tokens

256K token context window · Apache 2.0 open weights
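
The listed rates translate into a per-request cost as follows. The function name and the token counts in the example are hypothetical; only the three prices come from this page.

```python
# USD per 1M tokens, as listed in the pricing section.
PRICE_PER_M = {"input_miss": 0.20, "input_hit": 0.04, "output": 1.15}


def estimate_cost_usd(input_miss_tokens: int,
                      input_hit_tokens: int,
                      output_tokens: int) -> float:
    """Estimate the API cost of one request (or one whole agent run) in USD."""
    return (
        input_miss_tokens * PRICE_PER_M["input_miss"]
        + input_hit_tokens * PRICE_PER_M["input_hit"]
        + output_tokens * PRICE_PER_M["output"]
    ) / 1_000_000


# Hypothetical agent run: 200K uncached input, 800K cached input, 50K output.
print(f"${estimate_cost_usd(200_000, 800_000, 50_000):.4f}")  # $0.1295
```

In agent loops that resend a growing conversation each turn, most input tokens are typically cache hits, so the cache-hit rate tends to dominate the input side of the bill.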

Frequently asked questions

What is Step 3.7 Flash?

Step 3.7 Flash is StepFun's open-weight 196B-parameter sparse Mixture-of-Experts vision-language model, released on May 29, 2026 under the Apache 2.0 license. It activates ~11B parameters per token, supports a 256K-token context window, and understands images and video natively.

How much does Step 3.7 Flash cost to use via API?

Input tokens cost $0.20 per million (cache miss) or $0.04 per million (cache hit). Output tokens cost $1.15 per million. The model weights are free to download and self-host under Apache 2.0.

How much memory do I need to run Step 3.7 Flash locally?

At least 128 GB of unified memory is required for local deployment — for example a Mac Studio M3 Ultra, NVIDIA DGX Station, or AMD Ryzen AI Max+ 395. For cloud or datacenter deployment use H100 / A100 nodes via vLLM, SGLang, or NVIDIA NIM. Quantized GGUF variants are available for lower-memory setups.

Is Step 3.7 Flash compatible with the OpenAI API?

Yes. Step 3.7 Flash exposes an OpenAI-compatible /v1/chat/completions endpoint. In existing OpenAI SDK code you only need to change base_url, api_key, and model.

Is Step 3.7 Flash compatible with the Anthropic API?

Yes. StepFun also exposes an Anthropic Messages-compatible endpoint, so you can reuse Anthropic SDK code by swapping base_url, api_key, and model.

What is Advisor Mode?

Advisor Mode is StepFun's implementation of the advisor strategy described by Anthropic. Step 3.7 Flash handles the full agentic loop and escalates to a larger frontier model only at critical inflection points (planning, recovery from failures). This delivers 97% of Claude Opus 4.6's coding performance on SWE-Bench Verified at roughly 1/9th the per-task cost ($0.19 vs $1.76).

Which agent frameworks does Step 3.7 Flash support?

It works out of the box with Claude Code, Cline, Roo Code, KiloCode, OpenCode, Hermes Agent, and OpenClaw. No workflow rewiring is required.

What license is Step 3.7 Flash released under?

Apache 2.0 — free for commercial and non-commercial use.