The high-efficiency Flash model for real-world agents
196B sparse MoE · 11B active parameters per token · 256K context · native image & video understanding.
97% of Claude Opus 4.6 coding performance. 1/9th the cost.
Step 3.7 Flash drives the agentic loop end-to-end — calling tools, reading results, iterating — and escalates to a frontier advisor model only at the few inflection points where it falls short: planning and recovery from repeated failures. On SWE-Bench Verified with Advisor Mode enabled: $0.19 per task vs $1.76 for Claude Opus 4.6.
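The "1/9th the cost" headline follows from the two per-task figures quoted above. A quick arithmetic check, using only those two numbers (the variable and function names are illustrative, not part of any StepFun tooling):

```python
# Per-task cost on SWE-Bench Verified, as quoted above (USD).
FLASH_WITH_ADVISOR_COST = 0.19
OPUS_COST = 1.76


def cost_ratio(baseline: float, candidate: float) -> float:
    """How many times more expensive the baseline is than the candidate."""
    return baseline / candidate


ratio = cost_ratio(OPUS_COST, FLASH_WITH_ADVISOR_COST)
savings = 1 - FLASH_WITH_ADVISOR_COST / OPUS_COST
print(f"{ratio:.1f}x cheaper per task, {savings:.0%} saved")
```

The ratio works out to roughly 9.3x, which the headline rounds to 1/9th.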
Built for production agents
Native multimodal
Understands images, video, charts, and documents natively. No separate vision model needed inside your agent framework.
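As a rough sketch of what sending an image could look like, the helper below builds a single user message that pairs a question with an inline image. It assumes the OpenAI-compatible endpoint accepts the standard OpenAI `image_url` content-part format with base64 data URLs, which this page does not confirm; `build_image_message` is a hypothetical helper name.

```python
import base64


def build_image_message(image_bytes: bytes, question: str,
                        mime_type: str = "image/png") -> dict:
    """Build one user message that pairs a text question with an inline image.

    Assumption: the endpoint accepts OpenAI-style `image_url` content parts
    with base64 data URLs. Check the StepFun API docs before relying on this.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }


# Placeholder bytes stand in for a real file read with open(path, "rb").
message = build_image_message(b"\x89PNG placeholder", "What does this chart show?")
print(message["content"][1]["image_url"]["url"][:22])
```

If the assumption holds, the returned dict can be passed as an element of `messages` in a chat completions call.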
Agentic coding
SWE-Bench Pro 56.3 · Terminal-Bench 2.1 59.6. Traces multi-file repos, isolates bugs from raw issue reports, and generates patches that pass automated unit tests.
Reliable tool use & orchestration
#1 open model on ClawEval-1.1 (67.1). Multi-step workflows stay coherent across arbitrarily long runs. Less drift, fewer broken tool calls.
Adaptive reasoning
Three effort levels — low / medium / high — trade reasoning depth for speed. Set via reasoning_effort (OpenAI-compatible API) or output_config.effort (Anthropic-compatible API).
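The two field names above map onto request parameters roughly as follows. This is a minimal sketch: the field names come from the text, while `effort_kwargs` and the flavor labels `"openai"` and `"anthropic"` are hypothetical names chosen for illustration.

```python
VALID_EFFORTS = ("low", "medium", "high")


def effort_kwargs(api_flavor: str, effort: str) -> dict:
    """Return the request field that carries the effort level for an API flavor."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    if api_flavor == "openai":
        return {"reasoning_effort": effort}
    if api_flavor == "anthropic":
        return {"output_config": {"effort": effort}}
    raise ValueError(f"unknown api_flavor: {api_flavor!r}")


print(effort_kwargs("openai", "low"))
print(effort_kwargs("anthropic", "high"))
```

The resulting dict can be unpacked into the SDK call. If an installed SDK version does not accept the field as a keyword argument, both the OpenAI and Anthropic Python SDKs let you send additional JSON fields through the `extra_body` request option.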
Works with your stack
Step 3.7 Flash integrates with all major coding and agent platforms out of the box — no workflow rewiring required.
Two API flavors, one model
# OpenAI-compatible API
from openai import OpenAI

client = OpenAI(
    base_url="https://api.stepfun.com/v1",
    api_key="YOUR_API_KEY",
)
response = client.chat.completions.create(
    model="step-3.7-flash",
    reasoning_effort="medium",  # low | medium | high
    messages=[{"role": "user", "content": "Explain MoE in one paragraph."}],
)
print(response.choices[0].message.content)

# Anthropic-compatible API
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.stepfun.com",
    api_key="YOUR_API_KEY",
)
message = client.messages.create(
    model="step-3.7-flash",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain MoE in one paragraph."}],
)
print(message.content[0].text)

Already using the OpenAI or Anthropic SDK? Just swap the base_url and model. No other code changes are needed.
Benchmarks
Measured against leading open and closed models. Step 3.7 Flash leads all open-weight models on ClawEval-1.1 and SimpleVQA with Search (marked ★).
| Benchmark | Step 3.7 Flash | DeepSeek V4 Flash | GPT 5.5 | Claude Opus 4.7 |
|---|---|---|---|---|
| SWE-Bench Pro | 56.3 | 55.6 | 58.6 | 64.3 |
| Terminal-Bench 2.1 | 59.6 | 62.0 | 82.7 | 69.4 |
| ClawEval-1.1 | 67.1 ★ | 57.8 | 60.3 | 70.8 |
| HLE w/ Tool | 47.2 | 45.1 | 52.2 | 54.7 |
| SimpleVQA w/ Search | 79.2 ★ | — | 79.1 | — |
| Toolathlon | 49.5 | 52.8 | 60.2 | 65.4 |
Source: StepFun official release, May 29, 2026. DeepSeek V4 Flash figures are from StepFun internal testing; Kimi K2.6, GPT 5.5, and Claude Opus 4.7 figures are from officially reported results.
Deploy in minutes
# vLLM
pip install vllm
vllm serve "stepfun-ai/Step-3.7-Flash"

# Call via OpenAI-compatible endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "stepfun-ai/Step-3.7-Flash",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

# SGLang
pip install sglang
python3 -m sglang.launch_server \
  --model-path "stepfun-ai/Step-3.7-Flash" \
  --host 0.0.0.0 \
  --port 30000

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "stepfun-ai/Step-3.7-Flash",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

# llama.cpp (requires ≥128 GB unified memory)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release
./build/bin/llama-server \
  -m Step-3.7-flash-Q4_K_M.gguf \
  --host 0.0.0.0 --port 8080

# Ollama (one-liner)
ollama run hf.co/stepfun-ai/Step-3.7-Flash-GGUF

# Docker Model Runner (one-liner)
docker model run hf.co/stepfun-ai/Step-3.7-Flash

# SGLang via Docker (datacenter)
docker run --gpus all --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<your-token>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "stepfun-ai/Step-3.7-Flash" \
    --host 0.0.0.0 --port 30000

Minimum requirements
Local: ≥128 GB unified memory (Mac Studio M3 Ultra, NVIDIA DGX Station, AMD Ryzen AI Max+ 395).
Cloud / datacenter: H100 / A100 via vLLM or NVIDIA NIM microservice.
Quantized (GGUF): browse variants at huggingface.co/stepfun-ai/Step-3.7-Flash-GGUF
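A back-of-the-envelope estimate helps explain the 128 GB figure for local use. In a sparse MoE model, all 196B parameters typically need to be resident in memory even though only about 11B are active per token. The sketch below estimates weight memory alone; the bits-per-weight values are rough assumptions rather than published numbers, and KV cache and runtime overhead come on top.

```python
def weight_memory_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    bytes_total = total_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


TOTAL_PARAMS_B = 196  # total parameters, from the model description

# Bits-per-weight values are rough assumptions, not published figures.
for label, bits in [("16-bit", 16.0), ("8-bit", 8.0), ("~4.8-bit (4-bit class quant)", 4.8)]:
    print(f"{label}: ~{weight_memory_gb(TOTAL_PARAMS_B, bits):.0f} GB")
```

Under these assumptions a 4-bit class quantization lands a little under 120 GB of weights, which leaves limited headroom on a 128 GB machine, while 16-bit weights need multi-GPU datacenter hardware.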
Pricing
Input (cache miss): $0.20 per 1M tokens
Input (cache hit): $0.04 per 1M tokens
Output: $1.15 per 1M tokens
256K token context window · Apache 2.0 open weights
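To see what these rates mean for a single request, here is a small estimator. The three prices are taken from the list above; the billing rule (cached input tokens at the cache-hit rate, the rest at the cache-miss rate) is an assumption, and `request_cost_usd` is a hypothetical helper name.

```python
# USD per 1M tokens, from the pricing list above.
PRICE_INPUT_MISS = 0.20
PRICE_INPUT_HIT = 0.04
PRICE_OUTPUT = 1.15


def request_cost_usd(input_tokens: int, output_tokens: int,
                     cached_input_tokens: int = 0) -> float:
    """Estimate the API cost of one request.

    `cached_input_tokens` is the part of `input_tokens` served from cache.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_input_tokens
    total = (
        uncached * PRICE_INPUT_MISS
        + cached_input_tokens * PRICE_INPUT_HIT
        + output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000


# Example: 100K-token prompt, 80K of it cached, 4K tokens generated.
print(f"${request_cost_usd(100_000, 4_000, cached_input_tokens=80_000):.4f}")
```

In long agent runs the prompt prefix tends to repeat across calls, so the cache-hit share usually dominates the input cost.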
Frequently asked questions
What is Step 3.7 Flash?
Step 3.7 Flash is StepFun's open-weight 196B-parameter sparse Mixture-of-Experts vision-language model, released on May 29, 2026 under the Apache 2.0 license. It activates ~11B parameters per token, supports a 256K-token context window, and understands images and video natively.
How much does Step 3.7 Flash cost to use via API?
Input tokens cost $0.20 per million (cache miss) or $0.04 per million (cache hit). Output tokens cost $1.15 per million. The model weights are free to download and self-host under Apache 2.0.
How much memory do I need to run Step 3.7 Flash locally?
At least 128 GB of unified memory is required for local deployment — for example a Mac Studio M3 Ultra, NVIDIA DGX Station, or AMD Ryzen AI Max+ 395. For cloud or datacenter deployment use H100 / A100 nodes via vLLM, SGLang, or NVIDIA NIM. Quantized GGUF variants are available for lower-memory setups.
Is Step 3.7 Flash compatible with the OpenAI API?
Yes. Step 3.7 Flash exposes an OpenAI-compatible /v1/chat/completions endpoint. You only need to change base_url and model in your existing OpenAI SDK code.
Is Step 3.7 Flash compatible with the Anthropic API?
Yes. StepFun also exposes an Anthropic Messages-compatible endpoint, so you can reuse Anthropic SDK code by swapping base_url and model.
What is Advisor Mode?
Advisor Mode is StepFun's implementation of the advisor strategy described by Anthropic. Step 3.7 Flash handles the full agentic loop and escalates to a larger frontier model only at critical inflection points (planning, recovery from failures). This delivers 97% of Claude Opus 4.6's coding performance on SWE-Bench Verified at roughly 1/9th the per-task cost ($0.19 vs $1.76).
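The control flow described above can be sketched in a few lines. This is an illustration of the general pattern only, not StepFun's implementation: `worker_step` and `ask_advisor` are hypothetical stand-ins for real model calls, and the failure threshold and step limit are assumed values.

```python
from typing import Callable, List


def run_with_advisor(
    task: str,
    worker_step: Callable[[str, str], str],
    ask_advisor: Callable[[str, List[str]], str],
    max_steps: int = 20,
    failure_threshold: int = 3,
) -> dict:
    """Run a worker loop that consults an advisor for planning and recovery.

    worker_step(task, guidance) returns "done", "progress", or "failed".
    ask_advisor(task, history) returns guidance text from the larger model.
    """
    history: List[str] = []
    guidance = ask_advisor(task, history)  # escalation 1: up-front planning
    advisor_calls = 1
    consecutive_failures = 0

    for step in range(1, max_steps + 1):
        status = worker_step(task, guidance)
        if status == "done":
            return {"solved": True, "steps": step, "advisor_calls": advisor_calls}
        if status == "failed":
            history.append(f"step {step} failed")
            consecutive_failures += 1
        else:
            consecutive_failures = 0
        if consecutive_failures >= failure_threshold:
            # escalation 2: recovery after repeated failures
            guidance = ask_advisor(task, history)
            advisor_calls += 1
            consecutive_failures = 0

    return {"solved": False, "steps": max_steps, "advisor_calls": advisor_calls}


# Toy stand-ins: the worker only succeeds once it receives the second plan.
calls = {"advisor": 0}


def fake_advisor(task: str, history: List[str]) -> str:
    calls["advisor"] += 1
    return f"plan v{calls['advisor']}"


def fake_worker(task: str, guidance: str) -> str:
    return "done" if guidance == "plan v2" else "failed"


result = run_with_advisor("fix failing test", fake_worker, fake_advisor)
print(result)
```

Because the advisor is called only at planning time and after a run of failures, most steps are billed at the smaller model's rates, which is where the per-task saving comes from.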
Which agent frameworks does Step 3.7 Flash support?
It works out of the box with Claude Code, Cline, Roo Code, KiloCode, OpenCode, Hermes Agent, and OpenClaw. No workflow rewiring is required.
What license is Step 3.7 Flash released under?
Apache 2.0 — free for commercial and non-commercial use.