Morph Models

Fast general models for agent loops

Run the primary agent loop on fast, OpenAI-compatible coding models served on Morph's custom kernels. One API for chat, code generation, and reasoning.

Frontier coding models, served on custom kernels

Output speed

Codegen-specific optimizations and custom GPU kernels. Up to 200 tok/s on Qwen 3.5 397B.

Output speed by model (tok/s):

Qwen 3.5 397B    200
DSV4 Flash       150
Qwen 3.6 27B     100
MiniMax M2.7      90
DSV4 Pro         150 (contact us)
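To put the throughput figure in concrete terms, here is a rough back-of-the-envelope sketch. It assumes decode time is simply output tokens divided by tokens per second, which ignores time to first token, queueing, and network latency, and it treats the "up to" figure as if it were sustained. The function name `estimateDecodeSeconds` is hypothetical and not part of any Morph SDK.

```typescript
// Hypothetical helper (not a Morph API): estimates pure decode time.
// Assumes a constant output rate; real responses also pay time-to-first-token.
function estimateDecodeSeconds(outputTokens: number, tokensPerSecond: number): number {
  if (tokensPerSecond <= 0) {
    throw new RangeError("tokensPerSecond must be positive");
  }
  return outputTokens / tokensPerSecond;
}

// A 2,000-token patch at the quoted peak of 200 tok/s vs. at half that rate.
const atPeak = estimateDecodeSeconds(2000, 200);
const atHalf = estimateDecodeSeconds(2000, 100);
console.log(`at 200 tok/s: ${atPeak}s, at 100 tok/s: ${atHalf}s`);
```

In an agent loop that emits many such patches per task, this difference compounds with every step.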

One OpenAI-compatible API

Point your existing client at api.morphllm.com. Switch models by changing one string.

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.morphllm.com/v1",
  apiKey: process.env.MORPH_API_KEY,
});

const res = await client.chat.completions.create({
  model: "morph-qwen35-397b",
  messages: [{ role: "user", content: "Refactor this function..." }],
});
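To make "changing one string" concrete, the sketch below builds the raw HTTP request for two different models and shows that only the `model` field differs. It assumes the endpoint mirrors OpenAI's `POST /chat/completions` shape with a Bearer token, which is what "OpenAI-compatible" suggests but is not spelled out on this page. The helper `buildChatRequest` and the placeholder key are hypothetical.

```typescript
// Base URL taken from the page; the request shape is an assumption
// based on the OpenAI chat completions convention.
const MORPH_BASE_URL = "https://api.morphllm.com/v1";

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: assembles the request without sending it.
function buildChatRequest(model: string, messages: ChatMessage[], apiKey: string): ChatRequest {
  return {
    url: `${MORPH_BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

const messages: ChatMessage[] = [{ role: "user", content: "Refactor this function..." }];

// Same URL, same headers, same messages; only the model string changes.
const large = buildChatRequest("morph-qwen35-397b", messages, "placeholder-key");
const small = buildChatRequest("morph-qwen36-27b", messages, "placeholder-key");
console.log(large.url === small.url);
```

Passing `large.url` and `large.init` to `fetch` would send the request. Keeping the builder separate from the network call makes it easy to route different steps of an agent loop to different models.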

The lineup

Open-weight frontier models with long context, served and billed per token. No per-seat fees.

// Available general models
morph-qwen35-397b      // 397B MoE, 262k context
morph-minimax27-230b   // 230B MoE, agentic workflows
morph-dsv4flash        // 393k context, fast
morph-qwen36-27b       // dense, low latency
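One way to use the lineup in code is a small typed catalog that filters models by prompt size. The sketch below is illustrative only: the context figures are the rounded "262k" and "393k" values from the list above (exact token limits are not stated here), models without a listed context window are left undefined rather than guessed, and `modelsFittingContext` is a hypothetical helper.

```typescript
interface ModelInfo {
  id: string;
  notes: string;
  // Approximate context window in tokens; undefined where the page gives none.
  contextTokens?: number;
}

// IDs and notes copied from the lineup; context sizes are rounded page figures.
const MODELS: ModelInfo[] = [
  { id: "morph-qwen35-397b", notes: "397B MoE", contextTokens: 262000 },
  { id: "morph-minimax27-230b", notes: "230B MoE, agentic workflows" },
  { id: "morph-dsv4flash", notes: "fast", contextTokens: 393000 },
  { id: "morph-qwen36-27b", notes: "dense, low latency" },
];

// Hypothetical helper: returns IDs of models whose listed context window
// can hold the prompt. Models with no listed window are excluded.
function modelsFittingContext(promptTokens: number): string[] {
  return MODELS
    .filter((m) => m.contextTokens !== undefined && m.contextTokens >= promptTokens)
    .map((m) => m.id);
}

console.log(modelsFittingContext(300000));
```

Excluding models with unknown limits is a conservative choice. Check the current documentation for exact limits before relying on this in production.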

Built for production agent workloads

Inference optimized for coding agents

Every agent will write code. We bet the stack on it.

So we tune every layer for that one workload: custom GPU kernels, speculative decoding shaped around code, and serving built for the agent loop instead of general chat. Not general infrastructure with code bolted on.

Get $10 in free credits when you sign up today.

Get API Key

No credit card required. Pay only for what you use after that.