Drop in. Don't rewrite.
The endpoint speaks OpenAI. Your existing client libraries work as-is. Change one URL. Nothing else changes in your code.
Onde Cloud is an OpenAI-compatible LLM inference API from Onde Inference. Server-side. Integrated with your existing Onde account and model catalog. Change one baseURL — your existing OpenAI SDK connects to it.
Before, with the stock OpenAI client:

import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

After, pointed at Onde Cloud:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://cloud.ondeinference.com/v1",
  apiKey: "your-app-id:your-app-secret",
});

Everything else stays the same. Same SDK. Same response shape. Same streaming. Works with the official openai npm package and the openai Python library.
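Streaming follows the same OpenAI chunk format. As a sketch of consuming it, here is a small helper that concatenates text deltas; the chunk type below is a minimal subset of the SDK's own, and the helper name is ours, not part of any SDK:

```typescript
// Minimal shape of a chat-completion stream chunk: a subset of the
// OpenAI SDK's type, just enough to collect text deltas.
type StreamChunk = { choices: { delta: { content?: string } }[] };

// Concatenate the text deltas from a completion stream into one string.
async function collectStream(stream: AsyncIterable<StreamChunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}
```

With the client above, `await client.chat.completions.create({ model: "qwen2.5-coder-3b", messages, stream: true })` yields chunks of exactly this shape.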
OpenAI-compatible REST API
Onde Cloud accepts the same request format as the OpenAI chat completions API. If you ship with Onde on-device, you already know the model catalog and the account system. If you're evaluating Onde Cloud as a Baseten or RunPod alternative, the integration takes one afternoon. You don't learn a new platform. You add one endpoint.
Assign a model to your app from the Onde dashboard. Update it any time without touching code or redeploying. Model changes take effect immediately.
Your account is your API key. The same app you registered for on-device inference works here. No new signup. No new billing page.
Hybrid LLM inference architecture
Most requests never need a server. Onde on-device handles them at ~85ms, for free, with full privacy — data never leaves the device. Onde Cloud is for the cases where the on-device model isn't the right call: background jobs, heavy prompts, server-rendered features, or platforms where you can't ship a model bundle.
The same model family. The same Onde account. Swap the endpoint, not the mental model.
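The split described above can be sketched as a routing rule. The thresholds and field names here are illustrative assumptions for this sketch, not part of the Onde API:

```typescript
// Illustrative hybrid-inference routing rule. Field names and the
// token threshold are assumptions, not Onde API surface.
interface RequestContext {
  promptTokens: number;       // rough size of the prompt
  onDeviceAvailable: boolean; // a model bundle ships on this platform
  backgroundJob: boolean;     // server-side or batch work
}

function routeToCloud(ctx: RequestContext): boolean {
  if (!ctx.onDeviceAvailable) return true; // no local model: cloud only
  if (ctx.backgroundJob) return true;      // server-rendered or batch work
  if (ctx.promptTokens > 4096) return true; // heavy prompt: offload
  return false; // default: free, private, ~85ms on-device
}
```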
API authentication
Register your app in the Onde dashboard. You get an app ID and a secret. Pass them as your Bearer token. That's the full auth setup — no token refresh, no OAuth flow, no rotating secrets page.
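As a sketch, the credential pair becomes a standard Bearer header. The helper name and validation here are ours, not part of any SDK:

```typescript
// Build the Authorization header value from an Onde app ID and secret.
// Helper name and validation are illustrative, not SDK surface.
function ondeAuthHeader(appId: string, appSecret: string): string {
  if (!appId || !appSecret) {
    throw new Error("both app ID and secret are required");
  }
  return `Bearer ${appId}:${appSecret}`;
}
```

The OpenAI SDK builds this header for you when you pass `apiKey: "your-app-id:your-app-secret"`.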
# Verify your credentials and confirm the endpoint is live
curl https://cloud.ondeinference.com/v1/chat/completions \
  -H "Authorization: Bearer your-app-id:your-app-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder-3b",
    "messages": [{ "role": "user", "content": "Hello" }],
    "stream": false
  }'

Supported GGUF models
Onde Cloud serves GGUF-quantized models from the Onde model catalog. Every model runs in Q4_K_M format, balancing quality and memory. Assign a model to your Onde app from the dashboard — no redeployment, no environment variable change, no support ticket.
The same models available in the Onde Swift SDK, Dart SDK, and Onde CLI. If it runs locally, it runs here.
Get started
Sign in, open your app, assign a model, copy your credentials. Five minutes to a working LLM inference endpoint. Read The Forward Pass if you want the longer argument for why on-device and cloud should work together.