Documentation

Everything in one place.

Quick-start guides, API reference, model table, and FAQ for every Onde Inference entry point.

01 / Quick start

Pick your language.

Every SDK ships the same OndeChatEngine surface: load a model, send a message, read the result. The runtime is native on every platform, so no server is required.

02 / SDKs

Four first-class entry points.

One engine. All four SDKs share the same GGUF runtime, the same model cache, and the same API surface. Pick the one that matches your stack.

03 / Cloud API

OpenAI-compatible. No migration cost.

Onde Cloud runs at cloud.ondeinference.com. Auth is a single Bearer token. Any client that already uses the OpenAI API works without modification.

01

Sign in and create an app

Create an account, register an app workspace, and copy your app_id and app_secret from the console.
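The console gives you two values, but the Authorization header expects them joined into a single Bearer token in the form <app_id>:<app_secret>. A minimal Python sketch, assuming you keep the credentials in environment variables; the names ONDE_APP_ID and ONDE_APP_SECRET are hypothetical and not defined by Onde:

```python
import os


def onde_auth_header(app_id: str, app_secret: str) -> dict[str, str]:
    """Join app_id and app_secret into the Bearer token format shown on this page."""
    if not app_id or not app_secret:
        raise ValueError("both app_id and app_secret are required")
    return {"Authorization": f"Bearer {app_id}:{app_secret}"}


def auth_header_from_env() -> dict[str, str]:
    # ONDE_APP_ID / ONDE_APP_SECRET are hypothetical variable names for this sketch.
    return onde_auth_header(
        os.environ.get("ONDE_APP_ID", ""),
        os.environ.get("ONDE_APP_SECRET", ""),
    )
```

Keeping the secret out of source files means the same code works locally and in CI without edits.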

02

Make your first request

Send a chat completion request, replacing <app_id> and <app_secret> with the values from your console.

curl https://cloud.ondeinference.com/v1/chat/completions \
  -H "Authorization: Bearer <app_id>:<app_secret>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder-3b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

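If you would rather not install any packages, the same request can be built with Python's standard library. This sketch mirrors the curl call above and only constructs the urllib Request; the commented-out urlopen call is the part that needs real credentials and network access. BASE_URL and build_chat_request are names chosen for this illustration.

```python
import json
import urllib.request

BASE_URL = "https://cloud.ondeinference.com/v1"


def build_chat_request(token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST to /v1/chat/completions equivalent to the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# token is "<app_id>:<app_secret>" copied from the console.
req = build_chat_request("<app_id>:<app_secret>", "qwen2.5-coder-3b", "Hello!")

# Sending requires valid credentials and network access:
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```
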
03

Use the OpenAI SDK

Point the base URL at Onde Cloud. Everything else stays the same.

from openai import OpenAI

client = OpenAI(
    base_url="https://cloud.ondeinference.com/v1",
    api_key="<app_id>:<app_secret>",
)

response = client.chat.completions.create(
    model="qwen2.5-coder-3b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Base URL       https://cloud.ondeinference.com/v1
Auth header    Authorization: Bearer <app_id>:<app_secret>
Health check   GET /health → 200 OK
Models list    GET /v1/models → 200 OK (authenticated)
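This page does not show the body returned by GET /v1/models. Assuming it follows the OpenAI list shape ({"object": "list", "data": [{"id": ...}, ...]}), which the compatibility claim suggests but does not confirm, a small helper can pull out the model IDs. The sample body below is illustrative, not a captured Onde Cloud response.

```python
import json


def model_ids(response_body: str) -> list[str]:
    """Return model IDs from an OpenAI-style /v1/models response body (assumed shape)."""
    payload = json.loads(response_body)
    # OpenAI-style list responses wrap their entries in a "data" array.
    return [entry["id"] for entry in payload.get("data", [])]


# Illustrative body only; the real response may include additional fields.
sample = json.dumps({
    "object": "list",
    "data": [
        {"id": "qwen2.5-coder-3b", "object": "model"},
        {"id": "qwen3-4b", "object": "model"},
    ],
})
print(model_ids(sample))
```
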

Full cloud reference and pricing → ondeinference.com/cloud

04 / CLI

Install once. Use everywhere.

The onde binary ships on every major package manager. It handles account management, model downloads, local fine-tuning, and GGUF export.

npm
npm install -g @ondeinference/cli
Homebrew
brew tap ondeinference/homebrew-tap && brew install onde
PyPI
pip install onde-cli
cargo
cargo install onde-cli

All install options, workflow steps, and fine-tuning guide → ondeinference.com/cli

05 / Models

Supported GGUF models.

All models are Q4_K_M-quantized GGUF files sourced from bartowski on HuggingFace. platform_default() picks one automatically based on your target OS.

Model ID             Size      Target
qwen2.5-coder-1.5b   941 MB    Mobile · iOS · Android
qwen2.5-coder-3b     1.93 GB   Desktop · macOS · Linux
qwen2.5-coder-7b     4.4 GB    High-memory devices
qwen3-1.7b           1.3 GB    All platforms
qwen3-4b             2.7 GB    All platforms
qwen3-8b             5 GB      High-memory devices
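The FAQ on this page states which model platform_default() loads on each OS. The sketch below restates that documented mapping in Python for illustration only: it is not the SDK's implementation, and the platform strings and the platform_default_id name are hypothetical.

```python
# Hypothetical platform labels grouped by the default documented in the FAQ.
MOBILE = {"ios", "tvos", "android"}
DESKTOP = {"macos", "linux", "windows"}


def platform_default_id(platform: str) -> str:
    """Return the model ID the docs say platform_default() picks for a platform."""
    key = platform.lower()
    if key in MOBILE:
        return "qwen2.5-coder-1.5b"  # 941 MB, sized for phones and TV boxes
    if key in DESKTOP:
        return "qwen2.5-coder-3b"  # 1.93 GB, sized for desktops
    raise ValueError(f"no documented default for platform {platform!r}")
```

Any other model in the table has to be requested explicitly through a GgufModelConfig constructor rather than through the default.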

Download, cache management, and repair commands → Onde CLI

06 / FAQ

Common questions.

Which model loads by default?
platform_default() picks Qwen 2.5 Coder 1.5B on iOS, tvOS, and Android, and Qwen 2.5 Coder 3B on macOS, Linux, and Windows. You can override it with any GgufModelConfig constructor.
Where are models cached?
On macOS, the HuggingFace hub cache lives under ~/.cache/huggingface. On iOS and tvOS, the app sandbox requires you to call setupInferenceEnvironment() at launch, before any OndeChatEngine call, so that HF_HOME points to a location inside the app container.
Does Onde Cloud use a different model than on-device?
No. The same GGUF models run on both. Your dashboard assignment determines which model the cloud endpoint serves. The model field in the request is accepted for compatibility, but the dashboard assignment takes precedence.
Can I stream tokens from the cloud API?
Yes. Add "stream": true to your request body. The endpoint returns Server-Sent Events using the same format as the OpenAI streaming API.
Does data leave the device with the on-device SDK?
No. When you use the on-device SDK, inference runs entirely in-process. No prompt, token, or result is transmitted over the network. The only network activity is the initial model download from HuggingFace Hub.
What HuggingFace token do I need?
All current default models (Qwen 2.5 and Qwen 3 family) are public and do not require a token. A token is only needed if you want to download gated models or upload custom fine-tunes.
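The streaming answer above says the cloud API emits Server-Sent Events in the OpenAI streaming format. In that format each event is a data: line carrying a JSON chunk whose new text sits in choices[].delta.content, and the stream ends with data: [DONE]. A minimal parser sketch, assuming Onde follows that format exactly and that each request asks for a single choice; the sample lines are illustrative:

```python
import json
from typing import Iterable, Iterator


def stream_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        # Skip blank event separators and any non-data fields.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        # Assumes one choice per request; with several, deltas would interleave.
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo!"}}]}',
    "",
    "data: [DONE]",
]
print("".join(stream_deltas(sample)))
```

If you already use the OpenAI SDK, passing stream=True to client.chat.completions.create and iterating the returned chunks does this parsing for you.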