Documentation
Quick-start guides, API reference, model table, and FAQ for every Onde Inference entry point.
01 / Quick start
Every SDK ships the same OndeChatEngine surface. Load a model, send a message, read the result. The runtime is native on every platform — no server required.
Swift

import Onde

let engine = OndeChatEngine()
try await engine.loadDefaultModel(
    systemPrompt: "You are helpful.",
    sampling: nil
)
let result = try await engine.sendMessage(
    message: "Hello!"
)
print(result.text) // 85ms, on device

Rust

use onde::inference::{ChatEngine, GgufModelConfig};
let engine = ChatEngine::new();
engine.load_gguf_model(
    GgufModelConfig::platform_default(),
    Some("You are helpful.".into()),
    None,
).await?;
let result = engine.send_message("Hello!").await?;
println!("{}", result.text); // 85ms, on device

Dart

import 'package:onde_inference/onde_inference.dart';
final engine = OndeChatEngine();
await engine.loadDefaultModel(
  systemPrompt: 'You are helpful.',
);
final result = await engine.sendMessage('Hello!');
print(result.text); // 85ms, on device

React Native

import { OndeChatEngine } from '@ondeinference/react-native';
const engine = new OndeChatEngine();
await engine.loadDefaultModel({
  systemPrompt: 'You are helpful.',
});
const result = await engine.sendMessage({ message: 'Hello!' });
console.log(result.text); // 85ms, on device

02 / SDKs
One engine. All four SDKs share the same GGUF runtime, the same model cache, and the same API surface. Pick the one that matches your stack.
Swift: Binary package for all Apple platforms. Add the GitHub URL in Xcode, done.
Rust: The core engine. Use it directly in Rust apps or as the foundation for custom integrations.
Dart: Cross-platform Dart bindings. Works on iOS, Android, and macOS from a single import.
React Native: Expo module wrapping the Rust core. iOS and Android from one JavaScript import.
03 / Cloud API
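Onde Cloud runs at cloud.ondeinference.com. Authentication is a single Bearer token of the form <app_id>:<app_secret>. Any client that already uses the OpenAI API works once its base URL and API key point at Onde Cloud; no other changes are needed.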
curl https://cloud.ondeinference.com/v1/chat/completions \
  -H "Authorization: Bearer <app_id>:<app_secret>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder-3b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Python

from openai import OpenAI
client = OpenAI(
    base_url="https://cloud.ondeinference.com/v1",
    api_key="<app_id>:<app_secret>",
)
response = client.chat.completions.create(
    model="qwen2.5-coder-3b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Full cloud reference and pricing → ondeinference.com/cloud
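
For environments without an OpenAI SDK, the same request can be assembled with the Python standard library alone. The following is a minimal sketch, assuming the endpoint accepts and returns the standard OpenAI chat-completions JSON shape, as the curl example in this section suggests. The helper names build_chat_request and extract_reply are hypothetical and not part of any Onde package, and the credentials are placeholders.

```python
import json
import urllib.request

# Base URL taken from the curl example in this section.
BASE_URL = "https://cloud.ondeinference.com/v1"


def build_chat_request(app_id, app_secret, model, user_message):
    """Build (but do not send) a chat-completions request.

    Hypothetical helper for illustration only.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The token is the app ID and app secret joined by a colon.
            "Authorization": f"Bearer {app_id}:{app_secret}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text from an OpenAI-style response dict (assumed shape)."""
    return response_body["choices"][0]["message"]["content"]
```

Sending the request is then a matter of passing it to urllib.request.urlopen, decoding the JSON body, and handing the resulting dict to extract_reply.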
04 / CLI
The onde binary is available through npm, Homebrew, pip, and Cargo. It handles account management, model downloads, local fine-tuning, and GGUF export.
npm install -g @ondeinference/cli
brew tap ondeinference/homebrew-tap && brew install onde
pip install onde-cli
cargo install onde-cli

All install options, workflow steps, and fine-tuning guide → ondeinference.com/cli
05 / Models
All models are Q4_K_M quantized GGUF files sourced from bartowski on Hugging Face. platform_default() selects one automatically based on your target OS.
Model                 Size      Target
qwen2.5-coder-1.5b    941 MB    Mobile · iOS · Android
qwen2.5-coder-3b      1.93 GB   Desktop · macOS · Linux
qwen2.5-coder-7b      4.4 GB    High-memory devices
qwen3-1.7b            1.3 GB    All platforms
qwen3-4b              2.7 GB    All platforms
qwen3-8b              5 GB      High-memory devices

Download, cache management, and repair commands → Onde CLI
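
To make the size column concrete, here is a small sketch that encodes the model table as data and filters it by a download budget. The sizes are copied from the table; the helper name models_within_budget and the conversion of 1 GB to 1000 MB are assumptions made for illustration. This is not how platform_default() chooses a model, which the documentation describes only as OS-based.

```python
# Download sizes from the model table, in MB (assuming 1 GB = 1000 MB).
MODEL_SIZES_MB = {
    "qwen2.5-coder-1.5b": 941,
    "qwen2.5-coder-3b": 1930,
    "qwen2.5-coder-7b": 4400,
    "qwen3-1.7b": 1300,
    "qwen3-4b": 2700,
    "qwen3-8b": 5000,
}


def models_within_budget(budget_mb):
    """Return the names of models whose download fits in budget_mb, smallest first.

    Hypothetical helper for illustration only.
    """
    fitting = [
        (size, name)
        for name, size in MODEL_SIZES_MB.items()
        if size <= budget_mb
    ]
    # Sorting the (size, name) pairs orders the result by download size.
    return [name for size, name in sorted(fitting)]
```

With a 2000 MB budget, for example, the helper returns the three models under 2 GB, ordered from smallest to largest.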
06 / FAQ