Onde Inference
Cloud-only inference is a margin crisis dressed up as a service. Every prompt that leaves the device pays a round-trip tax in latency, cost, and privacy exposure. Apple Silicon unified memory changed the hardware equation. The software hasn't caught up yet.
Onde Inference is the runtime that closes that gap. Native Rust and Swift, running GGUF models directly on the Neural Engine and GPU. No Python wrapper. No containerized cloud detour. When the device can't handle the load, Onde Cloud picks it up — same models, same API, zero migration cost.
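The device-first, cloud-fallback behavior described above can be sketched in a few lines of Rust. This is an illustration only, not the Onde API: `ChatBackend`, `OnDevice`, `Cloud`, `InferenceError`, and `complete_with_fallback` are hypothetical names invented for this sketch, and the memory check stands in for whatever signal a real runtime would use to decide that the device cannot handle the load.

```rust
/// Hypothetical error type for this sketch; not part of any published API.
#[derive(Debug, PartialEq)]
enum InferenceError {
    /// The device does not have enough free memory to hold the model.
    InsufficientMemory { required_mb: u64, available_mb: u64 },
}

/// Hypothetical shared interface, so the caller sees one API for both backends.
trait ChatBackend {
    fn name(&self) -> &'static str;
    fn complete(&self, prompt: &str) -> Result<String, InferenceError>;
}

/// Stand-in for an on-device runtime with a fixed memory budget.
struct OnDevice {
    available_mb: u64,
    model_mb: u64,
}

/// Stand-in for a remote backend that always accepts the request.
struct Cloud;

impl ChatBackend for OnDevice {
    fn name(&self) -> &'static str {
        "device"
    }

    fn complete(&self, prompt: &str) -> Result<String, InferenceError> {
        // Refuse the request when the model does not fit in memory.
        if self.model_mb > self.available_mb {
            return Err(InferenceError::InsufficientMemory {
                required_mb: self.model_mb,
                available_mb: self.available_mb,
            });
        }
        Ok(format!("[{}] {}", self.name(), prompt))
    }
}

impl ChatBackend for Cloud {
    fn name(&self) -> &'static str {
        "cloud"
    }

    fn complete(&self, prompt: &str) -> Result<String, InferenceError> {
        Ok(format!("[{}] {}", self.name(), prompt))
    }
}

/// Try the local backend first; on any local error, hand the same prompt
/// to the remote backend through the same interface.
fn complete_with_fallback(
    local: &dyn ChatBackend,
    remote: &dyn ChatBackend,
    prompt: &str,
) -> Result<String, InferenceError> {
    local.complete(prompt).or_else(|_| remote.complete(prompt))
}

fn main() {
    let constrained = OnDevice { available_mb: 2048, model_mb: 4096 };
    let reply = complete_with_fallback(&constrained, &Cloud, "hello");
    println!("{:?}", reply);
}
```

Because both backends implement the same trait, the calling code does not change when a request moves from device to cloud, which is the property the "same models, same API" claim relies on.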
Built and operated by Splitfire AB in Sweden.
// Engineering principles
Microseconds matter when the model runs on the same chip as the UI. We write Rust and Swift because the cost of abstraction shows up in latency numbers.
We own the execution layer from device runtime to cloud fabric. No third-party inference layer sits between your app and the model weights.
One engine. One API surface. Swift, Rust, Flutter, React Native — the same OndeChatEngine behind every entry point, with no fake platform story layered on top.
// Pedigree
Onde Inference is a product line from Splitfire AB, a Swedish software company publishing native apps on Apple platforms. The same team that ships apps through the App Store builds and operates the inference infrastructure running inside them.
Entity: Splitfire AB
Country: Sweden
Focus: Apple-silicon-first AI
Published: Klepon · Onde Inference
// Infrastructure
We use three external services. Everything else is owned code.
Account creation, sign-in, email confirmation, and profile state for all Onde account surfaces.
App registration, model assignment, and operational inventory for managed Onde workflows.
Payments and billing events for paid plans. Not used as a general customer database.
// Open call