Onde Inference

Local first.
Cloud orchestrated.

Cloud-only inference is a margin crisis dressed up as a service. Every prompt that leaves the device pays a round-trip tax in latency, cost, and privacy exposure. Apple silicon's unified memory changed the hardware equation: the CPU and GPU share one memory pool, so model weights no longer have to be copied into separate video memory. The software hasn't caught up yet.

Onde Inference is the runtime that closes that gap. Native Rust and Swift, running GGUF models directly on the Neural Engine and GPU. No Python wrapper. No containerized cloud detour. When the device can't handle the load, Onde Cloud picks it up — same models, same API, zero migration cost.
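
The local-first, cloud-fallback behavior described above could be sketched roughly like this. It is a minimal illustration and not Onde's actual API: the `Backend` trait, `LocalBackend`, `CloudBackend`, `Router`, and the memory-budget check are all hypothetical names and assumptions made for the example, and the inference calls are placeholders that just echo the prompt.

```rust
// All names below are hypothetical; none of this is Onde's real API.

/// Why a backend declined a request.
#[derive(Debug, PartialEq)]
enum BackendError {
    /// The model does not fit in this backend's memory budget.
    InsufficientMemory,
}

/// One interface shared by the on-device path and the cloud path.
trait Backend {
    fn name(&self) -> &'static str;
    fn generate(&self, model_size_mb: u64, prompt: &str) -> Result<String, BackendError>;
}

/// On-device backend with a fixed memory budget (an assumed capacity check).
struct LocalBackend {
    available_mb: u64,
}

impl Backend for LocalBackend {
    fn name(&self) -> &'static str {
        "local"
    }

    fn generate(&self, model_size_mb: u64, prompt: &str) -> Result<String, BackendError> {
        if model_size_mb > self.available_mb {
            return Err(BackendError::InsufficientMemory);
        }
        // Placeholder for real on-device inference.
        Ok(format!("[local] echo: {prompt}"))
    }
}

/// Cloud backend; assumed here to always have capacity.
struct CloudBackend;

impl Backend for CloudBackend {
    fn name(&self) -> &'static str {
        "cloud"
    }

    fn generate(&self, _model_size_mb: u64, prompt: &str) -> Result<String, BackendError> {
        // Placeholder for a network call to a hosted copy of the same model.
        Ok(format!("[cloud] echo: {prompt}"))
    }
}

struct Router {
    local: LocalBackend,
    cloud: CloudBackend,
}

impl Router {
    /// Try the device first; fall back to the cloud only if the device declines.
    /// Returns the name of the backend that served the request and its reply.
    fn generate(
        &self,
        model_size_mb: u64,
        prompt: &str,
    ) -> Result<(&'static str, String), BackendError> {
        match self.local.generate(model_size_mb, prompt) {
            Ok(text) => Ok((self.local.name(), text)),
            Err(BackendError::InsufficientMemory) => self
                .cloud
                .generate(model_size_mb, prompt)
                .map(|text| (self.cloud.name(), text)),
        }
    }
}

fn main() {
    let router = Router {
        local: LocalBackend { available_mb: 8_000 },
        cloud: CloudBackend,
    };

    // A model that fits the device budget is served locally.
    let (backend, reply) = router.generate(4_000, "hello").unwrap();
    println!("{backend}: {reply}");

    // A model that exceeds the budget is routed to the cloud.
    let (backend, reply) = router.generate(40_000, "hello").unwrap();
    println!("{backend}: {reply}");
}
```

Because both paths implement the same trait, the caller's code does not change when a request moves from device to cloud, which is the property the "same models, same API" claim depends on.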

Built and operated by Splitfire AB in Sweden.

// Engineering principles

How we build.

01

Execution speed

Microseconds matter when the model runs on the same chip as the UI. We write Rust and Swift because the cost of abstraction shows up in latency numbers.

02

Vertical integration

We own the execution layer from device runtime to cloud fabric. No third-party inference layer sits between your app and the model weights.

03

Developer first

One engine. One API surface. Swift, Rust, Flutter, React Native — the same OndeChatEngine behind every entry point, with no fake platform story layered on top.
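
One way to picture "one engine behind every entry point" is a single core type with thin adapters that only convert input types. The sketch below is an assumption-laden illustration: `ChatEngine`, `send_from_str`, and `send_from_bytes` are hypothetical stand-ins, not the real OndeChatEngine API or its actual platform bindings.

```rust
/// Hypothetical shared core; a stand-in for illustration only.
struct ChatEngine {
    turns: u32,
}

impl ChatEngine {
    fn new() -> Self {
        ChatEngine { turns: 0 }
    }

    /// The single code path that every entry point ends up calling.
    fn send(&mut self, message: &str) -> String {
        self.turns += 1;
        // Placeholder reply; a real engine would run the model here.
        format!("turn {}: {}", self.turns, message.trim())
    }
}

/// Entry point shaped for a caller that passes borrowed UTF-8 text.
fn send_from_str(engine: &mut ChatEngine, message: &str) -> String {
    engine.send(message)
}

/// Entry point shaped for a caller that passes raw bytes across a bridge.
fn send_from_bytes(engine: &mut ChatEngine, message: &[u8]) -> String {
    // Adapt the input type only; invalid UTF-8 is replaced, not rejected.
    engine.send(&String::from_utf8_lossy(message))
}

fn main() {
    let mut engine = ChatEngine::new();
    println!("{}", send_from_str(&mut engine, "hello"));
    println!("{}", send_from_bytes(&mut engine, b"hello again"));
}
```

The adapters hold no logic of their own, so behavior cannot drift between platforms: any change to `send` reaches every entry point at once.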

// Pedigree

Splitfire AB.

Onde Inference is a product line from Splitfire AB, a Swedish software company publishing native apps on Apple platforms. The same team that ships apps through the App Store builds and operates the inference infrastructure running inside them.

Entity: Splitfire AB

Country: Sweden

Focus: Apple-silicon-first AI

Published: Klepon · Onde Inference

// Infrastructure

Quiet plumbing.

We use three external services. Everything else is owned code.

smbCloud Auth: Accounts

Account creation, sign-in, email confirmation, and profile state for all Onde account surfaces.

smbCloud GresIQ: App + model management

App registration, model assignment, and operational inventory for managed Onde workflows.

Stripe: Billing

Payments and billing events for paid plans. Not used as a general customer database.

// Open call

If you think inference belongs on the device,
you should be building this with us.
