Question 1

How fast is on-device AI inference on Apple Silicon?

Accepted Answer

Onde Inference runs at 85ms first-token latency on Apple Silicon devices, compared to 800ms or more for a typical cloud API round-trip. Inference runs entirely on-device with no network request in the critical path.

Question 2

Does data leave the device when using Onde Inference?

Accepted Answer

No. When using the on-device SDK, all inference runs in-process on the device. No prompt, token, or result is sent over the network. The only network activity is the initial one-time model download from HuggingFace Hub.

Question 3

Which programming languages does Onde Inference support?

Accepted Answer

Onde Inference ships first-class SDKs for Swift (iOS, macOS, tvOS, visionOS, watchOS), Rust (macOS, Linux, Windows), Flutter/Dart (iOS, Android, macOS), and React Native (iOS, Android). All four share the same underlying GGUF runtime and OndeChatEngine API surface.

Question 4

What is Onde Cloud?

Accepted Answer

Onde Cloud is an OpenAI-compatible inference API at cloud.ondeinference.com. It runs the same GGUF models as the on-device SDK. When a device hits its memory or compute limit, apps can route requests to Onde Cloud with a single line of code change. Auth uses a Bearer token scoped per app.

Instant local inference.
Infinite cloud scale.

Zero latency.
Zero cost margin.

Seamless fallback.
Enterprise state.

Write once.
Deploy everywhere.

The world's intelligence.
On your terms.

Instant local inference.Infinite cloud scale.

Zero latency.Zero cost margin.

Seamless fallback.Enterprise state.

Write once.Deploy everywhere.

The world's intelligence.On your terms.

Instant local inference.
Infinite cloud scale.

Zero latency.
Zero cost margin.

Seamless fallback.
Enterprise state.

Write once.
Deploy everywhere.

The world's intelligence.
On your terms.