Backed by Y Combinator

On-device AI
with cloud fallback

Deploy speech, vision, and text models with a single toolkit.

Cactus automatically routes each request: clear audio is transcribed on-device, noisy audio falls back to the cloud.

[Interactive demo: the Cactus Hybrid Router streams voice and routes it On-Device or to the Cloud, auto-optimizing for accuracy & cost; on-device latency shown: 120ms.]
Try the demo
$brew install cactus-compute/cactus/cactus
$cactus transcribe

Cactus routes agent commands based on complexity: on-device for simple tasks, cloud for complex operations.

[Interactive demo: enter a command such as "Set the thermostat to 72 degrees"; the Cactus Hybrid Router scores its complexity and routes it On-Device or to the Cloud.]
Intelligent routing for function calls
Try the demo
$brew install cactus-compute/cactus/cactus
$cactus run
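
Under the hood, complexity routing reduces to scoring a request and comparing it to a threshold. The C sketch below is purely illustrative, assuming a crude length-and-connectives heuristic; complexity_score and route_command are hypothetical names, not Cactus APIs.

#include <string.h>

typedef enum { ROUTE_ON_DEVICE, ROUTE_CLOUD } route_t;

/* Crude complexity score: longer commands and multi-step connectives
   suggest the request needs a larger cloud model. */
static int complexity_score(const char *cmd) {
    int score = (int)(strlen(cmd) / 16);                 /* length heuristic */
    const char *connectives[] = { " and ", " then ", " if ", " unless " };
    for (size_t i = 0; i < sizeof connectives / sizeof *connectives; i++)
        if (strstr(cmd, connectives[i])) score += 3;     /* multi-step intent */
    return score;
}

static route_t route_command(const char *cmd) {
    /* "Set the thermostat to 72 degrees" scores low and stays on-device. */
    return complexity_score(cmd) < 5 ? ROUTE_ON_DEVICE : ROUTE_CLOUD;
}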
5x
Cost savings
<120ms
On-device latency
<6%
Transcription WER
1
API

Built by a team from

Y Combinator · University of Oxford · DeepRender · Salesforce · Google · AWS · Washington Post · MIT

Powered by the Cactus Engine.
The fastest on-device runtime.

Open Source

Fully auditable and community-driven. Inspect every line that runs on your users' devices.

$git clone git@github.com:cactus-compute/cactus.git
$source ./setup
$cactus build
$cactus run LiquidAI/LFM2-2.6B
View on GitHub

Optimized Execution

Quantized models with hardware-specific acceleration. Tuned for battery-efficient inference.
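
As a rough illustration of what "quantized models" means, here is a minimal sketch of symmetric int8 quantization, assuming a single per-tensor scale; Cactus's actual formats and kernels are not shown here.

#include <math.h>
#include <stdint.h>
#include <stddef.h>

/* Map float weights to int8 with one per-tensor scale: w ≈ q * scale.
   4x smaller than float32, and int8 math maps well to mobile NPUs and SIMD. */
static float quantize_int8(const float *w, int8_t *q, size_t n) {
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++)
        if (fabsf(w[i]) > max_abs) max_abs = fabsf(w[i]);
    float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
    for (size_t i = 0; i < n; i++)
        q[i] = (int8_t)lrintf(w[i] / scale);
    return scale;   /* kept alongside the weights for dequantization */
}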

Zero-copy Memory Mapping

Weights are mapped straight from disk instead of copied into RAM, giving near-instant model loading with minimal memory overhead.
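
A minimal sketch of the idea, assuming a POSIX mmap(2): the kernel pages weights in from the file cache on demand instead of copying them into process memory.

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a weights file read-only; no bytes are copied up front. */
static const void *map_weights(const char *path, size_t *len_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return NULL; }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping survives the close */
    if (p == MAP_FAILED) return NULL;

    *len_out = (size_t)st.st_size;
    return p;                        /* pages fault in only when touched */
}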

Cross-Platform

iOS, Android, macOS, and wearables from a single SDK. Write once, deploy anywhere.

Cactus Hybrid Cloud
Cloud accuracy. Without the cloud cost.

Cactus runs simple tasks on-device and hands off only the complex requests to the cloud.

#include <cactus.h>
// Optional: a hybrid cloud key enables cloud fallback
setenv("CACTUS_CLOUD_API_KEY", "your-api-key", 1);
// Load weights from disk
cactus_model_t model = cactus_init("path/to/weights");
char response[4096];
// The router picks on-device or cloud per request; `messages` and
// `callback` are assumed declared elsewhere in your app
cactus_complete(model, messages, response, sizeof(response), nullptr, nullptr, callback);
5x
Cost Savings

Over 80% of production transcription and LLM inference can be handled on-device; when only the remaining ~20% hits the cloud, cloud spend drops to roughly one-fifth.

<120ms
On-Device Latency

Real-time transcription. No round-trip to the cloud for clear audio.

Native
Optimized for every platform

We built Cactus as an on-device engine first, optimized for the fastest inference on smartphones, laptops, and wearables.

Automatic Handoff

Cactus monitors audio quality in real time. When conditions change, it seamlessly switches between on-device and cloud inference. Your app doesn't need to know the difference.
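
Conceptually, the handoff is a threshold decision on an audio-quality signal. This sketch assumes a simple per-frame SNR estimate with hysteresis; snr_db, choose_backend, and the thresholds are illustrative, not the Cactus implementation.

#include <math.h>
#include <stddef.h>

typedef enum { BACKEND_ON_DEVICE, BACKEND_CLOUD } backend_t;

/* Rough SNR estimate: mean signal power against a noise-floor estimate. */
static float snr_db(const float *frame, size_t n, float noise_floor) {
    float power = 0.0f;
    for (size_t i = 0; i < n; i++) power += frame[i] * frame[i];
    return 10.0f * log10f((power / (float)n) / (noise_floor + 1e-12f));
}

/* Clear audio stays on-device; noisy audio hands off to the cloud.
   Two thresholds (hysteresis) prevent flapping between backends. */
static backend_t choose_backend(float snr, backend_t current) {
    if (current == BACKEND_ON_DEVICE && snr < 10.0f) return BACKEND_CLOUD;
    if (current == BACKEND_CLOUD && snr > 15.0f) return BACKEND_ON_DEVICE;
    return current;
}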

Privacy When You Need It

For sensitive applications, lock transcription to on-device only. Audio data never leaves the user's phone. HIPAA-friendly, GDPR-compliant, zero data retention.
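
Given the snippet above, one way to enforce this is to never provide a hybrid cloud key, leaving Cactus nothing to fall back to; whether a dedicated local-only switch also exists is an assumption we don't make here.

#include <cactus.h>
#include <stdlib.h>

// No CACTUS_CLOUD_API_KEY in the environment: there is no cloud backend
// to hand off to, so every request stays on the device.
unsetenv("CACTUS_CLOUD_API_KEY");
cactus_model_t model = cactus_init("path/to/weights");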

No compromise

Get the best of both on-device and cloud.

                      Traditional Cloud AI   Cactus On-Device   Cactus Hybrid
Sub-150ms Latency              ✗                    ✓                 ✓
Handles Noisy Audio            ✓                    ✗                 ✓
Works Offline                  ✗                    ✓                 ✓
Data Privacy                   ✗                    ✓                 ✓
Cost Efficient                 ✗                    ✓                 ✓
Smart Routing                  ✗                    ✗                 ✓

Built for the edge

From phones to glasses, Cactus runs wherever your users are.

Mobile Voice Assistant

Real-time voice commands and dictation for iOS and Android apps with sub-150ms latency.

Desktop Notetaker

Meeting transcription and note-taking for macOS with automatic speaker detection.

Wearable Intelligence

Always-on transcription for smart glasses and AR devices with minimal battery impact.

Ready to get started?

Add transcription to your app in minutes. Free to start, scales with you.