Backed by Y Combinator

On-device AI
with cloud fallback

Deploy speech, vision, and text models with a single toolkit.

Cactus automatically routes each request: clear audio is transcribed on-device, noisy audio falls back to the cloud.

[Interactive demo: the Cactus Hybrid Router streams voice and routes it On-Device or to the Cloud, auto-optimizing for accuracy & cost; on-device latency shown: 120ms.]
Try the demo
$brew install cactus-compute/cactus/cactus
$cactus transcribe

Cactus routes agent commands based on complexity: on-device for simple tasks, cloud for complex operations.

[Interactive demo: enter a command such as "Set the thermostat to 72 degrees"; the Cactus Hybrid Router scores its complexity and routes it On-Device or to the Cloud.]
Intelligent routing for function calls
Try the demo
$brew install cactus-compute/cactus/cactus
$cactus run
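
Under the hood, complexity routing reduces to scoring a request and comparing it to a threshold. The C sketch below is purely illustrative, assuming a crude length-and-connectives heuristic; complexity_score and route_command are hypothetical names, not Cactus APIs.

#include <string.h>

typedef enum { ROUTE_ON_DEVICE, ROUTE_CLOUD } route_t;

/* Crude complexity score: longer commands and multi-step connectives
   suggest the request needs a larger cloud model. */
static int complexity_score(const char *cmd) {
    int score = (int)(strlen(cmd) / 16);                 /* length heuristic */
    const char *connectives[] = { " and ", " then ", " if ", " unless " };
    for (size_t i = 0; i < sizeof connectives / sizeof *connectives; i++)
        if (strstr(cmd, connectives[i])) score += 3;     /* multi-step intent */
    return score;
}

static route_t route_command(const char *cmd) {
    /* "Set the thermostat to 72 degrees" scores low and stays on-device. */
    return complexity_score(cmd) < 5 ? ROUTE_ON_DEVICE : ROUTE_CLOUD;
}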
5x
Cost savings
<120ms
On-device latency
<6%
Transcription WER
1
API

Built by a team from

Y Combinator · University of Oxford · DeepRender · Salesforce · Google · AWS · Washington Post · MIT

Powered by the Cactus Engine.
The fastest on-device runtime.

Open Source

Fully auditable and community-driven. Inspect every line that runs on your users' devices.

$git clone git@github.com:cactus-compute/cactus.git
$source ./setup
$cactus build
$cactus run LiquidAI/LFM2-2.6B
View on GitHub

Optimized Execution

Quantized models with hardware-specific acceleration. Tuned for battery-efficient inference.
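
As a rough illustration of what "quantized models" means, here is a minimal sketch of symmetric int8 quantization, assuming a single per-tensor scale; Cactus's actual formats and kernels are not shown here.

#include <math.h>
#include <stdint.h>
#include <stddef.h>

/* Map float weights to int8 with one per-tensor scale: w ≈ q * scale.
   4x smaller than float32, and int8 math maps well to mobile NPUs and SIMD. */
static float quantize_int8(const float *w, int8_t *q, size_t n) {
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++)
        if (fabsf(w[i]) > max_abs) max_abs = fabsf(w[i]);
    float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
    for (size_t i = 0; i < n; i++)
        q[i] = (int8_t)lrintf(w[i] / scale);
    return scale;   /* kept alongside the weights for dequantization */
}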

Zero-copy Memory Mapping

Weights are mapped straight from disk instead of copied into RAM, giving near-instant model loading with minimal memory overhead.
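
A minimal sketch of the idea, assuming a POSIX mmap(2): the kernel pages weights in from the file cache on demand instead of copying them into process memory.

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a weights file read-only; no bytes are copied up front. */
static const void *map_weights(const char *path, size_t *len_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return NULL; }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping survives the close */
    if (p == MAP_FAILED) return NULL;

    *len_out = (size_t)st.st_size;
    return p;                        /* pages fault in only when touched */
}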

Cross-Platform

iOS, Android, macOS, and wearables from a single SDK. Write once, deploy anywhere.

Cactus Hybrid Cloud
Cloud accuracy. Without the cloud cost.

Cactus runs simple tasks on-device and hands off only the complex requests to the cloud.

#include <cactus.h>
// Optional: a hybrid cloud key enables cloud fallback
setenv("CACTUS_CLOUD_API_KEY", "your-api-key", 1);
// Load weights from disk
cactus_model_t model = cactus_init("path/to/weights");
char response[4096];
// The router picks on-device or cloud per request; `messages` and
// `callback` are assumed declared elsewhere in your app
cactus_complete(model, messages, response, sizeof(response), nullptr, nullptr, callback);
5x
Cost Savings

Over 80% of production transcription and LLM inference can be handled on-device; when only the remaining ~20% hits the cloud, cloud spend drops to roughly one-fifth.

<120ms
On-Device Latency

Real-time transcription. No round-trip to the cloud for clear audio.

Native
Optimized for every platform

We built Cactus as an on-device engine first, optimized for the fastest inference on smartphones, laptops, and wearables.

Automatic Handoff

Cactus monitors audio quality in real time. When conditions change, it seamlessly switches between on-device and cloud inference. Your app doesn't need to know the difference.
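
Conceptually, the handoff is a threshold decision on an audio-quality signal. This sketch assumes a simple per-frame SNR estimate with hysteresis; snr_db, choose_backend, and the thresholds are illustrative, not the Cactus implementation.

#include <math.h>
#include <stddef.h>

typedef enum { BACKEND_ON_DEVICE, BACKEND_CLOUD } backend_t;

/* Rough SNR estimate: mean signal power against a noise-floor estimate. */
static float snr_db(const float *frame, size_t n, float noise_floor) {
    float power = 0.0f;
    for (size_t i = 0; i < n; i++) power += frame[i] * frame[i];
    return 10.0f * log10f((power / (float)n) / (noise_floor + 1e-12f));
}

/* Clear audio stays on-device; noisy audio hands off to the cloud.
   Two thresholds (hysteresis) prevent flapping between backends. */
static backend_t choose_backend(float snr, backend_t current) {
    if (current == BACKEND_ON_DEVICE && snr < 10.0f) return BACKEND_CLOUD;
    if (current == BACKEND_CLOUD && snr > 15.0f) return BACKEND_ON_DEVICE;
    return current;
}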

Privacy When You Need It

For sensitive applications, lock transcription to on-device only. Audio data never leaves the user's phone. HIPAA-friendly, GDPR-compliant, zero data retention.
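
Given the snippet above, one way to enforce this is to never provide a hybrid cloud key, leaving Cactus nothing to fall back to; whether a dedicated local-only switch also exists is an assumption we don't make here.

#include <cactus.h>
#include <stdlib.h>

// No CACTUS_CLOUD_API_KEY in the environment: there is no cloud backend
// to hand off to, so every request stays on the device.
unsetenv("CACTUS_CLOUD_API_KEY");
cactus_model_t model = cactus_init("path/to/weights");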

No compromise

Get the best of both on-device and cloud.

                      Traditional Cloud AI   Cactus On-Device   Cactus Hybrid
Sub-150ms Latency              ✗                    ✓                 ✓
Handles Noisy Audio            ✓                    ✗                 ✓
Works Offline                  ✗                    ✓                 ✓
Data Privacy                   ✗                    ✓                 ✓
Cost Efficient                 ✗                    ✓                 ✓
Smart Routing                  ✗                    ✗                 ✓

Built for the edge

From phones to glasses, Cactus runs wherever your users are.

Mobile Voice Assistant

Real-time voice commands and dictation for iOS and Android apps with sub-150ms latency.

Desktop Notetaker

Meeting transcription and note-taking for macOS with automatic speaker detection.

Wearable Intelligence

Always-on transcription for smart glasses and AR devices with minimal battery impact.

Ready to get started?

Add transcription to your app in minutes. Free to start, scales with you.