Cactus Blog
Deep dives into on-device AI, inference optimization, and the engineering behind Cactus.
Gemma 4 on Cactus: The first model you can talk to, show things, and trust to know when it needs help
Gemma 4 runs natively on your device with real-time voice, vision, and audio, and routes hard problems to the cloud when it should.
Henry Ndubuaku
LFM-2.5-350m on Cactus: 140 tok/sec, Single Core, 355 MB
Benchmarking Liquid's LFM-2.5-350m across seven devices with Cactus: INT8 quantization, single-core CPU decode, zero-copy loading, and why this configuration makes on-device inference practical.
Sub-150ms Transcription with Cloud-Level Accuracy: Why We Built a Hybrid Engine
How Cactus combines on-device and cloud inference for real-time speech transcription with sub-150ms latency and automatic cloud handoff for noisy audio.
Ridiculously Fast On-Device Transcription: Reviewing Parakeet CTC 1.1B with Cactus
Review of NVIDIA's Parakeet-CTC-1.1B model running locally on Mac with Cactus. Architecture breakdown, benchmarks, and transcription use cases.
The Sweet Spot for Mac Code Use: Reviewing LFM2 24B MoE A2B with Cactus
Review of LiquidAI's LFM2-24B-A2B mixture-of-experts model running locally on Mac with Cactus. Architecture breakdown, benchmarks, and coding agent use cases.
