Overview
Cross-platform framework for deploying language, vision, and speech models locally on smartphones.
Cactus SDK
Cactus is the fastest cross-platform framework for deploying AI locally on smartphones.
Key Features
- Cross-Platform: Available in Flutter and React Native for cross-platform developers
- Any GGUF Model: Supports any GGUF model from Huggingface (Qwen, Gemma, Llama, DeepSeek, etc.)
- Multi-Modal AI: Run LLMs, VLMs, Embedding Models, TTS models and more
- Optimized Performance: From FP32 to as low as 2-bit quantized models for efficiency
- Agentic: More performant workflows with mobile tool calling
- Native Support: iOS xcframework and JNILibs for native setup
- Tiny C++ Build: For custom hardware deployments
- Advanced Features: Chat templates with Jinja2 support and token streaming
Quick Start
Choose your preferred platform:
Flutter
Cross-platform mobile development with Dart
Install with flutter pub add cactus
React Native
Mobile development with JavaScript/TypeScript
Install with npm install cactus-react-native
C/C++
Native development for custom hardware
Clone the repository and build locally
Platform Examples
Note: due to divergent framework support, the initialization patterns for Flutter and React Native are slightly different:
- React Native
CactusLM
is initialized with a local model file (inside the app sandbox) - Flutter
CactusLM
is initialized with a HuggingFace download URL
import 'package:cactus/cactus.dart';
final lm = await CactusLM.init(
modelUrl: 'https://huggingface.co/Cactus-Compute/Qwen3-600m-Instruct-GGUF/resolve/main/Qwen3-0.6B-Q8_0.gguf',
contextSize: 2048,
);
final messages = [ChatMessage(role: 'user', content: 'Hello!')];
final response = await lm.completion(messages, maxTokens: 100, temperature: 0.7);
import { CactusLM } from 'cactus-react-native';
import RNFS from 'react-native-fs'; // install RNFS for file management
const filePath = `${RNFS.DocumentDirectoryPath}/${fileName}`;
const { lm, error } = await CactusLM.init({
model: filePath,
n_ctx: 2048,
});
const messages = [{ role: 'user', content: 'Hello!' }];
const params = { n_predict: 100, temperature: 0.7 };
const response = await lm.completion(messages, params);
common_params params;
params.model.path = 'path/to/your/model.gguf';
context.loadModel(params)
context.params.prompt = "Hello, how are you?";
context.params.n_predict = 100;
context.initSampling()
context.beginCompletion();
context.loadPrompt();
while (context.has_next_token && !context.is_interrupted) {
auto token_output = context.doCompletion();
if (token_output.tok == -1) break;
}
Get started by watching a quickstart video and building one of our example apps:
Telemetry
Cactus offers powerful telemetry for all your React Native projects.
To take advantage of your Cactus telemetry, visit our React Native documentation .
Performance Benchmarks
Real-world performance on popular mobile devices:
Device | Gemma3 1B Q4 (toks/sec) | Qwen3 4B Q4 (toks/sec) |
---|---|---|
iPhone 16 Pro Max | 54 | 18 |
iPhone 16 Pro | 54 | 18 |
iPhone 16 | 49 | 16 |
iPhone 15 Pro Max | 45 | 15 |
iPhone 15 Pro | 45 | 15 |
iPhone 14 Pro Max | 44 | 14 |
OnePlus 13 5G | 43 | 14 |
Samsung Galaxy S24 Ultra | 42 | 14 |
iPhone 15 | 42 | 14 |
OnePlus Open | 38 | 13 |
Samsung Galaxy S23 5G | 37 | 12 |
Samsung Galaxy S24 | 36 | 12 |
iPhone 13 Pro | 35 | 11 |
OnePlus 12 | 35 | 11 |
Galaxy S25 Ultra | 29 | 9 |
OnePlus 11 | 26 | 8 |
iPhone 13 mini | 25 | 8 |
Redmi K70 Ultra | 24 | 8 |
Xiaomi 13 | 24 | 8 |
Samsung Galaxy S24+ | 22 | 7 |
Samsung Galaxy Z Fold 4 | 22 | 7 |
Xiaomi Poco F6 5G | 22 | 6 |
Demo Apps
Try our demo applications to see Cactus SDK in action:
Next Steps
Join our discord!
Ask questions and engage with the community
View Recommended Models
Browse our recommended models on HuggingFace
Community
- Join our Discord - Get help and connect with other developers
- Visualize Repository - Explore the codebase structure
- GitHub Repository - View source code and contribute