Supported Formats
| Format | Extensions | Runtime | Status |
|---|---|---|---|
| ONNX | .onnx | ONNX Runtime (SMI) | Built-in |
| GGUF | .gguf | llama.cpp | Enabled |
| PyTorch | .pt .pth .bin .safetensors | Auto-converted to ONNX | Plugin Ready |
| TFLite | .tflite | TensorFlow Lite | Plugin Ready |
| CoreML | .mlmodel .mlpackage | Apple CoreML | Plugin Ready |
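
The extension-to-runtime mapping in the table above can be expressed as a simple lookup. The following is a minimal sketch, not the product's actual loader: the names `FORMATS` and `detect_format` are hypothetical, and the code only mirrors the table rows. It assumes the file extension alone decides the format, which may not hold for ambiguous extensions such as `.bin`.

```python
from pathlib import Path
from typing import Dict, Tuple

# Hypothetical lookup table: extension -> (format, runtime).
# The values are copied from the Supported Formats table.
FORMATS: Dict[str, Tuple[str, str]] = {
    ".onnx": ("ONNX", "ONNX Runtime (SMI)"),
    ".gguf": ("GGUF", "llama.cpp"),
    ".pt": ("PyTorch", "Auto-converted to ONNX"),
    ".pth": ("PyTorch", "Auto-converted to ONNX"),
    ".bin": ("PyTorch", "Auto-converted to ONNX"),
    ".safetensors": ("PyTorch", "Auto-converted to ONNX"),
    ".tflite": ("TFLite", "TensorFlow Lite"),
    ".mlmodel": ("CoreML", "Apple CoreML"),
    ".mlpackage": ("CoreML", "Apple CoreML"),
}


def detect_format(path: str) -> Tuple[str, str]:
    """Return (format, runtime) for a model file, based on its extension only."""
    suffix = Path(path).suffix.lower()  # case-insensitive match
    if suffix not in FORMATS:
        raise ValueError(f"Unsupported model file extension: {suffix!r}")
    return FORMATS[suffix]


if __name__ == "__main__":
    for name in ("model.onnx", "chat.gguf", "weights.safetensors"):
        fmt, runtime = detect_format(name)
        print(f"{name}: {fmt} via {runtime}")
```

A dictionary keeps the mapping in one place, so adding a format means adding rows rather than branching logic.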
NLP Models
7 models. Text processing, embeddings, and encoder-decoder transformers.
DistilGPT-2 - Lightweight text generation
BERT base - Masked language model
RoBERTa base - Robust BERT variant
T5 small - Text-to-text transformer (Seq2Seq)
Smaller, faster BERT variant (40% smaller)
Parameter-efficient BERT variant
Text embeddings for semantic search
Vision Models
6 models. Image classification with CNNs and Vision Transformers.
Classic CNN - Image classification (1000 classes)
Vision Transformer - Patch-based attention
Modern CNN architecture (Facebook AI)
Efficient mobile-optimized architecture
Data-efficient Image Transformer
Compound scaling CNN (Google)
Multimodal Models
1 model. Image-text models with multi-encoder architecture.
Image-text matching, zero-shot classification (OpenAI)
LLM Models (GGUF)
7 models. Large Language Models with native llama.cpp execution. All models use Q4_K_M quantization.
Ultra-small instruction-tuned model (Alibaba)
Small but capable chat model
Meta's latest small model optimized for mobile
Best quality/size ratio from Meta
Code generation specialist
High-quality open chat model
Microsoft small language model