Supported Models

A complete list of ML models tested and validated with MLOS Core, from NLP transformers to LLMs.

24 total models | 18 CI enabled | 4 categories | 2 formats

Supported Formats

| Format  | Extensions                    | Runtime                | Status       |
|---------|-------------------------------|------------------------|--------------|
| ONNX    | .onnx                         | ONNX Runtime (SMI)     | Built-in     |
| GGUF    | .gguf                         | llama.cpp              | Enabled      |
| PyTorch | .pt, .pth, .bin, .safetensors | Auto-converted to ONNX | Plugin Ready |
| TFLite  | .tflite                       | TensorFlow Lite        | Plugin Ready |
| CoreML  | .mlmodel, .mlpackage          | Apple CoreML           | Plugin Ready |
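Dispatch between runtimes follows the file extension, as in the formats table above. The sketch below illustrates that mapping; `select_runtime` is a hypothetical helper for illustration, not part of the MLOS Core API.

```python
from pathlib import Path

# Extension-to-runtime mapping mirroring the formats table above.
RUNTIMES = {
    ".onnx": "ONNX Runtime",
    ".gguf": "llama.cpp",
    ".pt": "Auto-converted to ONNX",
    ".pth": "Auto-converted to ONNX",
    ".bin": "Auto-converted to ONNX",
    ".safetensors": "Auto-converted to ONNX",
    ".tflite": "TensorFlow Lite",
    ".mlmodel": "Apple CoreML",
    ".mlpackage": "Apple CoreML",
}

def select_runtime(model_path: str) -> str:
    """Return the runtime that would handle the given model file (hypothetical helper)."""
    ext = Path(model_path).suffix.lower()
    try:
        return RUNTIMES[ext]
    except KeyError:
        raise ValueError(f"Unsupported model format: {ext}")
```

For example, `select_runtime("qwen2-0.5b.Q4_K_M.gguf")` resolves to `llama.cpp`, while a `.safetensors` checkpoint routes through ONNX conversion.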
NLP Models (7 models)

Text processing, embeddings, and encoder-decoder transformers.

All NLP models use the ONNX format and take text tokens as input.

| Model         | Status  | Axon ID              | Description                                                  |
|---------------|---------|----------------------|--------------------------------------------------------------|
| GPT-2         | Enabled | hf/distilgpt2        | DistilGPT-2 - lightweight text generation                    |
| BERT          | Enabled | hf/bert-base-uncased | BERT base - masked language model                            |
| RoBERTa       | Enabled | hf/roberta-base      | RoBERTa base - robust BERT variant                           |
| T5            | Enabled |                      | T5 small - text-to-text transformer (encoder-decoder Seq2Seq) |
| DistilBERT    | Enabled | hf/distilbert-base   | Smaller, faster BERT variant (40% smaller)                   |
| ALBERT        | Enabled | hf/albert-base-v2    | Parameter-efficient BERT variant                             |
| Sentence-BERT | Enabled |                      | Text embeddings for semantic search (outputs 384-dim vectors) |
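Sentence-BERT's 384-dimensional vectors are typically compared by cosine similarity to rank documents for semantic search. The sketch below shows that ranking step in plain Python with toy low-dimensional vectors; in practice the vectors would come from the model's inference output.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    """Return document indices sorted by similarity to the query, best first."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```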
Vision Models (6 models)

Image classification with CNNs and Vision Transformers.

All vision models use the ONNX format and take 224x224x3 input.

| Model        | Status  | Details                                               |
|--------------|---------|-------------------------------------------------------|
| ResNet-50    | Enabled | Classic CNN - image classification (1000 classes)     |
| ViT          | Enabled | Vision Transformer - patch-based attention (16x16 patches) |
| ConvNeXt     | Enabled | Modern CNN architecture (Facebook AI), Tiny variant   |
| MobileNet    | Enabled | Efficient mobile-optimized architecture, V2 1.0 variant |
| DeiT         | Enabled | Data-efficient Image Transformer, Small variant       |
| EfficientNet | Enabled | Compound scaling CNN (Google), B0 variant             |
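The 224x224x3 classifiers above are conventionally fed pixels scaled to [0, 1] and normalized per channel. The sketch below uses the standard torchvision ImageNet statistics as an assumption; the exact preprocessing each exported model expects may differ.

```python
# Standard ImageNet per-channel statistics (an assumption; individual
# exported models may expect different preprocessing).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to normalized model-input values."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )
```

In a real pipeline this runs over every pixel of the resized 224x224 image, and the result is laid out in whatever tensor order (NCHW or NHWC) the exported model declares.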
Multimodal Models (1 model)

Image-text models with a multi-encoder architecture.

| Model | Format | Status  | Architecture  | Input        | Description                                              |
|-------|--------|---------|---------------|--------------|----------------------------------------------------------|
| CLIP  | ONNX   | Enabled | Multi-encoder | Text + Image | Image-text matching, zero-shot classification (OpenAI)   |
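CLIP's zero-shot classification works by embedding the image with one encoder and a text prompt per class with the other, then picking the class whose text embedding is most similar to the image embedding. A conceptual sketch with toy vectors, standing in for the model's real outputs:

```python
import math

def zero_shot_classify(image_vec, text_vecs):
    """Return the index of the class prompt most similar to the image embedding."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    scores = [cos(image_vec, t) for t in text_vecs]
    return max(range(len(scores)), key=lambda i: scores[i])
```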
LLM Models (GGUF, 7 models)

Large Language Models with native llama.cpp execution. All models use Q4_K_M quantization.

| Model               | Status     | Parameters | Size   | Context     | Description                                   |
|---------------------|------------|------------|--------|-------------|-----------------------------------------------|
| Qwen2-0.5B          | Enabled    | 0.5B       | ~380MB | 32K tokens  | Ultra-small instruction-tuned model (Alibaba) |
| TinyLlama-1.1B      | Enabled    | 1.1B       | ~637MB | 2K tokens   | Small but capable chat model                  |
| Llama-3.2-1B        | Enabled    | 1B         | ~700MB | 128K tokens | Meta's latest small model, optimized for mobile |
| Llama-3.2-3B        | Local Only | 3B         | ~1.8GB | 128K tokens | Best quality/size ratio from Meta             |
| DeepSeek-Coder-1.3B | Enabled    | 1.3B       | ~750MB | 16K tokens  | Code generation specialist                    |
| DeepSeek-LLM-7B     | Local Only | 7B         | ~4GB   | 4K tokens   | High-quality open chat model                  |
| Phi-2               | Local Only | 2.7B       | ~1.6GB | 2K tokens   | Microsoft small language model                |
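The file sizes above track the parameter counts: Q4_K_M averages roughly 4.8 bits per weight (an approximation; some tensors are kept at higher precision, so real files deviate, especially for models with large embedding tables like Qwen2-0.5B). A rough back-of-the-envelope estimate:

```python
def estimate_gguf_size_bytes(n_params, bits_per_weight=4.8):
    """Rough GGUF file-size estimate for a Q4_K_M-quantized model.

    4.8 bits/weight is an assumed average; actual files differ because
    some tensors stay at higher precision and GGUF adds metadata.
    """
    return n_params * bits_per_weight / 8

# TinyLlama-1.1B: 1.1e9 params -> ~660MB estimated vs ~637MB listed above.
```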