Supported Models

A complete list of ML models tested and validated with MLOS Core, from NLP transformers to LLMs.

24 total models | 18 CI enabled | 4 categories | 2 formats

Supported Formats

| Format  | Extensions                    | Runtime                | Status       |
|---------|-------------------------------|------------------------|--------------|
| ONNX    | .onnx                         | ONNX Runtime (SMI)     | Built-in     |
| GGUF    | .gguf                         | llama.cpp              | Enabled      |
| PyTorch | .pt, .pth, .bin, .safetensors | Auto-converted to ONNX | Plugin Ready |
| TFLite  | .tflite                       | TensorFlow Lite        | Plugin Ready |
| CoreML  | .mlmodel, .mlpackage          | Apple CoreML           | Plugin Ready |
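Dispatch between runtimes follows the file extension, as in the formats table above. The sketch below illustrates that mapping; `select_runtime` is a hypothetical helper for illustration, not part of the MLOS Core API.

```python
from pathlib import Path

# Extension-to-runtime mapping mirroring the formats table above.
RUNTIMES = {
    ".onnx": "ONNX Runtime",
    ".gguf": "llama.cpp",
    ".pt": "Auto-converted to ONNX",
    ".pth": "Auto-converted to ONNX",
    ".bin": "Auto-converted to ONNX",
    ".safetensors": "Auto-converted to ONNX",
    ".tflite": "TensorFlow Lite",
    ".mlmodel": "Apple CoreML",
    ".mlpackage": "Apple CoreML",
}

def select_runtime(model_path: str) -> str:
    """Return the runtime that would handle the given model file (hypothetical helper)."""
    ext = Path(model_path).suffix.lower()
    try:
        return RUNTIMES[ext]
    except KeyError:
        raise ValueError(f"Unsupported model format: {ext}")
```

For example, `select_runtime("qwen2-0.5b.Q4_K_M.gguf")` resolves to `llama.cpp`, while a `.safetensors` checkpoint routes through ONNX conversion.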
NLP Models (7 models)

Text processing, embeddings, and encoder-decoder transformers.

All NLP models use the ONNX format and take text tokens as input.

| Model         | Status  | Axon ID              | Description                                                  |
|---------------|---------|----------------------|--------------------------------------------------------------|
| GPT-2         | Enabled | hf/distilgpt2        | DistilGPT-2 - lightweight text generation                    |
| BERT          | Enabled | hf/bert-base-uncased | BERT base - masked language model                            |
| RoBERTa       | Enabled | hf/roberta-base      | RoBERTa base - robust BERT variant                           |
| T5            | Enabled |                      | T5 small - text-to-text transformer (encoder-decoder Seq2Seq) |
| DistilBERT    | Enabled | hf/distilbert-base   | Smaller, faster BERT variant (40% smaller)                   |
| ALBERT        | Enabled | hf/albert-base-v2    | Parameter-efficient BERT variant                             |
| Sentence-BERT | Enabled |                      | Text embeddings for semantic search (outputs 384-dim vectors) |
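Sentence-BERT's 384-dimensional vectors are typically compared by cosine similarity to rank documents for semantic search. The sketch below shows that ranking step in plain Python with toy low-dimensional vectors; in practice the vectors would come from the model's inference output.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    """Return document indices sorted by similarity to the query, best first."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```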
Vision Models (6 models)

Image classification with CNNs and Vision Transformers.

All vision models use the ONNX format and take 224x224x3 input.

| Model        | Status  | Details                                               |
|--------------|---------|-------------------------------------------------------|
| ResNet-50    | Enabled | Classic CNN - image classification (1000 classes)     |
| ViT          | Enabled | Vision Transformer - patch-based attention (16x16 patches) |
| ConvNeXt     | Enabled | Modern CNN architecture (Facebook AI), Tiny variant   |
| MobileNet    | Enabled | Efficient mobile-optimized architecture, V2 1.0 variant |
| DeiT         | Enabled | Data-efficient Image Transformer, Small variant       |
| EfficientNet | Enabled | Compound scaling CNN (Google), B0 variant             |
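The 224x224x3 classifiers above are conventionally fed pixels scaled to [0, 1] and normalized per channel. The sketch below uses the standard torchvision ImageNet statistics as an assumption; the exact preprocessing each exported model expects may differ.

```python
# Standard ImageNet per-channel statistics (an assumption; individual
# exported models may expect different preprocessing).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to normalized model-input values."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )
```

In a real pipeline this runs over every pixel of the resized 224x224 image, and the result is laid out in whatever tensor order (NCHW or NHWC) the exported model declares.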
Multimodal Models (1 model)

Image-text models with a multi-encoder architecture.

| Model | Format | Status  | Architecture  | Input        | Description                                              |
|-------|--------|---------|---------------|--------------|----------------------------------------------------------|
| CLIP  | ONNX   | Enabled | Multi-encoder | Text + Image | Image-text matching, zero-shot classification (OpenAI)   |
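CLIP's zero-shot classification works by embedding the image with one encoder and a text prompt per class with the other, then picking the class whose text embedding is most similar to the image embedding. A conceptual sketch with toy vectors, standing in for the model's real outputs:

```python
import math

def zero_shot_classify(image_vec, text_vecs):
    """Return the index of the class prompt most similar to the image embedding."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    scores = [cos(image_vec, t) for t in text_vecs]
    return max(range(len(scores)), key=lambda i: scores[i])
```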
LLM Models (GGUF, 7 models)

Large Language Models with native llama.cpp execution. All models use Q4_K_M quantization.

| Model               | Status     | Parameters | Size   | Context     | Description                                   |
|---------------------|------------|------------|--------|-------------|-----------------------------------------------|
| Qwen2-0.5B          | Enabled    | 0.5B       | ~380MB | 32K tokens  | Ultra-small instruction-tuned model (Alibaba) |
| TinyLlama-1.1B      | Enabled    | 1.1B       | ~637MB | 2K tokens   | Small but capable chat model                  |
| Llama-3.2-1B        | Enabled    | 1B         | ~700MB | 128K tokens | Meta's latest small model, optimized for mobile |
| Llama-3.2-3B        | Local Only | 3B         | ~1.8GB | 128K tokens | Best quality/size ratio from Meta             |
| DeepSeek-Coder-1.3B | Enabled    | 1.3B       | ~750MB | 16K tokens  | Code generation specialist                    |
| DeepSeek-LLM-7B     | Local Only | 7B         | ~4GB   | 4K tokens   | High-quality open chat model                  |
| Phi-2               | Local Only | 2.7B       | ~1.6GB | 2K tokens   | Microsoft small language model                |
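The file sizes above track the parameter counts: Q4_K_M averages roughly 4.8 bits per weight (an approximation; some tensors are kept at higher precision, so real files deviate, especially for models with large embedding tables like Qwen2-0.5B). A rough back-of-the-envelope estimate:

```python
def estimate_gguf_size_bytes(n_params, bits_per_weight=4.8):
    """Rough GGUF file-size estimate for a Q4_K_M-quantized model.

    4.8 bits/weight is an assumed average; actual files differ because
    some tensors stay at higher precision and GGUF adds metadata.
    """
    return n_params * bits_per_weight / 8

# TinyLlama-1.1B: 1.1e9 params -> ~660MB estimated vs ~637MB listed above.
```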