✨ Latest Release

Universal Model Inference is Here!

MLOS now supports universal ONNX conversion and multi-type tensor inference across all major ML repositories

🚀 Major Enhancements

Axon: Universal Conversion

Multi-framework ONNX conversion with repository-specific strategies

  • Hugging Face (GPT-2, BERT, T5)
  • PyTorch Hub (ResNet, VGG)
  • TensorFlow (SavedModel, Keras)
  • ModelScope (Multimodal)
  • Auto-optimization
```bash
# One command, any model!
axon install hf/gpt2@latest
axon install pytorch/resnet50@latest
axon install tfhub/bert@latest
```

🧠 MLOS Core: Multi-Type Tensors

Enhanced ONNX plugin with comprehensive tensor type support

  • int64 for NLP token IDs
  • float32 for vision models
  • int32 for TensorFlow
  • bool for attention masks
  • Multi-input models (BERT)
  • Named inputs parsing
```bash
# Multi-input inference
curl -X POST /models/bert/inference \
  -d '{"input_ids": [101, 7592, 102], "attention_mask": [1, 1, 1]}'
```

🔗 Seamless Integration

Complete E2E workflow from any repository to kernel-level inference

  • Zero API changes
  • Backward compatible
  • ~2-8ms inference
  • Dynamic shapes
  • Automatic type detection
  • Multi-strategy fallbacks
```bash
# Complete workflow
axon install hf/gpt2@latest
axon register hf/gpt2@latest
# Ready for inference!
```
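
Because shapes are dynamic, a registered model accepts variable-length inputs without reconversion or re-registration. A minimal sketch, assuming the inference route mirrors the BERT example above and reusing the GPT-2 token IDs shown on this page:

```bash
# Same registered model, two different sequence lengths
curl -X POST /models/gpt2/inference -d '{"input_ids": [15496, 11]}'
curl -X POST /models/gpt2/inference \
  -d '{"input_ids": [15496, 11, 337, 43, 48, 2640, 0]}'
```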

📐 End-to-End Architecture

[Diagram: MLOS E2E Architecture]

📊 By the Numbers

| Metric | Value |
|---|---|
| Data types supported | 4 |
| Model repositories | 4 |
| Inference time | ~2-8ms |
| Models available | 100K+ |
| API changes | 0 |
| Backward compatible | 100% |

🔬 Technical Highlights

Repository-Specific Conversion Strategies

Axon now intelligently routes models to the best converter for their source repository:

  • Hugging Face: optimum → torch.onnx.export → transformers
  • PyTorch Hub: TorchScript → torch → torchvision
  • TensorFlow Hub: SavedModel → Keras H5 → tf2onnx
  • ModelScope: Auto-detect → Framework-specific
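
For a sense of what these strategy chains automate, here are rough manual equivalents using the underlying exporters. The tools and flags below are real (optimum-cli, torch.onnx.export, tf2onnx); how Axon sequences them internally is inferred from the list above, not quoted from its source:

```bash
# Hugging Face: Optimum's ONNX exporter
optimum-cli export onnx --model distilgpt2 distilgpt2-onnx/

# PyTorch Hub: torch.onnx.export on a torchvision model
python - <<'EOF'
import torch, torchvision
model = torchvision.models.resnet50(weights=None).eval()  # random weights; fine for a conversion sketch
dummy = torch.randn(1, 3, 224, 224)                       # NCHW image batch
torch.onnx.export(model, dummy, "resnet50.onnx")
EOF

# TensorFlow Hub: tf2onnx on a SavedModel directory
python -m tf2onnx.convert --saved-model ./bert_savedmodel --output bert.onnx
```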

Enhanced Tensor Parsing

MLOS Core plugin now parses JSON inputs with full type support:

```jsonc
// Single input (GPT-2)
{"input_ids": [15496, 11, 337, 43, 48, 2640, 0]}

// Multi-input (BERT)
{
  "input_ids": [101, 7592, 1010, 1045, 2572, 102],
  "attention_mask": [1, 1, 1, 1, 1, 1],
  "token_type_ids": [0, 0, 0, 0, 0, 0]
}
```

Zero-Cost Abstraction

All of these enhancements flow through the existing generic `void*` API, evidence that the abstraction was designed right from the start. No breaking changes, just more capabilities!
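
Concretely, "zero API changes" means a client written against the pre-release float32 path keeps working unchanged, and the new typed payloads travel over the same endpoint shape. A hedged sketch (the resnet50 route and flat pixel payload are assumptions for illustration, not from the docs):

```bash
# Pre-existing float32 client call: unchanged by this release
curl -X POST /models/resnet50/inference \
  -d '{"input": [0.12, 0.34, 0.56, 0.78]}'

# New int64 multi-input call: same endpoint shape, richer payload
curl -X POST /models/bert/inference \
  -d '{"input_ids": [101, 7592, 102], "attention_mask": [1, 1, 1]}'
```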

✅ Tested & Verified

🤗 NLP Models

  • GPT-2 (DistilGPT-2)
  • BERT (base-uncased)
  • RoBERTa
  • T5

Status: ✅ Passing

🔥 Vision Models

  • ResNet (50, 101, 152)
  • VGG (16, 19)
  • AlexNet
  • ViT (coming soon)

Status: ⏳ Ready (not tested)

🎨 Multi-Modal

  • CLIP (text + image)
  • Wav2Vec2 (audio)
  • ModelScope models

Status: ⏳ Ready (not tested)

Ready to Try MLOS?

Start running models from any repository with kernel-level performance today

View on GitHub →