MLOS Ecosystem

Complete toolchain for machine learning infrastructure

Axon - Universal Model Installer

Axon is a universal CLI tool with a pluggable adapter architecture that enables installation from any model repository. No vendor lock-in, no configuration needed.

🔄 Plug-and-Play Repository Adapters

Axon automatically detects and uses the right adapter for each model. No configuration needed - just install from any supported repository!

  • Hugging Face Hub (Available)
    100,000+ models • 60%+ of ML practitioners
  • PyTorch Hub (v1.1.0+)
    PyTorch pre-trained models • 5%+ coverage • Research focus
  • ModelScope (v1.4.0+)
    5,000+ models • 8%+ coverage • Multimodal & enterprise
  • TensorFlow Hub (v1.2.0+)
    1,000+ models • 7%+ coverage • Production deployments

Total Coverage: 80%+ of ML model user base

Install from Any Repository

# Hugging Face (available now)
axon install hf/bert-base-uncased@latest

# PyTorch Hub (v1.1.0+)
axon install pytorch/vision/resnet50@latest

# TensorFlow Hub (v1.2.0+)
axon install tfhub/google/imagenet/resnet_v2_50/classification/5@latest
axon install tfhub/google/universal-sentence-encoder/4@latest

# ModelScope (v1.4.0+)
axon install modelscope/damo/cv_resnet50_image-classification@latest
axon install modelscope/ai/modelscope_damo-text-to-video-synthesis@latest

What is Axon?

Axon provides model lifecycle management, distribution, versioning, and deployment capabilities. It follows a neural metaphor: models are neurons, the Axon CLI is the transmission pathway, and kernel optimizations are the myelin sheath that speeds up signal propagation.

Key Features

  • Universal Installer - Works with any model repository via pluggable adapters
  • Real-time Downloads - No pre-packaging needed, manifests created on-the-fly
  • Model Discovery - Search across all configured repositories
  • Version Management - Semantic versioning with latest tag support
  • Checksum Verification - Automatic integrity checking (sketched after this list)
  • Local Caching - Intelligent caching for offline access
  • Zero Configuration - Works out of the box with Hugging Face
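To give a feel for the integrity check, here is a small Go sketch that hashes a downloaded file and compares it to the checksum recorded in the manifest. SHA-256 and the file name are assumptions made for this page; the document does not state which algorithm or fields Axon actually uses.

// Sketch of the integrity-check idea: hash a downloaded file and compare it
// to the checksum recorded in the manifest. SHA-256 is an assumption here.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// fileSHA256 streams a file through SHA-256 and returns the hex digest.
func fileSHA256(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	// Placeholder digest standing in for the value read from the manifest.
	expected := "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
	got, err := fileSHA256("model.onnx")
	if err != nil {
		fmt.Println("cannot hash file:", err)
		return
	}
	if got != expected {
		fmt.Println("checksum mismatch: refusing to install")
		return
	}
	fmt.Println("checksum verified")
}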

🏗️ Architecture: Extensible Adapter Framework (v1.3.0+)

Axon's adapter framework provides a pluggable, extensible architecture that enables support for any model repository. The framework separates core interfaces from adapter implementations, making it easy to add new repositories without modifying core code.

How It Works

The adapter framework automatically:

  • Detects the model source based on namespace (e.g., hf/, pytorch/, tfhub/)
  • Selects the appropriate adapter from the registry
  • Validates model existence using shared validation utilities
  • Downloads and packages models using common helpers
  • Caches packages locally for future use
Framework Architecture
The Axon CLI (commands.go) resolves each install through an AdapterRegistry, which RegisterDefaultAdapters() populates with the builtin adapters: Hugging Face, PyTorch Hub, TensorFlow Hub, ModelScope, and the Local Registry. Example adapters (the Replicate reference implementation) and user-defined custom adapters plug into the same registry. All adapters build on the core framework utilities in internal/registry/core/: HTTPClient (HTTP requests and authentication), ModelValidator (existence checks), PackageBuilder (.axon packages), and DownloadFile (download progress tracking).
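To make the selection step concrete, below is a minimal Go sketch of namespace-based adapter lookup. The type and function names (RepositoryAdapter, Registry, Resolve) are illustrative assumptions for this page, not Axon's actual internals.

// Minimal sketch of namespace-based adapter selection (illustrative names,
// not Axon's actual internals).
package main

import (
	"fmt"
	"strings"
)

// RepositoryAdapter is a stand-in for the adapter contract: each adapter
// knows how to fetch models from one repository.
type RepositoryAdapter interface {
	Name() string
	Fetch(model, version string) error
}

// Registry maps a namespace prefix (e.g. "hf", "pytorch") to an adapter.
type Registry map[string]RepositoryAdapter

// Resolve splits a reference like "hf/bert-base-uncased@latest" into an
// adapter, a model name, and a version (defaulting to "latest").
func (r Registry) Resolve(ref string) (RepositoryAdapter, string, string, error) {
	prefix, rest, ok := strings.Cut(ref, "/")
	if !ok {
		return nil, "", "", fmt.Errorf("invalid reference %q", ref)
	}
	model, version, ok := strings.Cut(rest, "@")
	if !ok {
		model, version = rest, "latest"
	}
	adapter, found := r[prefix]
	if !found {
		return nil, "", "", fmt.Errorf("no adapter registered for namespace %q", prefix)
	}
	return adapter, model, version, nil
}

// hfAdapter is a do-nothing placeholder for the Hugging Face adapter.
type hfAdapter struct{}

func (hfAdapter) Name() string { return "Hugging Face Hub" }
func (hfAdapter) Fetch(model, version string) error {
	fmt.Printf("fetching %s@%s from Hugging Face\n", model, version)
	return nil
}

func main() {
	registry := Registry{"hf": hfAdapter{}}
	adapter, model, version, err := registry.Resolve("hf/bert-base-uncased@latest")
	if err != nil {
		panic(err)
	}
	fmt.Println("selected adapter:", adapter.Name())
	_ = adapter.Fetch(model, version)
}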
Example: Replicate Adapter

The Replicate adapter demonstrates how to implement a new adapter using the framework; a simplified sketch follows the list. It shows:

  • How to implement the RepositoryAdapter interface
  • API-based model access patterns (vs file-based downloads)
  • Using core utilities for HTTP requests, validation, and packaging
  • Best practices for adapter development
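As a rough illustration of these points, the sketch below shows the shape an API-backed adapter can take in Go. The method names, endpoint, and token handling are placeholders invented for this page, not the real Replicate adapter or the Replicate API.

// Sketch of an API-backed adapter in the spirit of the Replicate example.
// Endpoint, token handling, and method names are illustrative placeholders.
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// replicateAdapter validates models by querying a REST API rather than
// downloading repository files directly.
type replicateAdapter struct {
	client  *http.Client
	baseURL string
	token   string
}

func newReplicateAdapter() *replicateAdapter {
	return &replicateAdapter{
		client:  &http.Client{Timeout: 30 * time.Second},
		baseURL: "https://api.example.com/v1/models", // placeholder endpoint
		token:   os.Getenv("REPLICATE_API_TOKEN"),
	}
}

// Validate checks that the model exists before any packaging happens,
// mirroring the "existence check" role of the core ModelValidator.
func (a *replicateAdapter) Validate(model string) error {
	req, err := http.NewRequest(http.MethodGet, a.baseURL+"/"+model, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+a.token)

	resp, err := a.client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("model %q not found (status %d)", model, resp.StatusCode)
	}
	return nil
}

func main() {
	adapter := newReplicateAdapter()
	if err := adapter.Validate("owner/some-model"); err != nil {
		fmt.Println("validation failed:", err)
		return
	}
	fmt.Println("model exists; ready to package")
}

A real adapter would then hand the validated model over to the shared packaging and download helpers described above.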

See the complete implementation: Replicate Adapter Example →

Complete framework documentation: Adapter Framework Guide →

Basic Usage

Quick Start Commands

# Initialize the axon pathway
axon init

# Install any model directly from Hugging Face (no setup needed!)
axon install hf/bert-base-uncased@latest
axon install hf/gpt2@latest

# Install PyTorch Hub models (v1.1.0+)
axon install pytorch/vision/resnet50@latest

# Search for models
axon search resnet
axon search "image classification"

# Get model information
axon info hf/bert-base-uncased@latest

# List installed models
axon list

# Update models
axon update vision/resnet50

# Remove models
axon uninstall vision/resnet50

🐳 Axon Converter Image

The Axon Converter Image is a Docker container that provides universal ONNX model conversion capabilities. It eliminates the need for Python dependencies on the host machine while enabling seamless conversion of models from any repository to ONNX format.

Key Features

  • Zero Python Installation: No Python needed on your machine - everything runs in Docker
  • Multi-Framework Support: Pre-installed PyTorch, TensorFlow, Transformers, ModelScope, and ONNX Runtime
  • Universal Conversion: Convert models from Hugging Face, PyTorch Hub, TensorFlow Hub, and ModelScope
  • Automatic Usage: Axon automatically uses the converter image when Docker is available
  • Cross-Platform: Works on macOS, Linux, and Windows (with Docker)

How It Works

When you install a model with Axon, the converter image automatically:

  1. Downloads the model from the repository
  2. Detects the framework (PyTorch, TensorFlow, Hugging Face, etc.)
  3. Converts the model to ONNX format using the appropriate conversion script
  4. Saves the ONNX file to your Axon cache

Example: Automatic Conversion

# Install a model - conversion happens automatically
axon install hf/distilgpt2@latest

# Output:
# 📦 Downloading model from Hugging Face...
# 🐳 Converting model using Docker (ghcr.io/mlos-foundation/axon-converter:latest)...
# ✅ Model converted to ONNX: model.onnx
# ✅ Model installed successfully
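The "automatic usage when Docker is available" behavior can be pictured as a simple availability probe. The sketch below, in Go, assumes detection just looks for a docker binary and a responsive daemon; it is illustrative, not Axon's actual implementation.

// Minimal sketch of the "use the converter image when Docker is available"
// decision; function names are illustrative, not Axon's actual code.
package main

import (
	"fmt"
	"os/exec"
)

// dockerAvailable reports whether a usable docker binary is on PATH and
// the daemon answers a trivial command.
func dockerAvailable() bool {
	if _, err := exec.LookPath("docker"); err != nil {
		return false
	}
	// "docker version" fails fast if the daemon is not running.
	return exec.Command("docker", "version", "--format", "{{.Server.Version}}").Run() == nil
}

func main() {
	if dockerAvailable() {
		fmt.Println("🐳 Converting model using Docker (ghcr.io/mlos-foundation/axon-converter:latest)...")
		// A real implementation would run the converter image here, e.g. via
		// exec.Command("docker", "run", ...) with the model directory mounted.
	} else {
		fmt.Println("Docker not available; skipping ONNX conversion")
	}
}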

Integration with MLOS Core

The converter image is the bridge between Axon and MLOS Core:

  1. Axon Install: `axon install hf/model` downloads the model and converts it to ONNX
  2. MLOS Core Register: `axon register hf/model` detects ONNX and auto-selects the plugin
  3. MLOS Core Execute: `POST /models/hf/model/inference` runs inference via ONNX Runtime

Architecture Benefits

  1. Separation of Concerns: Axon handles conversion, MLOS Core handles execution
  2. No Python in Production: MLOS Core never needs Python dependencies
  3. Universal Execution: ONNX Runtime enables all converted models to run
  4. Isolated Environment: Conversion happens in containers, avoiding conflicts

Image Details

  • Registry: ghcr.io/mlos-foundation/axon-converter
  • Tags: latest, 2.1.0, 2.1, 2
  • Size: ~4GB (compressed: ~1.2GB)
  • Platforms: linux/amd64, linux/arm64
  • Also Available: OCI artifacts attached to GitHub Releases

Learn More: See the complete converter image documentation for detailed usage, troubleshooting, and technical details.

Status

MVP Complete - Axon has achieved MVP status with core functionality implemented:

  • ✅ Manifest system with YAML parser and validator
  • ✅ Manifest-first architecture with format-agnostic execution
  • ✅ Automatic execution format detection (ONNX, PyTorch, TensorFlow)
  • ✅ Cache manager for local model storage
  • ✅ Registry client for model discovery
  • ✅ CLI commands for model lifecycle management
  • ✅ Comprehensive testing and CI/CD pipeline

View Axon on GitHub →

MLOS Core

MLOS Core is the kernel-level machine learning runtime that provides the foundation for high-performance ML inference. It implements a plugin-based architecture supporting multiple ML frameworks through the Standard Model Interface (SMI).

🔗 Axon Integration

MLOS Core integrates seamlessly with Axon for complete end-to-end model delivery and execution:

  • Model Package Format (MPF): MLOS Core relies on Axon packages as specified in patent US-63/861,527
  • Standardized Metadata: Reads Axon manifests for model information (framework, resources, I/O schema, execution format)
  • Format-Agnostic Execution: Dynamic plugin selection based on manifest's execution format
  • Universal Delivery: Works with models from any repository (Hugging Face, PyTorch Hub, TensorFlow Hub, ModelScope) via Axon
  • Plugin Independence: Plugins receive the path to model files and don't need to know about Axon

Complete E2E Workflow:

End-to-End Integration Example

# 1. Install model with Axon
axon install hf/bert-base-uncased@latest

# 2. Register with MLOS Core
axon register hf/bert-base-uncased@latest

# 3. Run inference via MLOS Core API
curl -X POST http://localhost:8080/models/hf/bert-base-uncased@latest/inference \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello, MLOS!", "batch_size": 1}'

Architecture Details:

  • Delivery Layer (Axon): Handles all repository interactions, creates standardized `.axon` packages with `manifest.yaml`, and provides metadata. Does NOT execute models.
  • Execution Layer (MLOS Core): Reads Axon manifests for metadata, manages the model lifecycle, and routes requests to the appropriate plugins. Does NOT access repositories directly.
  • Framework Layer (Plugins): Loads framework-specific model formats and executes inference. Plugins receive the path to model files and don't need to know about Axon.

Patent Alignment: This integration demonstrates key innovations from the MLOS Foundation patents US-63/861,527 (MLOS System with native lifecycle management) and US-63/865,176 (kernel-level optimizations for ML workloads). Axon packages are the Model Package Format (MPF) implementation.

Architecture

MLOS Core consists of several key components:

  • Core Engine: Plugin registry, model registry, and resource manager
  • API Layer: Multi-protocol support (HTTP, gRPC, IPC)
  • SMI Interface: Standardized interface for ML framework plugins
  • Resource Management: Intelligent allocation and sharing of compute resources
  • Axon Manifest Reader: Reads standardized Axon packages (MPF) for model metadata (see the sketch after this list)
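As a rough sketch of what the manifest reader does, the Go snippet below loads a manifest.yaml and picks out the fields MLOS Core needs to select a plugin. The field names are assumptions made for illustration, not the normative Axon manifest schema.

// Sketch of reading model metadata from an Axon manifest.yaml. Field names
// are illustrative assumptions, not the real manifest schema.
package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// Manifest captures the metadata MLOS Core needs to pick a plugin:
// framework, execution format, and declared resource requirements.
type Manifest struct {
	Name            string `yaml:"name"`
	Version         string `yaml:"version"`
	Framework       string `yaml:"framework"`        // e.g. "pytorch", "tensorflow"
	ExecutionFormat string `yaml:"execution_format"` // e.g. "onnx"
	Resources       struct {
		MemoryMB int  `yaml:"memory_mb"`
		GPU      bool `yaml:"gpu"`
	} `yaml:"resources"`
}

func loadManifest(path string) (*Manifest, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var m Manifest
	if err := yaml.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return &m, nil
}

func main() {
	m, err := loadManifest("manifest.yaml")
	if err != nil {
		fmt.Println("failed to read manifest:", err)
		return
	}
	// MLOS Core would route to a plugin based on the execution format.
	fmt.Printf("model %s@%s -> format %s (framework %s)\n",
		m.Name, m.Version, m.ExecutionFormat, m.Framework)
}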

Performance

Kernel-level optimizations enable ultra-low latency inference:

  • Sub-millisecond inference for small models (IPC API)
  • Zero-copy operations for large tensors
  • Efficient GPU memory and compute sharing
  • Automatic batching for throughput optimization

View Architecture Details →

MLOS Linux Distributions

Complete Linux distributions optimized for machine learning workloads, with MLOS Core, Axon, and kernel-level ML optimizations pre-installed. Currently in planning phase (target: v1.0.0 - Q2-Q3 2026).

Distribution Variants

  • MLOS Linux (Ubuntu): Based on Ubuntu 22.04/24.04 LTS with .deb packages, traditional installer, and cloud images
  • MLOS Linux (Flatcar): Based on Flatcar Linux with container-first architecture, Ignition-based provisioning, and immutable filesystem
  • Shared Kernel Patches: ML-aware scheduler, tensor memory management, and GPU orchestration (US-63/865,176)

Key Features (Planned)

  • ML-Aware Kernel: Priority-based ML task scheduling with tensor operation awareness
  • Pre-Installed Stack: MLOS Core, Axon, and ML development toolchain out of the box
  • Standards Compliant: LSB, FHS, systemd, Debian Policy (Ubuntu) / Ignition (Flatcar)
  • IP Protection: Binary artifacts from private repositories with GPG signature verification
  • Universal Model Support: Works with models from Hugging Face, PyTorch Hub, TensorFlow Hub, ModelScope via Axon

Architecture

Both distributions include:

  • Kernel Modifications: ML-aware scheduler, tensor memory management, GPU resource orchestration
  • MLOS Core: Kernel-level ML runtime with HTTP, gRPC, and IPC APIs
  • Axon: Universal model installer with 80%+ repository coverage
  • ML Toolchain: Python, PyTorch, TensorFlow, ONNX Runtime pre-installed

Status & Repositories

Status: 🔄 Planning Phase (Repository structure created, build infrastructure pending)

Note: Distribution repositories are public and contain build scripts, configuration, and documentation. MLOS Core binaries and kernel patches are downloaded from private registries during build time with GPG signature verification. See IP Protection Plan for details.

Standard Model Interface (SMI)

The Standard Model Interface specification provides a unified interface for ML frameworks, enabling framework-agnostic model deployment and management.
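To give a feel for the idea, here is an illustrative Go sketch of a framework plugin contract in the spirit of SMI. The interface below is an assumption made for this page, not the normative SMI specification.

// Illustrative plugin contract: load a model from a path, run inference,
// release resources. Not the normative SMI interface.
package main

import "fmt"

// Tensor is a deliberately simplified stand-in for a framework tensor.
type Tensor struct {
	Shape []int
	Data  []float32
}

// FrameworkPlugin is a minimal, hot-swappable plugin contract.
type FrameworkPlugin interface {
	Name() string
	Load(modelPath string) error
	Infer(input Tensor) (Tensor, error)
	Unload() error
}

// onnxPlugin is a do-nothing placeholder standing in for an ONNX Runtime
// backed implementation.
type onnxPlugin struct{ path string }

func (p *onnxPlugin) Name() string                { return "onnxruntime" }
func (p *onnxPlugin) Load(modelPath string) error { p.path = modelPath; return nil }
func (p *onnxPlugin) Unload() error               { return nil }
func (p *onnxPlugin) Infer(in Tensor) (Tensor, error) {
	// A real plugin would execute the ONNX graph here.
	return Tensor{Shape: in.Shape, Data: in.Data}, nil
}

func main() {
	var plugin FrameworkPlugin = &onnxPlugin{}
	if err := plugin.Load("model.onnx"); err != nil {
		panic(err)
	}
	out, _ := plugin.Infer(Tensor{Shape: []int{1, 3}, Data: []float32{1, 2, 3}})
	fmt.Println(plugin.Name(), "output shape:", out.Shape)
	_ = plugin.Unload()
}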

Benefits

  • Framework Interoperability: Deploy models from any supported framework
  • Plugin System: Hot-swappable framework plugins
  • Resource Declaration: Declarative resource requirements
  • Multi-language Support: C, Python, Go, JavaScript bindings

Supported Frameworks

SMI enables plugins for:

  • PyTorch
  • TensorFlow
  • ONNX Runtime
  • Custom frameworks (via plugin API)

Future Components

The MLOS ecosystem continues to evolve with additional components planned:

  • MLOS Kernel: Kernel-level ML runtime optimizations
  • MLOS Scheduler: ML workload scheduling and orchestration
  • Axon Registry: Model registry and discovery service
  • Axon Hub: Web UI for model discovery and management
  • Axon SDK: Python SDK for model packaging