MLOS Ecosystem

Complete toolchain for machine learning infrastructure

Axon - Universal Model Installer

Axon is a universal CLI tool with a pluggable adapter architecture that enables installation from any model repository. No vendor lock-in, no configuration needed.

🔄 Plug-and-Play Repository Adapters

Axon automatically detects and uses the right adapter for each model. No configuration needed - just install from any supported repository!

  • Hugging Face Hub (Available)
    100,000+ models • 60%+ of ML practitioners
  • PyTorch Hub (v1.1.0+)
    PyTorch pre-trained models • 5%+ coverage • Research focus
  • ModelScope (v1.4.0+)
    5,000+ models • 8%+ coverage • Multimodal & enterprise
  • TensorFlow Hub (v1.2.0+)
    1,000+ models • 7%+ coverage • Production deployments

Total Coverage: 80%+ of ML model user base

Install from Any Repository

# Hugging Face (available now)
axon install hf/bert-base-uncased@latest

# PyTorch Hub (v1.1.0+)
axon install pytorch/vision/resnet50@latest

# TensorFlow Hub (v1.2.0+)
axon install tfhub/google/imagenet/resnet_v2_50/classification/5@latest
axon install tfhub/google/universal-sentence-encoder/4@latest

# ModelScope (v1.4.0+)
axon install modelscope/damo/cv_resnet50_image-classification@latest
axon install modelscope/ai/modelscope_damo-text-to-video-synthesis@latest

What is Axon?

Axon provides model lifecycle management, distribution, versioning, and deployment capabilities. It follows a neural metaphor: models are neurons, the Axon CLI is the transmission pathway, and kernel optimizations are the myelin sheath that speeds up signal propagation.

Key Features

  • Universal Installer - Works with any model repository via pluggable adapters
  • Real-time Downloads - No pre-packaging needed, manifests created on-the-fly
  • Model Discovery - Search across all configured repositories
  • Version Management - Semantic versioning with latest tag support
  • Checksum Verification - Automatic integrity checking (sketched after this list)
  • Local Caching - Intelligent caching for offline access
  • Zero Configuration - Works out of the box with Hugging Face
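To give a feel for the integrity check, here is a small Go sketch that hashes a downloaded file and compares it to the checksum recorded in the manifest. SHA-256 and the file name are assumptions made for this page; the document does not state which algorithm or fields Axon actually uses.

// Sketch of the integrity-check idea: hash a downloaded file and compare it
// to the checksum recorded in the manifest. SHA-256 is an assumption here.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// fileSHA256 streams a file through SHA-256 and returns the hex digest.
func fileSHA256(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	// Placeholder digest standing in for the value read from the manifest.
	expected := "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
	got, err := fileSHA256("model.onnx")
	if err != nil {
		fmt.Println("cannot hash file:", err)
		return
	}
	if got != expected {
		fmt.Println("checksum mismatch: refusing to install")
		return
	}
	fmt.Println("checksum verified")
}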

🏗️ Architecture: Extensible Adapter Framework (v1.3.0+)

Axon's adapter framework provides a pluggable, extensible architecture that enables support for any model repository. The framework separates core interfaces from adapter implementations, making it easy to add new repositories without modifying core code.

How It Works

The adapter framework automatically:

  • Detects the model source based on namespace (e.g., hf/, pytorch/, tfhub/)
  • Selects the appropriate adapter from the registry
  • Validates model existence using shared validation utilities
  • Downloads and packages models using common helpers
  • Caches packages locally for future use
Framework Architecture
The Axon CLI (commands.go) resolves each install through an AdapterRegistry, which RegisterDefaultAdapters() populates with the builtin adapters: Hugging Face, PyTorch Hub, TensorFlow Hub, ModelScope, and the Local Registry. Example adapters (the Replicate reference implementation) and user-defined custom adapters plug into the same registry. All adapters build on the core framework utilities in internal/registry/core/: HTTPClient (HTTP requests and authentication), ModelValidator (existence checks), PackageBuilder (.axon packages), and DownloadFile (download progress tracking).
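To make the selection step concrete, below is a minimal Go sketch of namespace-based adapter lookup. The type and function names (RepositoryAdapter, Registry, Resolve) are illustrative assumptions for this page, not Axon's actual internals.

// Minimal sketch of namespace-based adapter selection (illustrative names,
// not Axon's actual internals).
package main

import (
	"fmt"
	"strings"
)

// RepositoryAdapter is a stand-in for the adapter contract: each adapter
// knows how to fetch models from one repository.
type RepositoryAdapter interface {
	Name() string
	Fetch(model, version string) error
}

// Registry maps a namespace prefix (e.g. "hf", "pytorch") to an adapter.
type Registry map[string]RepositoryAdapter

// Resolve splits a reference like "hf/bert-base-uncased@latest" into an
// adapter, a model name, and a version (defaulting to "latest").
func (r Registry) Resolve(ref string) (RepositoryAdapter, string, string, error) {
	prefix, rest, ok := strings.Cut(ref, "/")
	if !ok {
		return nil, "", "", fmt.Errorf("invalid reference %q", ref)
	}
	model, version, ok := strings.Cut(rest, "@")
	if !ok {
		model, version = rest, "latest"
	}
	adapter, found := r[prefix]
	if !found {
		return nil, "", "", fmt.Errorf("no adapter registered for namespace %q", prefix)
	}
	return adapter, model, version, nil
}

// hfAdapter is a do-nothing placeholder for the Hugging Face adapter.
type hfAdapter struct{}

func (hfAdapter) Name() string { return "Hugging Face Hub" }
func (hfAdapter) Fetch(model, version string) error {
	fmt.Printf("fetching %s@%s from Hugging Face\n", model, version)
	return nil
}

func main() {
	registry := Registry{"hf": hfAdapter{}}
	adapter, model, version, err := registry.Resolve("hf/bert-base-uncased@latest")
	if err != nil {
		panic(err)
	}
	fmt.Println("selected adapter:", adapter.Name())
	_ = adapter.Fetch(model, version)
}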
Example: Replicate Adapter

The Replicate adapter demonstrates how to implement a new adapter using the framework; a simplified sketch follows the list. It shows:

  • How to implement the RepositoryAdapter interface
  • API-based model access patterns (vs file-based downloads)
  • Using core utilities for HTTP requests, validation, and packaging
  • Best practices for adapter development
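As a rough illustration of these points, the sketch below shows the shape an API-backed adapter can take in Go. The method names, endpoint, and token handling are placeholders invented for this page, not the real Replicate adapter or the Replicate API.

// Sketch of an API-backed adapter in the spirit of the Replicate example.
// Endpoint, token handling, and method names are illustrative placeholders.
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// replicateAdapter validates models by querying a REST API rather than
// downloading repository files directly.
type replicateAdapter struct {
	client  *http.Client
	baseURL string
	token   string
}

func newReplicateAdapter() *replicateAdapter {
	return &replicateAdapter{
		client:  &http.Client{Timeout: 30 * time.Second},
		baseURL: "https://api.example.com/v1/models", // placeholder endpoint
		token:   os.Getenv("REPLICATE_API_TOKEN"),
	}
}

// Validate checks that the model exists before any packaging happens,
// mirroring the "existence check" role of the core ModelValidator.
func (a *replicateAdapter) Validate(model string) error {
	req, err := http.NewRequest(http.MethodGet, a.baseURL+"/"+model, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+a.token)

	resp, err := a.client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("model %q not found (status %d)", model, resp.StatusCode)
	}
	return nil
}

func main() {
	adapter := newReplicateAdapter()
	if err := adapter.Validate("owner/some-model"); err != nil {
		fmt.Println("validation failed:", err)
		return
	}
	fmt.Println("model exists; ready to package")
}

A real adapter would then hand the validated model over to the shared packaging and download helpers described above.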

See the complete implementation: Replicate Adapter Example →

Complete framework documentation: Adapter Framework Guide →

Basic Usage

Quick Start Commands

# Initialize the axon pathway
axon init

# Install any model directly from Hugging Face (no setup needed!)
axon install hf/bert-base-uncased@latest
axon install hf/gpt2@latest

# Install PyTorch Hub models (v1.1.0+)
axon install pytorch/vision/resnet50@latest

# Search for models
axon search resnet
axon search "image classification"

# Get model information
axon info hf/bert-base-uncased@latest

# List installed models
axon list

# Update models
axon update vision/resnet50

# Remove models
axon uninstall vision/resnet50

🐳 Axon Converter Image

The Axon Converter Image is a Docker container that provides universal ONNX model conversion capabilities. It eliminates the need for Python dependencies on the host machine while enabling seamless conversion of models from any repository to ONNX format.

Key Features

  • Zero Python Installation: No Python needed on your machine - everything runs in Docker
  • Multi-Framework Support: Pre-installed PyTorch, TensorFlow, Transformers, ModelScope, and ONNX Runtime
  • Universal Conversion: Convert models from Hugging Face, PyTorch Hub, TensorFlow Hub, and ModelScope
  • Automatic Usage: Axon automatically uses the converter image when Docker is available
  • Cross-Platform: Works on macOS, Linux, and Windows (with Docker)

How It Works

When you install a model with Axon, the converter image automatically:

  1. Downloads the model from the repository
  2. Detects the framework (PyTorch, TensorFlow, Hugging Face, etc.)
  3. Converts the model to ONNX format using the appropriate conversion script
  4. Saves the ONNX file to your Axon cache

Example: Automatic Conversion

# Install a model - conversion happens automatically
axon install hf/distilgpt2@latest

# Output:
# 📦 Downloading model from Hugging Face...
# 🐳 Converting model using Docker (ghcr.io/mlos-foundation/axon-converter:latest)...
# ✅ Model converted to ONNX: model.onnx
# ✅ Model installed successfully
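The "automatic usage when Docker is available" behavior can be pictured as a simple availability probe. The sketch below, in Go, assumes detection just looks for a docker binary and a responsive daemon; it is illustrative, not Axon's actual implementation.

// Minimal sketch of the "use the converter image when Docker is available"
// decision; function names are illustrative, not Axon's actual code.
package main

import (
	"fmt"
	"os/exec"
)

// dockerAvailable reports whether a usable docker binary is on PATH and
// the daemon answers a trivial command.
func dockerAvailable() bool {
	if _, err := exec.LookPath("docker"); err != nil {
		return false
	}
	// "docker version" fails fast if the daemon is not running.
	return exec.Command("docker", "version", "--format", "{{.Server.Version}}").Run() == nil
}

func main() {
	if dockerAvailable() {
		fmt.Println("🐳 Converting model using Docker (ghcr.io/mlos-foundation/axon-converter:latest)...")
		// A real implementation would run the converter image here, e.g. via
		// exec.Command("docker", "run", ...) with the model directory mounted.
	} else {
		fmt.Println("Docker not available; skipping ONNX conversion")
	}
}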

Integration with MLOS Core

The converter image is the bridge between Axon and MLOS Core:

  1. Axon Install: `axon install hf/model` downloads the model and converts it to ONNX
  2. MLOS Core Register: `axon register hf/model` detects ONNX and auto-selects the plugin
  3. MLOS Core Execute: `POST /models/hf/model/inference` runs inference via ONNX Runtime

Architecture Benefits

  1. Separation of Concerns: Axon handles conversion, MLOS Core handles execution
  2. No Python in Production: MLOS Core never needs Python dependencies
  3. Universal Execution: ONNX Runtime enables all converted models to run
  4. Isolated Environment: Conversion happens in containers, avoiding conflicts

Image Details

  • Registry: ghcr.io/mlos-foundation/axon-converter
  • Tags: latest, 2.1.0, 2.1, 2
  • Size: ~4GB (compressed: ~1.2GB)
  • Platforms: linux/amd64, linux/arm64
  • Also Available: OCI artifacts attached to GitHub Releases

Learn More: See the complete converter image documentation for detailed usage, troubleshooting, and technical details.

Status

MVP Complete - Axon has achieved MVP status with core functionality implemented:

  • ✅ Manifest system with YAML parser and validator
  • ✅ Manifest-first architecture with format-agnostic execution
  • ✅ Automatic execution format detection (ONNX, PyTorch, TensorFlow)
  • ✅ Cache manager for local model storage
  • ✅ Registry client for model discovery
  • ✅ CLI commands for model lifecycle management
  • ✅ Comprehensive testing and CI/CD pipeline

View Axon on GitHub →

MLOS Core

MLOS Core is the kernel-level machine learning runtime that provides the foundation for high-performance ML inference. It implements a plugin-based architecture supporting multiple ML frameworks through the Standard Model Interface (SMI).

🔗 Axon Integration

MLOS Core integrates seamlessly with Axon for complete end-to-end model delivery and execution:

  • Model Package Format (MPF): MLOS Core relies on Axon packages as specified in patent US-63/861,527
  • Standardized Metadata: Reads Axon manifests for model information (framework, resources, I/O schema, execution format)
  • Format-Agnostic Execution: Dynamic plugin selection based on manifest's execution format
  • Universal Delivery: Works with models from any repository (Hugging Face, PyTorch Hub, TensorFlow Hub, ModelScope) via Axon
  • Plugin Independence: Plugins receive the path to model files and don't need to know about Axon

Complete E2E Workflow:

End-to-End Integration Example

# 1. Install model with Axon
axon install hf/bert-base-uncased@latest

# 2. Register with MLOS Core
axon register hf/bert-base-uncased@latest

# 3. Run inference via MLOS Core API
curl -X POST http://localhost:8080/models/hf/bert-base-uncased@latest/inference \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello, MLOS!", "batch_size": 1}'

Architecture Details:

  • Delivery Layer (Axon): Handles all repository interactions, creates standardized `.axon` packages with `manifest.yaml`, and provides metadata. Does NOT execute models.
  • Execution Layer (MLOS Core): Reads Axon manifests for metadata, manages the model lifecycle, and routes requests to the appropriate plugins. Does NOT access repositories directly.
  • Framework Layer (Plugins): Loads framework-specific model formats and executes inference. Plugins receive the path to model files and don't need to know about Axon.

Patent Alignment: This integration demonstrates key innovations from the MLOS Foundation patents US-63/861,527 (MLOS System with native lifecycle management) and US-63/865,176 (kernel-level optimizations for ML workloads). Axon packages are the Model Package Format (MPF) implementation.

Architecture

MLOS Core consists of several key components:

  • Core Engine: Plugin registry, model registry, and resource manager
  • API Layer: Multi-protocol support (HTTP, gRPC, IPC)
  • SMI Interface: Standardized interface for ML framework plugins
  • Resource Management: Intelligent allocation and sharing of compute resources
  • Axon Manifest Reader: Reads standardized Axon packages (MPF) for model metadata (see the sketch after this list)
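As a rough sketch of what the manifest reader does, the Go snippet below loads a manifest.yaml and picks out the fields MLOS Core needs to select a plugin. The field names are assumptions made for illustration, not the normative Axon manifest schema.

// Sketch of reading model metadata from an Axon manifest.yaml. Field names
// are illustrative assumptions, not the real manifest schema.
package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// Manifest captures the metadata MLOS Core needs to pick a plugin:
// framework, execution format, and declared resource requirements.
type Manifest struct {
	Name            string `yaml:"name"`
	Version         string `yaml:"version"`
	Framework       string `yaml:"framework"`        // e.g. "pytorch", "tensorflow"
	ExecutionFormat string `yaml:"execution_format"` // e.g. "onnx"
	Resources       struct {
		MemoryMB int  `yaml:"memory_mb"`
		GPU      bool `yaml:"gpu"`
	} `yaml:"resources"`
}

func loadManifest(path string) (*Manifest, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var m Manifest
	if err := yaml.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return &m, nil
}

func main() {
	m, err := loadManifest("manifest.yaml")
	if err != nil {
		fmt.Println("failed to read manifest:", err)
		return
	}
	// MLOS Core would route to a plugin based on the execution format.
	fmt.Printf("model %s@%s -> format %s (framework %s)\n",
		m.Name, m.Version, m.ExecutionFormat, m.Framework)
}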

Performance

Kernel-level optimizations enable ultra-low latency inference:

  • Sub-millisecond inference for small models (IPC API)
  • Zero-copy operations for large tensors
  • Efficient GPU memory and compute sharing
  • Automatic batching for throughput optimization

View Architecture Details →

MLOS Linux Distributions

Complete Linux distributions optimized for machine learning workloads, with MLOS Core, Axon, and kernel-level ML optimizations pre-installed. Currently in planning phase (target: v1.0.0 - Q2-Q3 2026).

Distribution Variants

  • MLOS Linux (Ubuntu): Based on Ubuntu 22.04/24.04 LTS with .deb packages, traditional installer, and cloud images
  • MLOS Linux (Flatcar): Based on Flatcar Linux with container-first architecture, Ignition-based provisioning, and immutable filesystem
  • Shared Kernel Patches: ML-aware scheduler, tensor memory management, and GPU orchestration (US-63/865,176)

Key Features (Planned)

  • ML-Aware Kernel: Priority-based ML task scheduling with tensor operation awareness
  • Pre-Installed Stack: MLOS Core, Axon, and ML development toolchain out of the box
  • Standards Compliant: LSB, FHS, systemd, Debian Policy (Ubuntu) / Ignition (Flatcar)
  • IP Protection: Binary artifacts from private repositories with GPG signature verification
  • Universal Model Support: Works with models from Hugging Face, PyTorch Hub, TensorFlow Hub, ModelScope via Axon

Architecture

Both distributions include:

  • Kernel Modifications: ML-aware scheduler, tensor memory management, GPU resource orchestration
  • MLOS Core: Kernel-level ML runtime with HTTP, gRPC, and IPC APIs
  • Axon: Universal model installer with 80%+ repository coverage
  • ML Toolchain: Python, PyTorch, TensorFlow, ONNX Runtime pre-installed

Status & Repositories

Status: 🔄 Planning Phase (Repository structure created, build infrastructure pending)

Note: Distribution repositories are public and contain build scripts, configuration, and documentation. MLOS Core binaries and kernel patches are downloaded from private registries during build time with GPG signature verification. See IP Protection Plan for details.

Standard Model Interface (SMI)

The Standard Model Interface specification provides a unified interface for ML frameworks, enabling framework-agnostic model deployment and management.
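To give a feel for the idea, here is an illustrative Go sketch of a framework plugin contract in the spirit of SMI. The interface below is an assumption made for this page, not the normative SMI specification.

// Illustrative plugin contract: load a model from a path, run inference,
// release resources. Not the normative SMI interface.
package main

import "fmt"

// Tensor is a deliberately simplified stand-in for a framework tensor.
type Tensor struct {
	Shape []int
	Data  []float32
}

// FrameworkPlugin is a minimal, hot-swappable plugin contract.
type FrameworkPlugin interface {
	Name() string
	Load(modelPath string) error
	Infer(input Tensor) (Tensor, error)
	Unload() error
}

// onnxPlugin is a do-nothing placeholder standing in for an ONNX Runtime
// backed implementation.
type onnxPlugin struct{ path string }

func (p *onnxPlugin) Name() string                { return "onnxruntime" }
func (p *onnxPlugin) Load(modelPath string) error { p.path = modelPath; return nil }
func (p *onnxPlugin) Unload() error               { return nil }
func (p *onnxPlugin) Infer(in Tensor) (Tensor, error) {
	// A real plugin would execute the ONNX graph here.
	return Tensor{Shape: in.Shape, Data: in.Data}, nil
}

func main() {
	var plugin FrameworkPlugin = &onnxPlugin{}
	if err := plugin.Load("model.onnx"); err != nil {
		panic(err)
	}
	out, _ := plugin.Infer(Tensor{Shape: []int{1, 3}, Data: []float32{1, 2, 3}})
	fmt.Println(plugin.Name(), "output shape:", out.Shape)
	_ = plugin.Unload()
}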

Benefits

  • Framework Interoperability: Deploy models from any supported framework
  • Plugin System: Hot-swappable framework plugins
  • Resource Declaration: Declarative resource requirements
  • Multi-language Support: C, Python, Go, JavaScript bindings

Supported Frameworks

SMI enables plugins for:

  • PyTorch
  • TensorFlow
  • ONNX Runtime
  • Custom frameworks (via plugin API)

Future Components

The MLOS ecosystem continues to evolve with additional components planned:

  • MLOS Kernel: Kernel-level ML runtime optimizations
  • MLOS Scheduler: ML workload scheduling and orchestration
  • Axon Registry: Model registry and discovery service
  • Axon Hub: Web UI for model discovery and management
  • Axon SDK: Python SDK for model packaging