MLOS Architecture

Kernel-level machine learning runtime with plugin-based design

System Overview

MLOS Core implements a kernel-level machine learning operating system with a plugin-based architecture. The system provides standardized interfaces for ML frameworks while maintaining high performance and resource efficiency through direct OS integration.

[Architecture diagram] Client applications reach the API layer (HTTP REST, gRPC, IPC over Unix sockets), which fronts the MLOS Core Engine (Plugin Registry, Model Registry, Resource Manager, SMI Interface). The engine hosts framework plugins (PyTorch, TensorFlow, ONNX, custom) on top of the operating system kernel.

Core Components

1. MLOS Core Engine

The central orchestrator that manages the entire MLOS system:

  • Plugin Registry: Manages loaded ML framework plugins with lifecycle tracking
  • Model Registry: Tracks registered models, their versions, and metadata
  • Resource Manager: Allocates and manages compute resources (CPU, GPU, memory)
  • SMI Interface: Implements Standard Model Interface specification for plugin communication
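The relationship between these registries can be sketched in a few lines of Python. This is an illustrative model only; the class and method names (`MLOSCore`, `register_plugin`, `infer`) are assumptions, not the actual MLOS API.

```python
from dataclasses import dataclass, field

@dataclass
class ModelInfo:
    name: str
    version: str
    plugin: str  # which framework plugin serves this model

@dataclass
class MLOSCore:
    plugins: dict = field(default_factory=dict)  # Plugin Registry
    models: dict = field(default_factory=dict)   # Model Registry

    def register_plugin(self, name, handler):
        self.plugins[name] = handler

    def register_model(self, info: ModelInfo):
        # A model may only be registered against a loaded plugin
        if info.plugin not in self.plugins:
            raise KeyError(f"plugin {info.plugin!r} not loaded")
        self.models[(info.name, info.version)] = info

    def infer(self, name, version, payload):
        # SMI-style dispatch: look up the model, route to its plugin
        info = self.models[(name, version)]
        return self.plugins[info.plugin](payload)

core = MLOSCore()
core.register_plugin("onnx", lambda x: {"ok": True, "input": x})
core.register_model(ModelInfo("resnet50", "1.0", "onnx"))
print(core.infer("resnet50", "1.0", [1, 2, 3]))
```

The key design point the sketch captures is indirection: the Model Registry never holds framework objects, only a reference to the plugin that serves each model.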

2. Multi-Protocol API Layer

Three API protocols for different use cases:

  • HTTP REST API: Management operations and easy integration (port 8080)
  • gRPC API: High-performance binary protocol for production workloads (port 8081)
  • IPC API: Ultra-low latency Unix domain sockets for local applications
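For the IPC path, some framing scheme must delimit messages on the socket. The actual MLOS wire format is not documented here, so the length-prefixed JSON framing below is purely an assumption used for illustration:

```python
import json
import struct

def encode_frame(msg: dict) -> bytes:
    # 4-byte big-endian length prefix followed by a JSON body
    body = json.dumps(msg).encode()
    return struct.pack(">I", len(body)) + body

def decode_frame(data: bytes) -> dict:
    (length,) = struct.unpack(">I", data[:4])
    return json.loads(data[4:4 + length].decode())

frame = encode_frame({"op": "infer", "model": "resnet50"})
print(decode_frame(frame))
```

A binary framing like this is what makes the Unix-socket path so much cheaper than HTTP: no headers to parse and no TCP stack to traverse.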

3. Plugin Architecture

Framework-agnostic plugin system:

  • Dynamic Loading: Load/unload plugins without system restart
  • Process Isolation: Each plugin runs in separate process for fault tolerance
  • Version Support: Multiple plugin versions can run simultaneously
  • Hot-Swapping: Update plugins without downtime
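Dynamic loading of this kind can be demonstrated with Python's standard `importlib`; the `PluginRegistry` class here is a hypothetical stand-in, and a stdlib module substitutes for a real plugin:

```python
import importlib

class PluginRegistry:
    def __init__(self):
        self._loaded = {}

    def load(self, module_name: str):
        # Import (or re-import) a plugin module at runtime, no restart needed
        self._loaded[module_name] = importlib.import_module(module_name)
        return self._loaded[module_name]

    def unload(self, module_name: str):
        self._loaded.pop(module_name, None)

reg = PluginRegistry()
mod = reg.load("json")  # stand-in for a real plugin module
print(mod.dumps({"loaded": True}))  # → {"loaded": true}
```

Note that in-process `importlib` loading alone does not give the fault tolerance described above; the separate-process isolation is what contains a crashing plugin.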

System Flow

The following diagram illustrates the typical flow of operations in MLOS:

  1. Register Plugin: Client application registers an ML framework plugin with MLOS Core
  2. Load Plugin: MLOS Core dynamically loads the plugin and initializes it
  3. Register Model: Client registers a model with MLOS Core, which forwards it to the plugin
  4. Model Registration Complete: Plugin confirms model registration and returns model metadata
  5. Inference Request: Client sends an inference request to MLOS Core via one of the APIs
  6. Execute Inference: MLOS Core routes the request to the appropriate plugin for execution
  7. Return Results: Plugin returns the inference results to MLOS Core
  8. Response to Client: MLOS Core returns the results to the client application
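The eight-step lifecycle can be condensed into a minimal driver script. All names below are illustrative, not the real MLOS interfaces:

```python
plugins, models = {}, {}

def register_plugin(name, fn):       # steps 1-2: register and load the plugin
    plugins[name] = fn

def register_model(model, plugin):   # steps 3-4: register model, confirm with metadata
    models[model] = plugin
    return {"model": model, "plugin": plugin}

def infer(model, payload):           # steps 5-8: request → route → execute → respond
    return plugins[models[model]](payload)

register_plugin("pytorch", lambda x: [v * 2 for v in x])
meta = register_model("demo", "pytorch")
print(infer("demo", [1, 2, 3]))      # → [2, 4, 6]
```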

Performance Characteristics

MLOS achieves high performance through kernel-level optimizations and efficient resource management:

| Operation | HTTP API | gRPC API | IPC API |
|---|---|---|---|
| Plugin Registration | ~5ms | ~2ms | ~0.5ms |
| Model Registration | ~10ms | ~5ms | ~1ms |
| Inference (small model) | ~2ms | ~1ms | ~0.1ms |
| Inference (large model) | ~50ms | ~25ms | ~10ms |
| Health Check | ~1ms | ~0.5ms | ~0.05ms |
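Latency figures like these are typically gathered by timing repeated calls and reporting a percentile. The sketch below shows one plausible way to do that with the standard library; the timed function is a local stand-in, not a real MLOS call:

```python
import time

def measure(fn, runs=100):
    # Time each call in milliseconds and return the median sample
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return samples[len(samples) // 2]

median_ms = measure(lambda: sum(range(1000)))
print(f"median: {median_ms:.3f} ms")
```

The median is used rather than the mean because a single GC pause or scheduler hiccup can badly skew an average at sub-millisecond scales.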

Deployment Patterns

Single Node Deployment

MLOS Core runs on a single node with multiple plugins. Ideal for development, testing, and small-scale deployments.

Distributed Deployment

Multiple MLOS Core nodes can be deployed behind a load balancer or service mesh. Enables horizontal scaling and high availability for production workloads.

Axon Architecture - Data Flow Diagram

Axon provides the neural pathway for ML models, managing the complete lifecycle from discovery to deployment. The following diagram illustrates the data flow through the Axon system:

[Data flow diagram] The user drives the Axon CLI, which coordinates the Config Manager (Config Store at ~/.axon/config.yaml), the Registry Client (HTTP requests to the external Model Registry), the Manifest Parser (YAML manifest.yaml files), and the Cache Manager (Local Cache at ~/.axon/cache), storing model packages (.axon packages) on the local file system.

Data Flow Components

  • User / CLI: User interacts with Axon through command-line interface
  • Axon CLI: Main command processor that routes operations to appropriate components
  • Config Manager: Manages local configuration (registry URLs, cache settings, etc.)
  • Registry Client: HTTP client for communicating with remote model registry
  • Manifest Parser: Parses and validates YAML manifest files
  • Cache Manager: Manages local model storage, metadata, and cache operations
  • Model Registry: External service providing model discovery and distribution
  • Local Cache: On-disk storage for downloaded models and metadata

Typical Data Flow: Model Installation

  1. User executes axon install vision/resnet50
  2. CLI loads configuration from Config Manager
  3. Registry Client queries Model Registry for manifest
  4. Manifest Parser validates the returned YAML manifest
  5. Cache Manager checks if model already exists locally
  6. If not cached, Registry Client downloads package from registry
  7. Cache Manager stores model package and metadata in Local Cache
  8. Model files are extracted and stored in file system
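The cache-check-then-download logic of steps 3 through 7 can be sketched as follows. The cache layout, manifest fields, and `fetch` callback are assumptions based on the steps above (and the manifest is written as JSON here purely to keep the sketch dependency-free; Axon manifests are YAML):

```python
import json
import tempfile
from pathlib import Path

def install(model_ref: str, cache_dir: Path, fetch) -> Path:
    safe = model_ref.replace("/", "_")
    pkg = cache_dir / safe / "model.axon"
    if pkg.exists():                      # step 5: cache hit, skip download
        return pkg
    manifest, payload = fetch(model_ref)  # steps 3-4, 6: manifest + package
    pkg.parent.mkdir(parents=True, exist_ok=True)
    pkg.write_bytes(payload)              # step 7: store package in the cache
    (pkg.parent / "manifest.json").write_text(json.dumps(manifest))
    return pkg

def fake_fetch(ref):
    # Stand-in for the Registry Client's HTTP download
    return {"name": ref, "version": "1.0"}, b"weights"

cache = Path(tempfile.mkdtemp())
path = install("vision/resnet50", cache, fake_fetch)
print(path.read_bytes())  # → b'weights'
```

A second `install` of the same reference returns immediately from the cache without calling `fetch`, which is the behavior step 5 describes.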

Security & Isolation

MLOS implements comprehensive security measures:

  • Plugin Sandboxing: Each plugin runs in isolated process with resource limits
  • API Security: Token-based authentication and role-based access control
  • Rate Limiting: Per-client request throttling
  • Input Validation: Comprehensive input sanitization for all API requests
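Per-client throttling is commonly implemented as a token bucket; the sketch below shows the general technique, with illustrative parameters since the actual MLOS rate-limiting policy is not specified here:

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate                  # tokens replenished per second
        self.capacity = capacity          # burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=0.1, capacity=5)
results = [bucket.allow() for _ in range(6)]
print(results)  # the first 5 requests pass; the 6th exhausts the burst
```

One bucket per client key (API token, socket peer, etc.) yields the per-client throttling described above, with `capacity` bounding bursts and `rate` bounding sustained load.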