Architecture

Kernel-level machine learning runtime with plugin-based design

System Overview

mlOS Core implements a kernel-level machine learning operating system with a plugin-based architecture. The system provides standardized interfaces for ML frameworks while maintaining high performance and resource efficiency through direct OS integration.

Layered view of the system, top to bottom:

  • Client Applications — your app, API clients, SDKs
  • API Layer — HTTP REST, gRPC, IPC (Unix socket)
  • Runtime Layer — mlOS Core Engine
  • Plugins — ONNX Runtime, llama.cpp, PyTorch, custom
  • Hardware — CPU, GPU (CUDA/ROCm/Metal), memory, storage

Core Components

mlOS Core Engine

The central orchestrator that manages the entire mlOS system.

  • Plugin Registry — Manages loaded ML framework plugins with lifecycle tracking
  • Model Registry — Tracks registered models, versions, and metadata
  • Resource Manager — Allocates and manages compute resources (CPU, GPU, memory)
  • SMI Interface — Standard Model Interface for plugin communication
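
The SMI can be pictured as a small contract that every plugin implements. Below is a minimal Python sketch under assumed names: `load_model`, `infer`, and `unload_model` are illustrative, not the actual mlOS API.

```python
from abc import ABC, abstractmethod
from typing import Any


class StandardModelInterface(ABC):
    """Hypothetical sketch of the SMI contract a plugin implements."""

    @abstractmethod
    def load_model(self, model_id: str, path: str) -> None: ...

    @abstractmethod
    def infer(self, model_id: str, inputs: Any) -> Any: ...

    @abstractmethod
    def unload_model(self, model_id: str) -> None: ...


class EchoPlugin(StandardModelInterface):
    """Toy plugin: tracks models in a dict and echoes inputs back."""

    def __init__(self):
        self.models = {}

    def load_model(self, model_id, path):
        self.models[model_id] = path

    def infer(self, model_id, inputs):
        if model_id not in self.models:
            raise KeyError(model_id)
        return inputs

    def unload_model(self, model_id):
        self.models.pop(model_id, None)
```

Because every plugin speaks the same interface, the core can route requests without knowing which framework sits behind them.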

Multi-Protocol API Layer

Three API protocols for different use cases.

  • HTTP REST — Management operations, easy integration (port 8080)
  • gRPC — High-performance binary protocol for production (port 8081)
  • IPC — Ultra-low-latency Unix domain sockets for local applications
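
As an illustration, a client would pick whichever protocol fits its latency budget and then build a request against it. The sketch below only constructs a hypothetical REST call; the `/v1/models/{id}/infer` route and payload shape are assumptions, not the documented mlOS API.

```python
import json

# Assumed base URL for the REST API (port 8080 per the list above).
MLOS_HTTP = "http://localhost:8080"


def build_infer_request(model_id: str, inputs) -> tuple[str, str]:
    """Construct the URL and JSON body for a hypothetical
    POST /v1/models/{id}/infer call over the REST API."""
    url = f"{MLOS_HTTP}/v1/models/{model_id}/infer"
    body = json.dumps({"inputs": inputs})
    return url, body
```

An actual client would POST `body` to `url` with any HTTP library; the gRPC and IPC paths would carry the same logical request over their respective transports.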

Plugin Architecture

Framework-agnostic plugin system for maximum flexibility.

  • Dynamic Loading — Load/unload plugins without restart
  • Process Isolation — Each plugin runs in a separate process
  • Version Support — Multiple versions can run simultaneously
  • Hot-Swapping — Update plugins without downtime
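
The registry bookkeeping behind dynamic loading can be sketched with Python's `importlib`. Real mlOS plugins run as separate processes, so this models only the load/unload lifecycle, not the isolation:

```python
import importlib


class PluginRegistry:
    """Minimal sketch: load/unload plugins keyed by name at runtime."""

    def __init__(self):
        self._plugins = {}

    def load(self, name: str, module_name: str):
        """Dynamically import a module and register it as a plugin."""
        mod = importlib.import_module(module_name)
        self._plugins[name] = mod
        return mod

    def unload(self, name: str):
        """Drop a plugin from the registry; returns it, or None."""
        return self._plugins.pop(name, None)

    def get(self, name: str):
        return self._plugins[name]
```

Hot-swapping then amounts to loading the new version, atomically repointing the registry entry, and unloading the old one once in-flight requests drain.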

System Flow

The typical flow of operations in mlOS from registration to inference.

1. Register Plugin — Client application registers an ML framework plugin with mlOS Core
2. Load Plugin — mlOS Core dynamically loads and initializes the plugin
3. Register Model — Client registers a model with mlOS Core, which forwards it to the plugin
4. Inference Request — Client sends an inference request to mlOS Core via any API protocol
5. Execute & Return — mlOS Core routes the request to the appropriate plugin, executes inference, and returns the results
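
The five steps above can be condensed into a toy in-memory stand-in for mlOS Core. All names here are illustrative; the real core adds process isolation, resource management, and the API layer:

```python
class MlosCoreSketch:
    """Toy stand-in for mlOS Core illustrating the five-step flow."""

    def __init__(self):
        self.plugins = {}   # plugin name -> callable
        self.models = {}    # model id -> plugin name

    def register_plugin(self, name, plugin):
        """Steps 1-2: register and 'load' a plugin."""
        self.plugins[name] = plugin

    def register_model(self, model_id, plugin_name):
        """Step 3: bind a model to the plugin that serves it."""
        self.models[model_id] = plugin_name

    def infer(self, model_id, inputs):
        """Steps 4-5: route to the owning plugin and return results."""
        plugin = self.plugins[self.models[model_id]]
        return plugin(inputs)


core = MlosCoreSketch()
core.register_plugin("echo", lambda x: x)   # a trivial 'framework'
core.register_model("m1", "echo")
```

The key routing fact is the two-level lookup: model id resolves to a plugin name, which resolves to the plugin itself.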

Performance Characteristics

mlOS achieves high performance through kernel-level optimizations and efficient resource management.

| Operation               | HTTP API | gRPC API | IPC API  |
|-------------------------|----------|----------|----------|
| Plugin Registration     | ~5ms     | ~2ms     | ~0.5ms   |
| Model Registration      | ~10ms    | ~5ms     | ~1ms     |
| Inference (small model) | ~2ms     | ~1ms     | ~0.1ms   |
| Inference (large model) | ~50ms    | ~25ms    | ~10ms    |
| Health Check            | ~1ms     | ~0.5ms   | ~0.05ms  |

Deployment Patterns

Single Node Deployment

mlOS Core runs on a single node with multiple plugins. Ideal for development, testing, and small-scale deployments.

Distributed Deployment

Multiple mlOS Core nodes behind a load balancer or service mesh. Enables horizontal scaling and high availability for production.
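
From the client's side, the distributed pattern reduces to spreading requests across node addresses. A minimal round-robin sketch follows; in production the load balancer or service mesh does this, and the node addresses are assumptions:

```python
from itertools import cycle


class RoundRobinClient:
    """Sketch of client-side balancing across mlOS Core nodes."""

    def __init__(self, nodes):
        self._nodes = cycle(nodes)

    def next_node(self) -> str:
        """Return the next node address in rotation."""
        return next(self._nodes)
```

A real balancer would add health checks so that a failed node is skipped, which is what gives the pattern its high-availability property.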

Security & Isolation

mlOS implements comprehensive security measures for production deployments.

Plugin Sandboxing

Each plugin runs in an isolated process with resource limits, preventing cascading failures.
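
The core property of process isolation is that a crash in the child never takes down the parent. This can be sketched with a child interpreter; it is illustrative only, not mlOS's actual sandbox:

```python
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 5.0):
    """Run work in a separate Python process. If the child crashes or
    exceeds the timeout, the parent (the 'core') survives.
    Returns (returncode, stdout)."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout.strip()
```

A real sandbox would additionally pin CPU and memory limits on the child (e.g. via cgroups or `setrlimit`) before handing it any work.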

API Security

Token-based authentication and role-based access control for all API endpoints.

Rate Limiting

Per-client request throttling to prevent abuse and ensure fair resource usage.
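
A common way to implement per-client throttling is a token bucket: tokens refill at a steady rate up to a cap, and each request spends one. A minimal sketch, not mlOS's actual limiter (the injectable clock is just for testability):

```python
import time


class TokenBucket:
    """Per-client token bucket: `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self._now = now
        self._last = now()

    def allow(self) -> bool:
        """Refill based on elapsed time, then try to spend one token."""
        t = self._now()
        self.tokens = min(self.capacity, self.tokens + (t - self._last) * self.rate)
        self._last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Keeping one bucket per client token gives the fairness property: a noisy client exhausts only its own bucket, never its neighbors'.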