System Overview
MLOS Core implements a kernel-level machine learning operating system with a plugin-based architecture. The system provides standardized interfaces for ML frameworks while maintaining high performance and resource efficiency through direct OS integration.
Core Components
1. MLOS Core Engine
The central orchestrator that manages the entire MLOS system:
- Plugin Registry: Manages loaded ML framework plugins with lifecycle tracking
- Model Registry: Tracks registered models, their versions, and metadata
- Resource Manager: Allocates and manages compute resources (CPU, GPU, memory)
- SMI Interface: Implements Standard Model Interface specification for plugin communication
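The SMI contract can be pictured as a small abstract interface that every plugin implements. The sketch below is illustrative only; the class and method names are assumptions, not the published Standard Model Interface specification.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict

class SMIPlugin(ABC):
    """Illustrative plugin contract; method names are assumptions, not the official SMI spec."""

    @abstractmethod
    def initialize(self, config: Dict[str, Any]) -> None:
        """Prepare the framework runtime (allocate devices, warm caches, etc.)."""

    @abstractmethod
    def register_model(self, model_path: str, metadata: Dict[str, Any]) -> str:
        """Load a model artifact and return a model ID used for later inference calls."""

    @abstractmethod
    def infer(self, model_id: str, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Run a single inference request and return its outputs."""

    @abstractmethod
    def shutdown(self) -> None:
        """Release resources before the plugin process exits."""
```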
2. Multi-Protocol API Layer
Three API protocols for different use cases:
- HTTP REST API: Management operations and easy integration (port 8080)
- gRPC API: High-performance binary protocol for production workloads (port 8081)
- IPC API: Ultra-low latency Unix domain sockets for local applications
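As a rough illustration, a client might issue an inference call over the HTTP REST API as follows; the endpoint path and payload shape are assumptions, not the documented MLOS routes.

```python
import requests

# Hypothetical endpoint and payload; the real MLOS REST routes may differ.
MLOS_HTTP = "http://localhost:8080"

resp = requests.post(
    f"{MLOS_HTTP}/v1/models/resnet50/infer",
    json={"inputs": {"image": "base64-encoded-bytes"}},
    timeout=5,
)
resp.raise_for_status()
print(resp.json())
```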
3. Plugin Architecture
Framework-agnostic plugin system:
- Dynamic Loading: Load/unload plugins without system restart
- Process Isolation: Each plugin runs in separate process for fault tolerance
- Version Support: Multiple plugin versions can run simultaneously
- Hot-Swapping: Update plugins without downtime
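One way to picture version tracking and hot-swapping is a registry that maps plugin names to their running versions. This is a toy sketch under assumed names, not the actual MLOS Core implementation.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class PluginRegistry:
    """Toy registry illustrating version tracking and hot-swapping (not the real MLOS code)."""
    # Maps plugin name -> {version: process id of the isolated plugin process}
    plugins: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def load(self, name: str, version: str, pid: int) -> None:
        # Multiple versions of the same plugin can coexist.
        self.plugins.setdefault(name, {})[version] = pid

    def hot_swap(self, name: str, old: str, new: str, new_pid: int) -> None:
        # Bring up the new version before retiring the old one, so requests keep flowing.
        self.load(name, new, new_pid)
        self.plugins[name].pop(old, None)
```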
System Flow
The typical flow of operations in MLOS proceeds as follows:
1. Register Plugin: Client application registers an ML framework plugin with MLOS Core
2. Load Plugin: MLOS Core dynamically loads the plugin and initializes it
3. Register Model: Client registers a model with MLOS Core, which forwards the request to the plugin
4. Model Registration Complete: Plugin confirms model registration and returns model metadata
5. Inference Request: Client sends an inference request to MLOS Core via one of the APIs
6. Execute Inference: MLOS Core routes the request to the appropriate plugin for execution
7. Return Results: Plugin returns inference results to MLOS Core
8. Response to Client: MLOS Core returns the results to the client application
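Condensed into hypothetical client calls, the eight steps above might look like this over the HTTP REST API; all paths and field names are illustrative assumptions.

```python
import requests

BASE = "http://localhost:8080"  # HTTP REST API; paths below are illustrative assumptions

# Steps 1-2: register (and implicitly load) an ML framework plugin
requests.post(f"{BASE}/v1/plugins", json={"name": "torch-plugin", "version": "1.0"}).raise_for_status()

# Steps 3-4: register a model; MLOS Core forwards this to the plugin and returns metadata
model = requests.post(
    f"{BASE}/v1/models",
    json={"plugin": "torch-plugin", "name": "resnet50", "path": "/models/resnet50.pt"},
).json()

# Steps 5-8: send an inference request; MLOS Core routes it to the plugin and returns the results
result = requests.post(
    f"{BASE}/v1/models/{model['id']}/infer",
    json={"inputs": {"image": "base64-encoded-bytes"}},
).json()
print(result)
```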
Performance Characteristics
MLOS achieves high performance through kernel-level optimizations and efficient resource management. The table below lists approximate per-operation latencies for each API protocol:
| Operation | HTTP API | gRPC API | IPC API |
|---|---|---|---|
| Plugin Registration | ~5ms | ~2ms | ~0.5ms |
| Model Registration | ~10ms | ~5ms | ~1ms |
| Inference (small model) | ~2ms | ~1ms | ~0.1ms |
| Inference (large model) | ~50ms | ~25ms | ~10ms |
| Health Check | ~1ms | ~0.5ms | ~0.05ms |
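These figures are indicative rather than guaranteed. A simple way to collect comparable numbers against a running instance is to time repeated requests, as in the sketch below; the health endpoint path is an assumption.

```python
import time
import statistics
import requests

def time_health_check(url: str, n: int = 100) -> float:
    """Return the median round-trip latency in milliseconds for n health checks."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        requests.get(url, timeout=1)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)

# Hypothetical health endpoint on the HTTP REST API
print(f"HTTP health check: ~{time_health_check('http://localhost:8080/healthz'):.2f} ms")
```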
Deployment Patterns
Single Node Deployment
MLOS Core runs on a single node with multiple plugins. Ideal for development, testing, and small-scale deployments.
Distributed Deployment
Multiple MLOS Core nodes can be deployed behind a load balancer or service mesh. Enables horizontal scaling and high availability for production workloads.
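Behind a load balancer or service mesh, clients address a single virtual endpoint. Without one, a thin client-side rotation over the node addresses, as sketched below with assumed URLs, gives a similar effect for simple setups.

```python
import itertools
import requests

# Hypothetical node addresses; in practice these would come from service discovery.
NODES = ["http://mlos-node-1:8080", "http://mlos-node-2:8080", "http://mlos-node-3:8080"]
_rotation = itertools.cycle(NODES)

def infer(model_id: str, inputs: dict) -> dict:
    """Send an inference request to the next node in a simple round-robin rotation."""
    node = next(_rotation)
    resp = requests.post(f"{node}/v1/models/{model_id}/infer", json={"inputs": inputs}, timeout=5)
    resp.raise_for_status()
    return resp.json()
```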
Axon Architecture - Data Flow Diagram
Axon provides the neural pathway for ML models, managing the complete lifecycle from discovery to deployment. The components below describe the data flow through the Axon system:
Data Flow Components
- User / CLI: User interacts with Axon through command-line interface
- Axon CLI: Main command processor that routes operations to appropriate components
- Config Manager: Manages local configuration (registry URLs, cache settings, etc.)
- Registry Client: HTTP client for communicating with remote model registry
- Manifest Parser: Parses and validates YAML manifest files
- Cache Manager: Manages local model storage, metadata, and cache operations
- Model Registry: External service providing model discovery and distribution
- Local Cache: On-disk storage for downloaded models and metadata
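To make the manifest step concrete, the sketch below parses a small YAML manifest with PyYAML; the field names shown are assumptions about what an Axon manifest might contain, not its actual schema.

```python
import yaml  # PyYAML

# Illustrative manifest; the real Axon manifest schema may differ.
MANIFEST = """
name: vision/resnet50
version: 1.0.0
framework: pytorch
files:
  - path: resnet50.pt
    sha256: "<digest>"
"""

def parse_manifest(text: str) -> dict:
    """Parse and minimally validate a model manifest."""
    data = yaml.safe_load(text)
    for key in ("name", "version", "files"):
        if key not in data:
            raise ValueError(f"manifest missing required field: {key}")
    return data

print(parse_manifest(MANIFEST)["name"])
```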
Typical Data Flow: Model Installation
- User executes `axon install vision/resnet50`
- CLI loads configuration from Config Manager
- Registry Client queries Model Registry for manifest
- Manifest Parser validates the returned YAML manifest
- Cache Manager checks if model already exists locally
- If not cached, Registry Client downloads package from registry
- Cache Manager stores model package and metadata in Local Cache
- Model files are extracted and stored in file system
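Condensed into code, the installation flow above might look like the cache-aware sketch below; the cache path, registry URL, and endpoint are placeholders, not Axon's real defaults.

```python
import pathlib
import requests

CACHE_DIR = pathlib.Path.home() / ".axon" / "cache"   # assumed cache location
REGISTRY = "https://registry.example.com"             # placeholder registry URL

def install(model_ref: str) -> pathlib.Path:
    """Install a model if it is not already cached, mirroring the flow described above."""
    target = CACHE_DIR / model_ref.replace("/", "_")
    if target.exists():                      # Cache Manager: model already installed locally
        return target
    # Registry Client: fetch the package for this model reference (endpoint is illustrative)
    resp = requests.get(f"{REGISTRY}/v1/models/{model_ref}/package", timeout=30)
    resp.raise_for_status()
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(resp.content)         # Cache Manager: store package in the local cache
    return target
```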
Security & Isolation
MLOS implements comprehensive security measures:
- Plugin Sandboxing: Each plugin runs in isolated process with resource limits
- API Security: Token-based authentication and role-based access control
- Rate Limiting: Per-client request throttling
- Input Validation: Comprehensive input sanitization for all API requests
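In practice, token-based authentication usually means attaching a bearer token to every request. The snippet below shows standard HTTP header usage; the token variable and the exact scheme MLOS expects are assumptions.

```python
import os
import requests

# The token source and header scheme are assumptions for illustration.
TOKEN = os.environ.get("MLOS_API_TOKEN", "")

resp = requests.get(
    "http://localhost:8080/v1/models",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=5,
)
resp.raise_for_status()
print(resp.json())
```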