System Overview
mlOS Core implements a kernel-level machine learning operating system with a plugin-based architecture. The system provides standardized interfaces for ML frameworks while maintaining high performance and resource efficiency through direct OS integration.
Core Components
mlOS Core Engine
The central orchestrator that manages the entire mlOS system.
- Plugin Registry — Manages loaded ML framework plugins with lifecycle tracking
- Model Registry — Tracks registered models, versions, and metadata
- Resource Manager — Allocates and manages compute resources (CPU, GPU, memory)
- SMI — Standard Model Interface for plugin communication
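As a rough illustration of what a Standard Model Interface contract could look like, the sketch below defines an abstract base class that every plugin would implement. The class name, the method names (`load_model`, `infer`), and the toy `EchoPlugin` are assumptions made for this example and are not taken from the mlOS API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class StandardModelInterface(ABC):
    """Hypothetical plugin contract; method names are assumptions."""

    @abstractmethod
    def load_model(self, model_id: str, path: str) -> None:
        """Make a model available under the given identifier."""

    @abstractmethod
    def infer(self, model_id: str, inputs: Any) -> Any:
        """Run inference with a previously loaded model."""


class EchoPlugin(StandardModelInterface):
    """Toy plugin that returns its inputs unchanged."""

    def __init__(self) -> None:
        self.models: Dict[str, str] = {}

    def load_model(self, model_id: str, path: str) -> None:
        self.models[model_id] = path

    def infer(self, model_id: str, inputs: Any) -> Any:
        if model_id not in self.models:
            raise KeyError(f"model not loaded: {model_id}")
        return inputs
```

Because the core only depends on the abstract interface, a plugin for any framework could be substituted as long as it implements the same two methods.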
Multi-Protocol API Layer
Three API protocols for different use cases.
- HTTP REST — Management operations, easy integration (port 8080)
- gRPC — High-performance binary protocol for production (port 8081)
- IPC — Ultra-low-latency Unix domain sockets for local applications
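To make the IPC option more concrete, here is a minimal sketch of request/response messaging over a connected socket pair. The wire format (a 4-byte big-endian length prefix followed by a JSON body) and the message fields are assumptions for illustration; the actual mlOS IPC protocol is not specified in this document.

```python
import json
import socket
import struct


def send_msg(sock: socket.socket, payload: dict) -> None:
    """Send one JSON message prefixed with a 4-byte big-endian length."""
    body = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock: socket.socket) -> dict:
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


def demo() -> dict:
    """Round-trip a hypothetical health-check request over a socket pair."""
    client, server = socket.socketpair()
    try:
        send_msg(client, {"op": "health"})
        request = recv_msg(server)
        send_msg(server, {"status": "ok", "echo": request["op"]})
        return recv_msg(client)
    finally:
        client.close()
        server.close()
```

A real deployment would bind a named Unix socket path instead of using `socket.socketpair()`, which is used here only to keep the example self-contained.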
Plugin Architecture
Framework-agnostic plugin system for maximum flexibility.
- Dynamic Loading — Load/unload plugins without restart
- Process Isolation — Each plugin runs in a separate process
- Version Support — Multiple versions can run simultaneously
- Hot-Swapping — Update plugins without downtime
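The version support and hot-swapping ideas above can be sketched with a registry that stores handlers per (name, version) pair and keeps a separate pointer to the active version. This is an in-process simplification under assumed names (`VersionedPluginRegistry`, `load`, `swap`, `unload`, `call`); it does not show process isolation and is not the mlOS implementation.

```python
import threading
from typing import Callable, Dict, Optional, Tuple

Handler = Callable[[str], str]


class VersionedPluginRegistry:
    """Hypothetical registry: several versions loaded, one active per plugin."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._plugins: Dict[Tuple[str, str], Handler] = {}
        self._active: Dict[str, str] = {}

    def load(self, name: str, version: str, handler: Handler) -> None:
        with self._lock:
            self._plugins[(name, version)] = handler
            # The first loaded version becomes active by default.
            self._active.setdefault(name, version)

    def swap(self, name: str, version: str) -> None:
        """Switch the active version; later default calls use the new one."""
        with self._lock:
            if (name, version) not in self._plugins:
                raise KeyError(f"{name} {version} is not loaded")
            self._active[name] = version

    def unload(self, name: str, version: str) -> None:
        with self._lock:
            if self._active.get(name) == version:
                raise RuntimeError("cannot unload the active version")
            del self._plugins[(name, version)]

    def call(self, name: str, payload: str, version: Optional[str] = None) -> str:
        # Resolve the handler under the lock, then run it outside the lock.
        with self._lock:
            chosen = version or self._active[name]
            handler = self._plugins[(name, chosen)]
        return handler(payload)
```

Loading the new version first, swapping the active pointer, and only then unloading the old version is one way to update a plugin without a window in which no version is available.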
System Flow
The typical flow of operations in mlOS from registration to inference.
Register Plugin
The client application registers an ML framework plugin with mlOS Core.
Load Plugin
mlOS Core dynamically loads and initializes the plugin.
Register Model
The client registers a model with mlOS Core, which forwards the registration to the plugin.
Inference Request
The client sends an inference request to mlOS Core via any of the API protocols.
Execute & Return
mlOS Core routes the request to the appropriate plugin, executes inference, and returns the results.
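The five steps above can be walked through with a toy, in-memory stand-in for mlOS Core. Every name in this sketch (`MiniCore`, `ScalePlugin`, and their methods) is hypothetical and chosen only to mirror the step titles; a real client would go through one of the API protocols instead of calling Python methods directly.

```python
from typing import Any, Callable, Dict, List


class ScalePlugin:
    """Toy plugin: each 'model' is a single scale factor."""

    def __init__(self) -> None:
        self.models: Dict[str, float] = {}

    def add_model(self, model_id: str, weights: float) -> None:
        self.models[model_id] = weights

    def run(self, model_id: str, inputs: List[float]) -> List[float]:
        return [self.models[model_id] * x for x in inputs]


class MiniCore:
    """Toy stand-in for mlOS Core; all names here are hypothetical."""

    def __init__(self) -> None:
        self.plugin_factories: Dict[str, Callable[[], Any]] = {}
        self.loaded_plugins: Dict[str, Any] = {}
        self.model_to_plugin: Dict[str, str] = {}

    def register_plugin(self, name: str, factory: Callable[[], Any]) -> None:
        self.plugin_factories[name] = factory            # step 1

    def load_plugin(self, name: str) -> None:
        self.loaded_plugins[name] = self.plugin_factories[name]()  # step 2

    def register_model(self, model_id: str, plugin_name: str, weights: Any) -> None:
        # Step 3: the core forwards the registration to the plugin.
        self.loaded_plugins[plugin_name].add_model(model_id, weights)
        self.model_to_plugin[model_id] = plugin_name

    def infer(self, model_id: str, inputs: Any) -> Any:
        # Steps 4 and 5: route to the owning plugin, execute, return.
        plugin = self.loaded_plugins[self.model_to_plugin[model_id]]
        return plugin.run(model_id, inputs)


core = MiniCore()
core.register_plugin("scale", ScalePlugin)
core.load_plugin("scale")
core.register_model("double", "scale", 2.0)
result = core.infer("double", [1.0, 2.5])
```

The model-to-plugin map is what lets the core route an inference request without the client naming the plugin again.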
Performance Characteristics
mlOS achieves high performance through kernel-level optimizations and efficient resource management. The table below lists approximate latencies per operation for each API protocol.
| Operation | HTTP API | gRPC API | IPC API |
|---|---|---|---|
| Plugin Registration | ~5ms | ~2ms | ~0.5ms |
| Model Registration | ~10ms | ~5ms | ~1ms |
| Inference (small model) | ~2ms | ~1ms | ~0.1ms |
| Inference (large model) | ~50ms | ~25ms | ~10ms |
| Health Check | ~1ms | ~0.5ms | ~0.05ms |
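To compare the protocols more directly, the approximate figures from the table can be turned into ratios. The snippet below copies the table values as given and computes how many times faster one protocol is than another; the dictionary layout and the `speedup` helper are illustrative names, not part of mlOS.

```python
# Approximate latencies in milliseconds, copied from the table above.
LATENCY_MS = {
    "Plugin Registration": {"HTTP": 5.0, "gRPC": 2.0, "IPC": 0.5},
    "Model Registration": {"HTTP": 10.0, "gRPC": 5.0, "IPC": 1.0},
    "Inference (small model)": {"HTTP": 2.0, "gRPC": 1.0, "IPC": 0.1},
    "Inference (large model)": {"HTTP": 50.0, "gRPC": 25.0, "IPC": 10.0},
    "Health Check": {"HTTP": 1.0, "gRPC": 0.5, "IPC": 0.05},
}


def speedup(operation: str, slower: str = "HTTP", faster: str = "IPC") -> float:
    """Return how many times lower the 'faster' latency is for an operation."""
    row = LATENCY_MS[operation]
    return row[slower] / row[faster]


if __name__ == "__main__":
    for op in LATENCY_MS:
        print(f"{op}: HTTP/IPC = {speedup(op):.1f}x, "
              f"HTTP/gRPC = {speedup(op, 'HTTP', 'gRPC'):.1f}x")
```

From these figures, IPC comes out roughly 5x to 20x faster than HTTP, and gRPC roughly 2x to 2.5x faster. The smallest IPC gain is for large-model inference, which would be consistent with model execution time dominating transport overhead there, although the table alone does not confirm that.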
Deployment Patterns
Single Node Deployment
mlOS Core runs on a single node with multiple plugins. Ideal for development, testing, and small-scale deployments.
Distributed Deployment
Multiple mlOS Core nodes run behind a load balancer or service mesh. This enables horizontal scaling and high availability in production.
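As one possible illustration of the distributed pattern, the sketch below rotates requests across several mlOS Core nodes and skips nodes that have been marked unhealthy. The round-robin policy, the class name, and the node addresses are assumptions for this example; production setups would typically rely on an existing load balancer or service mesh rather than custom code.

```python
import itertools
from typing import List, Set


class RoundRobinBalancer:
    """Hypothetical round-robin selector over mlOS Core node addresses."""

    def __init__(self, nodes: List[str]) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = list(nodes)
        self._unhealthy: Set[str] = set()
        self._cycle = itertools.cycle(self._nodes)

    def mark_down(self, node: str) -> None:
        self._unhealthy.add(node)

    def mark_up(self, node: str) -> None:
        self._unhealthy.discard(node)

    def next_node(self) -> str:
        # Try each node at most once per call so a full outage is detected.
        for _ in range(len(self._nodes)):
            node = next(self._cycle)
            if node not in self._unhealthy:
                return node
        raise RuntimeError("no healthy mlOS Core nodes available")


# Example addresses are made up; 8081 is the gRPC port listed earlier.
balancer = RoundRobinBalancer(["core-1:8081", "core-2:8081", "core-3:8081"])
```

Skipping unhealthy nodes is what gives the availability benefit: requests keep flowing as long as at least one node remains up.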
Security & Isolation
mlOS provides three security measures for production deployments: plugin sandboxing, API authentication and authorization, and rate limiting.
Plugin Sandboxing
Each plugin runs in an isolated process with resource limits, so a failure in one plugin does not cascade to other plugins or to the core.
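The isolation idea can be demonstrated with the standard library alone: run plugin code in a child process, bound its wall-clock time, and report failures to the parent instead of letting them propagate. The helper name `run_plugin_isolated` and the result dictionary are assumptions for this sketch, and a timeout is only one kind of resource limit; memory or CPU caps are not shown.

```python
import subprocess
import sys


def run_plugin_isolated(code: str, timeout_s: float = 5.0) -> dict:
    """Run Python source in a child process and contain any failure.

    A crash, non-zero exit, or timeout in the child is turned into a
    result dictionary; the parent process keeps running either way.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # wall-clock limit; the child is killed on expiry
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "error": "timeout"}
    if proc.returncode != 0:
        return {"ok": False, "error": f"exit code {proc.returncode}"}
    return {"ok": True, "output": proc.stdout.strip()}
```

For example, `run_plugin_isolated("import sys; sys.exit(3)")` reports the failure as a value rather than raising, which is the property that keeps one misbehaving plugin from taking down the others.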
API Security
Token-based authentication and role-based access control for all API endpoints.
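A minimal sketch of how token-based authentication and role-based access control fit together is shown below. The tokens, role names, and permission strings are invented for illustration; the document does not specify the real mlOS token format or role model.

```python
import hmac
from typing import Dict, Optional, Set

# Hypothetical token-to-role table; a real system would not hard-code tokens.
TOKENS: Dict[str, str] = {
    "tok-admin-123": "admin",
    "tok-client-456": "client",
}

# Hypothetical permissions granted to each role.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "admin": {"plugin:register", "model:register", "inference:run"},
    "client": {"inference:run"},
}


def role_for_token(token: str) -> Optional[str]:
    """Authenticate: map a presented token to a role, or None if unknown."""
    for known, role in TOKENS.items():
        # Constant-time comparison avoids leaking token contents via timing.
        if hmac.compare_digest(known.encode("utf-8"), token.encode("utf-8")):
            return role
    return None


def is_allowed(token: str, permission: str) -> bool:
    """Authorize: check that the token's role grants the permission."""
    role = role_for_token(token)
    return role is not None and permission in ROLE_PERMISSIONS.get(role, set())
```

Keeping authentication (`role_for_token`) separate from authorization (`is_allowed`) means new endpoints only need a permission string, not changes to token handling.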
Rate Limiting
Per-client request throttling to prevent abuse and ensure fair resource usage.
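Per-client throttling is often implemented with a token bucket, and the sketch below shows that approach. It is a swapped-in, commonly used technique rather than the documented mlOS algorithm; the class name, parameters, and injectable clock are assumptions for the example.

```python
import time
from typing import Callable, Dict, Tuple


class PerClientRateLimiter:
    """Token bucket per client: 'burst' capacity, refilled at 'rate_per_s'."""

    def __init__(
        self,
        rate_per_s: float,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_s
        self.burst = float(burst)
        self.clock = clock
        # client_id -> (tokens remaining, timestamp of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        # New clients start with a full bucket.
        tokens, last = self._buckets.get(client_id, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[client_id] = (tokens - 1.0, now)
            return True
        self._buckets[client_id] = (tokens, now)
        return False
```

Because each client has its own bucket, one client exhausting its allowance does not affect requests from others, which matches the fair-usage goal described above.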