πŸ§ͺ E2E Testing Guide

Learn how the MLOS End-to-End testing system works and how to extend it for new models.

πŸ“Š Live Test Reports: View the latest E2E validation report

Overview

The MLOS E2E testing system validates the complete stack from model installation through inference. Tests run automatically on GitHub Actions and publish results to a live dashboard.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        E2E Test Pipeline                            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                     β”‚
β”‚  1. Download Releases          2. Install Models                    β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                 β”‚
β”‚  β”‚  Axon v3.1.3     β”‚         β”‚  Hugging Face    β”‚                 β”‚
β”‚  β”‚  Core v3.2.8     β”‚         β”‚  Models β†’ ONNX   β”‚                 β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                 β”‚
β”‚           β”‚                            β”‚                            β”‚
β”‚           β–Ό                            β–Ό                            β”‚
β”‚  3. Start MLOS Core           4. Register Models                    β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                 β”‚
β”‚  β”‚  HTTP API :18080 │◀────────│  axon register   β”‚                 β”‚
β”‚  β”‚  ONNX Runtime    β”‚         β”‚  model.onnx      β”‚                 β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                 β”‚
β”‚           β”‚                                                         β”‚
β”‚           β–Ό                                                         β”‚
β”‚  5. Run Inference Tests       6. Generate Report                    β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                 β”‚
β”‚  β”‚  POST /inference │────────▢│  HTML Report     β”‚                 β”‚
β”‚  β”‚  Validate Output β”‚         β”‚  GitHub Pages    β”‚                 β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                 β”‚
β”‚                                                                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Configuration Files

1. Model Configuration (config/models.yaml)

Defines which models to test:

models:
  gpt2:
    enabled: true
    category: nlp
    axon_id: "hf/distilgpt2@latest"
    description: "DistilGPT2 - Text generation"
    
  resnet:
    enabled: true
    category: vision
    axon_id: "hf/microsoft/resnet-50@latest"
    description: "ResNet-50 - Image classification"

2. Test Input Configuration (config/test-inputs.yaml)

Defines test inputs for each model:

models:
  gpt2:
    category: nlp
    tokenizer: "distilgpt2"
    test_text:
      small: "Hello, I am a language model."
      large: "Machine learning has transformed..."
    max_length:
      small: 16
      large: 128
    required_inputs: ["input_ids"]  # ONNX model inputs
    
  resnet:
    category: vision
    input_name: "pixel_values"
    image_size: 224
    normalization:
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]

⚠️ Important: The required_inputs must match exactly what the ONNX model expects. Different models have different input requirements!
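
To catch a mismatch early, you can compare the configured required_inputs against the model's actual graph inputs. A minimal sketch, assuming the onnx and PyYAML packages and a locally downloaded model.onnx (paths are illustrative):

# sketch: verify required_inputs matches the ONNX graph (assumes onnx + PyYAML)
import onnx
import yaml

model = onnx.load("model.onnx")  # path to the converted model (illustrative)
actual = {inp.name for inp in model.graph.input}

with open("config/test-inputs.yaml") as f:
    cfg = yaml.safe_load(f)["models"]["gpt2"]  # pick the model under test

missing = set(cfg.get("required_inputs", [])) - actual
if missing:
    raise SystemExit(f"Configured inputs not in ONNX graph: {missing}")
print(f"OK: graph inputs {sorted(actual)} cover required_inputs")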

Currently Tested Models

| Category | Model              | Status     | Required Inputs                            |
|----------|--------------------|------------|--------------------------------------------|
| NLP      | GPT-2 (DistilGPT2) | βœ… Passing | input_ids                                  |
| NLP      | BERT               | βœ… Passing | input_ids, attention_mask, token_type_ids  |
| NLP      | RoBERTa            | βœ… Passing | input_ids                                  |
| Vision   | ResNet-50          | βœ… Passing | pixel_values                               |
| Vision   | ViT                | βœ… Passing | pixel_values                               |
| Vision   | ConvNeXt           | βœ… Passing | pixel_values                               |
| Vision   | MobileNetV2        | βœ… Passing | pixel_values                               |
| Vision   | DeiT               | βœ… Passing | pixel_values                               |

Adding a New Model

Step 1: Check ONNX Model Inputs

First, determine what inputs your ONNX model expects:

# Using Docker to inspect model
docker run --rm --entrypoint="" \
  -v ~/.axon/cache/models/hf/your-model:/model \
  ghcr.io/mlos-foundation/axon-converter:latest \
  python3 -c "
import onnx
model = onnx.load('/model/model.onnx')
print('Model inputs:')
for inp in model.graph.input:
    print(f'  - {inp.name}')
"

Step 2: Add to config/models.yaml

models:
  my_new_model:
    enabled: true
    category: nlp  # or vision, multimodal
    axon_id: "hf/organization/model-name@latest"
    description: "My Model - Task description"

Step 3: Add to config/test-inputs.yaml

models:
  my_new_model:
    category: nlp
    tokenizer: "organization/model-name"
    test_text:
      small: "Test sentence."
      large: "Longer test text..."
    max_length:
      small: 16
      large: 128
    required_inputs: ["input_ids"]  # Match ONNX inputs!
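
To sanity-check the tokenizer and test_text values, you can encode the small input yourself. A sketch assuming the transformers package; the tokenizer name is the placeholder from the config above:

# sketch: preview the input_ids the harness would generate (assumes transformers)
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("organization/model-name")  # your tokenizer
ids = tok.encode("Test sentence.", max_length=16, truncation=True)
print({"input_ids": ids})  # flat payload shape expected by Core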

Step 4: Test Locally

# Generate test input
python3 scripts/generate-test-input.py my_new_model small

# Install and test
axon install hf/organization/model-name@latest
axon register hf/organization/model-name@latest

# Run inference
curl -X POST http://localhost:18080/models/my_new_model/inference \
  -H "Content-Type: application/json" \
  -d "$(python3 scripts/generate-test-input.py my_new_model small)"

Core API Input Format

MLOS Core expects a flat JSON format where keys match ONNX input names:

# NLP model (single input)
{"input_ids": [101, 7592, 102]}

# NLP model (multiple inputs - e.g., BERT)
{
  "input_ids": [101, 7592, 102],
  "attention_mask": [1, 1, 1],
  "token_type_ids": [0, 0, 0]
}

# Vision model (flat array of pixel values)
{"pixel_values": [0.1, 0.2, 0.3, ...]}

πŸ’‘ Tip: Use python3 scripts/generate-test-input.py <model> --pretty to see formatted JSON output for debugging.
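
For vision models, the flat pixel_values array is typically produced by resizing, normalizing, and flattening an image. A sketch assuming Pillow and NumPy, using the ResNet-50 normalization values from config/test-inputs.yaml (the image path and CHW layout are assumptions; verify against your model):

# sketch: build a flat pixel_values payload (assumes Pillow + NumPy)
import json

import numpy as np
from PIL import Image

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

img = Image.open("cat.jpg").convert("RGB").resize((224, 224))  # illustrative path
arr = (np.asarray(img) / 255.0 - mean) / std   # HWC, normalized per channel
arr = arr.transpose(2, 0, 1)                   # CHW, the usual ONNX layout
payload = {"pixel_values": arr.flatten().tolist()}
print(json.dumps(payload)[:80], "...")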

Running Tests

Locally

# Full E2E test
./scripts/test-release-e2e.sh

# With specific versions
AXON_VERSION=v3.1.3 CORE_VERSION=3.2.8-alpha ./scripts/test-release-e2e.sh

GitHub Actions

# Trigger manually
gh workflow run e2e-test.yml \
  -f axon_version=v3.1.3 \
  -f core_version=3.2.8-alpha

# View results
gh run list --workflow=e2e-test.yml

Troubleshooting

"Inference execution failed"

"Model not found"

Resources