Model Backends

Mamba Agents supports local models through OpenAI-compatible APIs, including servers such as Ollama, vLLM, and LM Studio.

Overview

The backends module provides:

  • OpenAI-compatible adapter - Connect to any OpenAI-compatible API
  • Factory functions - Quick setup for popular providers
  • Model profiles - Metadata about model capabilities

Quick Start

Ollama

from mamba_agents import Agent, AgentSettings

# Option 1: Via settings
settings = AgentSettings(
    model_backend={
        "base_url": "http://localhost:11434/v1",
        "model": "llama3.2",
    }
)
agent = Agent(settings=settings)

# Option 2: Via environment
# MAMBA_MODEL_BACKEND__BASE_URL=http://localhost:11434/v1
# MAMBA_MODEL_BACKEND__MODEL=llama3.2

Using Backend Factories

from mamba_agents.backends import (
    create_ollama_backend,
    create_vllm_backend,
    create_lmstudio_backend,
)

# Ollama
backend = create_ollama_backend("llama3.2")

# vLLM
backend = create_vllm_backend("meta-llama/Llama-3.2-3B-Instruct")

# LM Studio
backend = create_lmstudio_backend()  # Uses default model

Ollama Setup

Installation

# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Start the server
ollama serve

# Pull a model
ollama pull llama3.2

Usage

from mamba_agents import Agent, AgentSettings

settings = AgentSettings(
    model_backend={
        "base_url": "http://localhost:11434/v1",
        "model": "llama3.2",
        "temperature": 0.7,
    }
)

agent = Agent(settings=settings)
result = agent.run_sync("Hello!")

Environment Configuration

MAMBA_MODEL_BACKEND__BASE_URL=http://localhost:11434/v1
MAMBA_MODEL_BACKEND__MODEL=llama3.2
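
With these variables set, the backend is configured automatically when the settings object is built. A minimal sketch, assuming AgentSettings follows the usual pydantic-settings convention of reading MAMBA_-prefixed variables with __ as the nesting delimiter (which the variable names above suggest):

from mamba_agents import Agent, AgentSettings

# No explicit model_backend here: values come from the
# MAMBA_MODEL_BACKEND__* environment variables above.
settings = AgentSettings()
agent = Agent(settings=settings)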

vLLM Setup

Installation

pip install vllm

# Start the server
vllm serve meta-llama/Llama-3.2-3B-Instruct --port 8000

Usage

from mamba_agents import Agent, AgentSettings

settings = AgentSettings(
    model_backend={
        "base_url": "http://localhost:8000/v1",
        "model": "meta-llama/Llama-3.2-3B-Instruct",
    }
)

agent = Agent(settings=settings)

LM Studio Setup

Installation

  1. Download LM Studio from lmstudio.ai
  2. Download a model through the app
  3. Start the local server (Settings > Local Server)

Usage

from mamba_agents import Agent, AgentSettings

settings = AgentSettings(
    model_backend={
        "base_url": "http://localhost:1234/v1",
        "model": "local-model",  # LM Studio uses the loaded model
    }
)

agent = Agent(settings=settings)

OpenAI-Compatible Backend

For custom or other OpenAI-compatible servers:

from mamba_agents.backends import OpenAICompatibleBackend

backend = OpenAICompatibleBackend(
    model="my-model",
    base_url="http://localhost:8000/v1",
    api_key="optional-key",
    temperature=0.7,
    max_tokens=4096,
)
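
If the server sits behind a gateway that requires authentication, the key can be read from the environment instead of being hard-coded. A short sketch reusing the constructor shown above; the environment variable name is only an example:

import os

from mamba_agents.backends import OpenAICompatibleBackend

backend = OpenAICompatibleBackend(
    model="my-model",
    base_url="https://gateway.example.com/v1",
    api_key=os.environ.get("MODEL_GATEWAY_API_KEY"),  # example variable name
)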

Model Profiles

Get information about model capabilities:

from mamba_agents.backends import get_profile, ModelProfile

# Get profile for a model
profile = get_profile("gpt-4o")

print(f"Context window: {profile.context_window}")
print(f"Max output: {profile.max_output_tokens}")
print(f"Supports tools: {profile.supports_tools}")
print(f"Supports vision: {profile.supports_vision}")
print(f"Provider: {profile.provider}")
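
Profiles are also useful for guarding requests against a model's limits. A rough sketch that approximates prompt length by word count (a real application would use a proper tokenizer):

from mamba_agents.backends import get_profile

profile = get_profile("llama3.2")

prompt = "Summarize the following report: ..."
approx_tokens = len(prompt.split())  # crude approximation, not a real token count

if approx_tokens > profile.context_window:
    raise ValueError("Prompt likely exceeds the model's context window")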

Available Profiles

Model              Context  Tools  Vision
gpt-4o             128k     Yes    Yes
gpt-4o-mini        128k     Yes    Yes
gpt-4-turbo        128k     Yes    Yes
gpt-3.5-turbo      16k      Yes    No
claude-3-5-sonnet  200k     Yes    Yes
claude-3-opus      200k     Yes    Yes
llama3.2           8k       No     No
llama3.2:70b       128k     Yes    No

Configuration Reference

ModelBackendSettings

Option       Type       Default                      Description
model        str        "llama3.2"                   Model identifier
base_url     str        "http://localhost:11434/v1"  API endpoint
api_key      SecretStr  None                         API key, if the server requires one
temperature  float      0.7                          Sampling temperature (0.0-2.0)
max_tokens   int        None                         Maximum tokens to generate
timeout      float      30.0                         Request timeout in seconds
max_retries  int        3                            Retry attempts
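
For reference, a settings object that sets every option explicitly (values are illustrative; any omitted field falls back to the default in the table above):

from mamba_agents import AgentSettings

settings = AgentSettings(
    model_backend={
        "model": "llama3.2",
        "base_url": "http://localhost:11434/v1",
        "api_key": None,        # not needed for most local servers
        "temperature": 0.7,
        "max_tokens": 2048,
        "timeout": 30.0,
        "max_retries": 3,
    }
)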

Switching Between Providers

Using Settings Override

import os

from mamba_agents import Agent, AgentSettings

# Load base settings
settings = AgentSettings()

# Override for local development
local_settings = AgentSettings(
    model_backend={
        "base_url": "http://localhost:11434/v1",
        "model": "llama3.2",
    }
)

# Use the appropriate settings for the environment
is_dev = os.getenv("ENV") == "development"
agent = Agent(settings=local_settings if is_dev else settings)

Using Environment Variables

# Development (.env.local)
MAMBA_MODEL_BACKEND__BASE_URL=http://localhost:11434/v1
MAMBA_MODEL_BACKEND__MODEL=llama3.2

# Production (.env.prod)
MAMBA_MODEL_BACKEND__BASE_URL=https://api.openai.com/v1
MAMBA_MODEL_BACKEND__MODEL=gpt-4o
MAMBA_MODEL_BACKEND__API_KEY=sk-...

Troubleshooting

Connection Refused

# Check if server is running
import httpx

try:
    response = httpx.get("http://localhost:11434/v1/models")
    response.raise_for_status()
    print("Server is running")
except httpx.ConnectError:
    print("Server not running - start with 'ollama serve'")

Model Not Found

# List available models (Ollama)
ollama list

# Pull missing model
ollama pull llama3.2
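
The available models can also be listed programmatically through the OpenAI-compatible endpoint, which should work for any of the servers above (a small sketch using httpx; the response follows the OpenAI models-list format):

import httpx

response = httpx.get("http://localhost:11434/v1/models")
response.raise_for_status()
print([m["id"] for m in response.json()["data"]])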

Timeout Errors

settings = AgentSettings(
    model_backend={
        "timeout": 60.0,  # Increase timeout
        "max_retries": 5,
    }
)

Best Practices

1. Use Environment-Based Configuration

import os

if os.getenv("ENV") == "development":
    settings = AgentSettings(model_backend={"model": "llama3.2"})
else:
    settings = AgentSettings()  # Uses production defaults

2. Handle Model Limitations

from mamba_agents.backends import get_profile

profile = get_profile(settings.model_backend.model)

if not profile.supports_tools:
    # Use a simpler approach without tools
    agent = Agent(settings=settings, tools=[])

3. Monitor Local Server Health

import httpx


async def check_server_health(base_url: str) -> bool:
    async with httpx.AsyncClient() as client:
        try:
            response = await client.get(f"{base_url}/models", timeout=5.0)
            response.raise_for_status()
            return True
        except Exception:
            return False
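
For example, before constructing an agent (a sketch assuming the helper above):

import asyncio

if not asyncio.run(check_server_health("http://localhost:11434/v1")):
    raise RuntimeError("Model server is unreachable - start it before running the agent")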

Next Steps