Model Backends¶
Mamba Agents supports local models through any OpenAI-compatible API, including servers such as Ollama, vLLM, and LM Studio.
Overview¶
The backends module provides:
- OpenAI-compatible adapter - Connect to any OpenAI-compatible API
- Factory functions - Quick setup for popular providers
- Model profiles - Metadata about model capabilities
Quick Start¶
Ollama¶
from mamba_agents import Agent, AgentSettings
# Option 1: Via settings
settings = AgentSettings(
    model_backend={
        "base_url": "http://localhost:11434/v1",
        "model": "llama3.2",
    }
)
agent = Agent(settings=settings)
# Option 2: Via environment
# MAMBA_MODEL_BACKEND__BASE_URL=http://localhost:11434/v1
# MAMBA_MODEL_BACKEND__MODEL=llama3.2
Using Backend Factories¶
from mamba_agents.backends import (
    create_ollama_backend,
    create_vllm_backend,
    create_lmstudio_backend,
)
# Ollama
backend = create_ollama_backend("llama3.2")
# vLLM
backend = create_vllm_backend("meta-llama/Llama-3.2-3B-Instruct")
# LM Studio
backend = create_lmstudio_backend() # Uses default model
Ollama Setup¶
Installation¶
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh
# Start the server
ollama serve
# Pull a model
ollama pull llama3.2
Usage¶
from mamba_agents import Agent, AgentSettings
settings = AgentSettings(
    model_backend={
        "base_url": "http://localhost:11434/v1",
        "model": "llama3.2",
        "temperature": 0.7,
    }
)
agent = Agent(settings=settings)
result = agent.run_sync("Hello!")
Environment Configuration¶
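The same backend can be configured entirely through environment variables, which keeps host URLs out of code. The two variables below are the ones shown earlier; other ModelBackendSettings fields presumably follow the same MAMBA_MODEL_BACKEND__ prefix.
# .env
MAMBA_MODEL_BACKEND__BASE_URL=http://localhost:11434/v1
MAMBA_MODEL_BACKEND__MODEL=llama3.2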
vLLM Setup¶
Installation¶
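A minimal sketch of a vLLM install and launch, assuming a CUDA-capable machine (consult the vLLM documentation for hardware-specific options):
# Install vLLM
pip install vllm
# Serve an OpenAI-compatible API on port 8000
vllm serve meta-llama/Llama-3.2-3B-Instruct
Older vLLM releases expose the same server via python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.2-3B-Instruct.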
Usage¶
from mamba_agents import Agent, AgentSettings
settings = AgentSettings(
    model_backend={
        "base_url": "http://localhost:8000/v1",
        "model": "meta-llama/Llama-3.2-3B-Instruct",
    }
)
agent = Agent(settings=settings)
LM Studio Setup¶
Installation¶
- Download LM Studio from lmstudio.ai
- Download a model through the app
- Start the local server (Settings > Local Server)
Usage¶
from mamba_agents import Agent, AgentSettings
settings = AgentSettings(
    model_backend={
        "base_url": "http://localhost:1234/v1",
        "model": "local-model",  # LM Studio uses the loaded model
    }
)
agent = Agent(settings=settings)
OpenAI-Compatible Backend¶
For custom or other OpenAI-compatible servers:
from mamba_agents.backends import OpenAICompatibleBackend
backend = OpenAICompatibleBackend(
    model="my-model",
    base_url="http://localhost:8000/v1",
    api_key="optional-key",
    temperature=0.7,
    max_tokens=4096,
)
Model Profiles¶
Get information about model capabilities:
from mamba_agents.backends import get_profile, ModelProfile
# Get profile for a model
profile = get_profile("gpt-4o")
print(f"Context window: {profile.context_window}")
print(f"Max output: {profile.max_output_tokens}")
print(f"Supports tools: {profile.supports_tools}")
print(f"Supports vision: {profile.supports_vision}")
print(f"Provider: {profile.provider}")
Available Profiles¶
| Model | Context window | Tools | Vision |
|---|---|---|---|
| gpt-4o | 128k | Yes | Yes |
| gpt-4o-mini | 128k | Yes | Yes |
| gpt-4-turbo | 128k | Yes | Yes |
| gpt-3.5-turbo | 16k | Yes | No |
| claude-3-5-sonnet | 200k | Yes | Yes |
| claude-3-opus | 200k | Yes | Yes |
| llama3.2 | 8k | No | No |
| llama3.2:70b | 128k | Yes | No |
Configuration Reference¶
ModelBackendSettings¶
| Option | Type | Default | Description |
|---|---|---|---|
| model | str | "llama3.2" | Model identifier |
| base_url | str | "http://localhost:11434/v1" | API endpoint |
| api_key | SecretStr | None | API key |
| temperature | float | 0.7 | Sampling temperature (0.0-2.0) |
| max_tokens | int | None | Maximum tokens to generate |
| timeout | float | 30.0 | Request timeout in seconds |
| max_retries | int | 3 | Retry attempts |
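As a sketch, the options above can be combined when constructing AgentSettings (the values here are illustrative, not recommended defaults):
from mamba_agents import AgentSettings

settings = AgentSettings(
    model_backend={
        "model": "llama3.2",
        "base_url": "http://localhost:11434/v1",
        "temperature": 0.2,
        "max_tokens": 2048,
        "timeout": 60.0,
        "max_retries": 5,
    }
)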
Switching Between Providers¶
Using Settings Override¶
from mamba_agents import Agent, AgentSettings
# Load base settings
settings = AgentSettings()
# Override for local development
local_settings = AgentSettings(
    model_backend={
        "base_url": "http://localhost:11434/v1",
        "model": "llama3.2",
    }
)
# Use appropriate settings (set is_dev from your own environment check)
is_dev = True
agent = Agent(settings=local_settings if is_dev else settings)
Using Environment Variables¶
# Development (.env.local)
MAMBA_MODEL_BACKEND__BASE_URL=http://localhost:11434/v1
MAMBA_MODEL_BACKEND__MODEL=llama3.2
# Production (.env.prod)
MAMBA_MODEL_BACKEND__BASE_URL=https://api.openai.com/v1
MAMBA_MODEL_BACKEND__MODEL=gpt-4o
MAMBA_MODEL_BACKEND__API_KEY=sk-...
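One way to switch between the two files at launch time is to export the chosen file into the environment before starting your application (a minimal shell sketch; app.py stands in for your own entry point):
# Development
set -a; source .env.local; set +a
python app.py
# Production
set -a; source .env.prod; set +a
python app.py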
Troubleshooting¶
Connection Refused¶
# Check if server is running
import httpx
try:
    response = httpx.get("http://localhost:11434/v1/models")
    print("Server is running")
except httpx.ConnectError:
    print("Server not running - start with 'ollama serve'")
Model Not Found¶
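This usually means the model named in model_backend.model has not been downloaded to the local server. For Ollama, check what is available and pull the missing model:
# List models available locally
ollama list
# Pull the missing model
ollama pull llama3.2
For vLLM and LM Studio, make sure the model identifier in your settings matches the model the server actually loaded.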
Timeout Errors¶
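Local models, particularly larger ones on modest hardware, can exceed the default 30-second request timeout. One option is to raise the timeout in the backend settings (the value below is illustrative):
settings = AgentSettings(
    model_backend={
        "base_url": "http://localhost:11434/v1",
        "model": "llama3.2",
        "timeout": 120.0,  # allow slower local generation
    }
)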
Best Practices¶
1. Use Environment-Based Configuration¶
import os

from mamba_agents import AgentSettings

if os.getenv("ENV") == "development":
    settings = AgentSettings(model_backend={"model": "llama3.2"})
else:
    settings = AgentSettings()  # Uses production defaults
2. Handle Model Limitations¶
from mamba_agents import Agent
from mamba_agents.backends import get_profile

profile = get_profile(settings.model_backend.model)

if not profile.supports_tools:
    # Use a simpler approach without tools
    agent = Agent(settings=settings, tools=[])
3. Monitor Local Server Health¶
import httpx

async def check_server_health(base_url: str) -> bool:
    """Return True if the OpenAI-compatible server responds on its /models endpoint."""
    async with httpx.AsyncClient() as client:
        try:
            response = await client.get(f"{base_url}/models", timeout=5.0)
            return response.status_code == 200
        except Exception:
            return False
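A quick way to exercise the check from synchronous code (the URL assumes the Ollama default used throughout this page):
import asyncio

healthy = asyncio.run(check_server_health("http://localhost:11434/v1"))
print("Server healthy" if healthy else "Server unreachable")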
Next Steps¶
- Error Handling - Handle connection failures
- Local LLM Tutorial - Step-by-step guide
- ModelBackendSettings API - Full reference