Context Management¶
Mamba Agents automatically tracks conversation context and provides strategies to manage long conversations.
Overview¶
As conversations grow, the context window can exceed model limits. The context management system:
- Tracks messages across agent runs
- Counts tokens to monitor usage
- Compacts context when thresholds are reached
- Preserves important messages during compaction
Built-in Agent Context¶
Context tracking is enabled by default:
from mamba_agents import Agent
agent = Agent("gpt-4o")
# Run multiple turns - context is maintained automatically
agent.run_sync("My name is Alice")
agent.run_sync("I'm working on a Python project")
result = agent.run_sync("What's my name and what am I working on?")
# Agent remembers both pieces of information
Checking Context State¶
# Get all tracked messages
messages = agent.get_messages()
print(f"Total messages: {len(messages)}")
# Get context state
state = agent.get_context_state()
print(f"Token count: {state.token_count}")
print(f"Message count: {state.message_count}")
print(f"Compaction history: {len(state.compaction_history)}")
Manual Compaction¶
# Check if compaction is needed
if agent.should_compact():
result = await agent.compact()
print(f"Removed {result.removed_count} messages")
print(f"Tokens before: {result.tokens_before}")
print(f"Tokens after: {result.tokens_after}")
Auto-compaction¶
By default, context is automatically compacted when thresholds are reached:
from mamba_agents import Agent, AgentConfig, CompactionConfig
config = AgentConfig(
auto_compact=True, # Enabled by default
context=CompactionConfig(
trigger_threshold_tokens=100000, # Compact at 100k tokens
target_tokens=80000, # Target 80k after compaction
),
)
agent = Agent("gpt-4o", config=config)
# Compaction happens automatically when threshold is reached
for i in range(100):
agent.run_sync(f"Message {i}: Tell me about topic {i}")
Disabling Context Tracking¶
config = AgentConfig(track_context=False)
agent = Agent("gpt-4o", config=config)
# No context is maintained between runs
Compaction Strategies¶
Five strategies are available for compacting context:
1. Sliding Window¶
Removes oldest messages beyond a threshold:
config = CompactionConfig(
strategy="sliding_window",
preserve_recent_turns=10, # Keep last 10 exchanges
)
Best for: Simple conversations where recent context is most important.
2. Summarize Older¶
Uses an LLM to summarize older messages:
config = CompactionConfig(
strategy="summarize_older",
preserve_recent_turns=5,
summarization_model="gpt-4o-mini", # Model for summarization
)
Best for: Long conversations where historical context matters.
3. Selective Pruning¶
Removes completed tool call/result pairs:
Best for: Tool-heavy workflows where tool results are no longer needed.
4. Importance Scoring¶
Uses LLM to score and prune least important messages:
Best for: Complex conversations with varying importance levels.
5. Hybrid¶
Combines multiple strategies in sequence:
Best for: General-purpose use with good balance.
Configuration Reference¶
CompactionConfig Options¶
| Option | Type | Default | Description |
|---|---|---|---|
strategy |
str | "sliding_window" |
Compaction strategy |
trigger_threshold_tokens |
int | 100000 |
Token count triggering compaction |
target_tokens |
int | 80000 |
Target tokens after compaction |
preserve_recent_turns |
int | 10 |
Recent turns to always preserve |
preserve_system_prompt |
bool | True |
Always keep system prompt |
summarization_model |
str | "same" |
Model for summarization ("same" uses current) |
Standalone Context Manager¶
For advanced use cases, use ContextManager directly:
from mamba_agents.context import ContextManager, CompactionConfig
# Create manager with config
config = CompactionConfig(
strategy="hybrid",
trigger_threshold_tokens=50000,
target_tokens=40000,
)
manager = ContextManager(config=config)
# Add messages
manager.add_messages([
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there!"},
])
# Set system prompt
manager.set_system_prompt("You are helpful.")
# Get messages
messages = manager.get_messages()
# Check and compact
if manager.should_compact():
result = await manager.compact()
# Get state
state = manager.get_context_state()
# Clear all
manager.clear()
Working with Message History¶
Message Format¶
Messages follow the pydantic-ai format:
messages = agent.get_messages()
for msg in messages:
print(f"Role: {msg['role']}")
print(f"Content: {msg['content']}")
if 'tool_calls' in msg:
print(f"Tool calls: {msg['tool_calls']}")
Preserving Specific Messages¶
The system prompt is always preserved:
Best Practices¶
1. Choose the Right Strategy¶
| Use Case | Recommended Strategy |
|---|---|
| Chat applications | sliding_window |
| Research/analysis | summarize_older |
| Tool-heavy workflows | selective_pruning or hybrid |
| Long-running agents | hybrid |
2. Set Appropriate Thresholds¶
Consider your model's context window:
| Model | Context Window | Suggested Threshold |
|---|---|---|
| GPT-4o | 128k | 100k |
| GPT-4o-mini | 128k | 100k |
| Claude 3.5 | 200k | 150k |
| Llama 3.2 | 8k-128k | 75% of limit |
3. Monitor Compaction¶
state = agent.get_context_state()
for compaction in state.compaction_history:
print(f"Strategy: {compaction.strategy}")
print(f"Removed: {compaction.removed_count}")
Next Steps¶
- Token Tracking - Monitor token usage
- Compaction Strategies Explained - Deep dive
- ContextManager API - Full reference