# Token Tracking
Mamba Agents automatically tracks token usage and estimates costs across all agent runs.
## Overview
Every time you run an agent, token usage is recorded:
- **Prompt tokens** - Tokens in the input (messages + system prompt)
- **Completion tokens** - Tokens in the model's response
- **Total tokens** - Combined count
- **Request count** - Number of API calls
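The aggregate record can be pictured as a small dataclass. This is an illustrative sketch, not the library's actual type; the field names simply mirror the attributes used in the examples on this page:

```python
from dataclasses import dataclass

@dataclass
class Usage:
    """Hypothetical shape of an aggregate usage record."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    request_count: int = 0

    @property
    def total_tokens(self) -> int:
        # Total is always the sum of both sides of the exchange
        return self.prompt_tokens + self.completion_tokens

    def add(self, prompt: int, completion: int) -> None:
        """Fold one request's counts into the running totals."""
        self.prompt_tokens += prompt
        self.completion_tokens += completion
        self.request_count += 1

usage = Usage()
usage.add(prompt=120, completion=45)
usage.add(prompt=80, completion=30)
```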
## Built-in Tracking
Token tracking is always enabled:
```python
from mamba_agents import Agent

agent = Agent("gpt-4o")

# Run some queries
agent.run_sync("Hello!")
agent.run_sync("Tell me about Python")
agent.run_sync("What are decorators?")

# Get aggregate usage
usage = agent.get_usage()
print(f"Total tokens: {usage.total_tokens}")
print(f"Prompt tokens: {usage.prompt_tokens}")
print(f"Completion tokens: {usage.completion_tokens}")
print(f"Requests: {usage.request_count}")
```
## Cost Estimation
Get estimated costs based on model pricing:
```python
# Get the total cost
cost = agent.get_cost()
print(f"Estimated cost: ${cost:.4f}")

# Get a detailed breakdown
breakdown = agent.get_cost_breakdown()
print(f"Prompt cost: ${breakdown.prompt_cost:.4f}")
print(f"Completion cost: ${breakdown.completion_cost:.4f}")
print(f"Total cost: ${breakdown.total_cost:.4f}")
print(f"Model: {breakdown.model}")
```
### Default Pricing
Mamba Agents includes default pricing for common models:
| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4-turbo | $10.00 | $30.00 |
| gpt-3.5-turbo | $0.50 | $1.50 |
| claude-3-5-sonnet | $3.00 | $15.00 |
| claude-3-opus | $15.00 | $75.00 |
| Local models | $0.00 | $0.00 |
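These per-million rates translate to request costs by simple proportion: a gpt-4o call with 1,000 prompt tokens and 500 completion tokens costs 1,000/1M × $2.50 + 500/1M × $10.00 = $0.0075. A minimal sketch of that arithmetic using the table's rates (the `RATES` dict here is illustrative, not the library's internal table):

```python
# (input, output) per-1M-token rates in USD, from the table above
RATES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost = tokens / 1,000,000 * per-million rate, summed for both sides."""
    in_rate, out_rate = RATES[model]
    return (prompt_tokens / 1_000_000 * in_rate
            + completion_tokens / 1_000_000 * out_rate)

cost = estimate_cost("gpt-4o", 1_000, 500)
# 1000/1e6 * 2.50 = 0.0025, plus 500/1e6 * 10.00 = 0.0050 -> 0.0075
```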
### Custom Cost Rates
Set custom rates via settings:
```python
from mamba_agents import AgentSettings

settings = AgentSettings(
    cost_rates={
        "my-custom-model": 0.001,  # Per 1,000 tokens
    }
)
```
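A flat rate like the `0.001` above applies one blended price to every token, rather than separate input/output rates. Sketched (the blended-rate interpretation is an assumption):

```python
def flat_cost(total_tokens: int, rate_per_1k: float) -> float:
    # One blended rate applied to all tokens, as with the
    # cost_rates setting above (0.001 USD per 1,000 tokens)
    return total_tokens / 1_000 * rate_per_1k

cost = flat_cost(4_500, 0.001)  # 4.5 * 0.001 = 0.0045
```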
Rates can also be configured through environment variables.
## Usage History
Get per-request usage details:
```python
history = agent.get_usage_history()

for record in history:
    print(f"Time: {record.timestamp}")
    print(f"Prompt tokens: {record.prompt_tokens}")
    print(f"Completion tokens: {record.completion_tokens}")
    print(f"Total: {record.total_tokens}")
    print(f"Model: {record.model}")
    if record.tool_name:
        print(f"Tool: {record.tool_name}")
    print("---")
```
## Token Counting
Count tokens for arbitrary text:
```python
# Count tokens in text
count = agent.get_token_count("Hello, how are you today?")
print(f"Tokens: {count}")

# Count the current context
context_tokens = agent.get_token_count()  # No argument = current context
print(f"Context tokens: {context_tokens}")
```
## Reset Tracking

```python
# Reset usage tracking (keeps context)
agent.reset_tracking()

# Reset everything (context + tracking)
agent.reset_all()
```
## Standalone Token Utilities
For advanced use cases, use the token modules directly:
### TokenCounter
```python
from mamba_agents.tokens import TokenCounter

counter = TokenCounter(encoding="cl100k_base")

# Count tokens in text
count = counter.count("Hello, world!")

# Count tokens in messages
messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
]
count = counter.count_messages(messages)
```
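Counting messages typically yields more than the sum of the raw content counts, because each chat message carries framing overhead (role markers, separators). A rough sketch of that idea, using a word-count stand-in for a real BPE tokenizer and an assumed 4-token overhead per message:

```python
def count_text(text: str) -> int:
    # Stand-in tokenizer: real counters use a BPE encoding such as
    # cl100k_base; word count is only a crude proxy
    return len(text.split())

def count_messages(messages: list[dict], per_message_overhead: int = 4) -> int:
    """Content tokens plus framing tokens for each message.
    The overhead value of 4 is an assumption for illustration."""
    return sum(count_text(m["content"]) + per_message_overhead
               for m in messages)

msgs = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
]
total = count_messages(msgs)  # (1 + 4) + (2 + 4) = 11
```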
### UsageTracker
```python
from mamba_agents.tokens import UsageTracker

tracker = UsageTracker()

# Record usage
tracker.record_usage(
    input_tokens=100,
    output_tokens=50,
    model="gpt-4o",
)

# Get a summary
summary = tracker.get_summary()
print(f"Total: {summary.total_tokens}")

# Get history
history = tracker.get_history()

# Reset
tracker.reset()
```
### CostEstimator
```python
from mamba_agents.tokens import CostEstimator

estimator = CostEstimator()

# Estimate cost
cost = estimator.estimate(
    input_tokens=1000,
    output_tokens=500,
    model="gpt-4o",
)
print(f"Cost: ${cost.total_cost:.4f}")

# Get the rate for a model
rate = estimator.get_rate("gpt-4o")

# Set a custom rate
estimator.set_rate("my-model", 0.002)

# Get all rates
all_rates = estimator.get_all_rates()
```
## Integration with Workflows
Workflows track usage through the agent:
```python
from mamba_agents import Agent
from mamba_agents.workflows import ReActWorkflow

agent = Agent("gpt-4o")
workflow = ReActWorkflow(agent=agent)

# Inside an async function
result = await workflow.run("Research Python best practices")

# Access usage through the workflow
usage = workflow.get_token_usage()
cost = workflow.get_cost()
print(f"Workflow cost: ${cost:.4f}")
```
## Monitoring Usage
### Logging Usage
```python
import logging

logging.basicConfig(level=logging.INFO)

agent = Agent("gpt-4o")
agent.run_sync("Hello")

# Usage is logged automatically:
# INFO: Request completed: 45 tokens, $0.0001
```
### Usage Callbacks
Monitor usage in real-time with hooks:
```python
from mamba_agents import WorkflowHooks

def log_usage(state, step):
    usage = state.context.get("usage", {})
    print(f"Step {step.step_number} used {usage.get('tokens', 0)} tokens")

hooks = WorkflowHooks(on_step_complete=log_usage)
```
## Best Practices
### 1. Monitor Costs in Production
```python
# After each run
result = agent.run_sync(query)
cost = agent.get_cost()
if cost > budget_limit:
    logger.warning(f"Cost ${cost:.4f} exceeded budget ${budget_limit}")
```
### 2. Reset Tracking Periodically
```python
# Per-session tracking
def handle_session(user_id):
    agent.reset_tracking()
    # Process requests...

    # Log session usage
    usage = agent.get_usage()
    log_user_usage(user_id, usage)
```
### 3. Use Cheaper Models for High Volume
```python
# Route based on task complexity
if is_simple_task(query):
    agent = Agent("gpt-4o-mini")  # Cheaper
else:
    agent = Agent("gpt-4o")  # More capable
```
## Next Steps
- Context Management - Manage token usage with compaction
- Cost Estimation API - Full reference
- UsageTracker API - Detailed tracking