Why Sugar V3?
An honest look at what V3 improves, backed by real benchmarks. No marketing hype.
The Short Version
Sugar V3 replaces subprocess calls to the Claude CLI with native Claude Agent SDK integration. This provides:
- Security gates - Block dangerous operations before they execute
- Observability - Structured tool tracking instead of regex parsing
- Reliability - Automatic retry on transient API errors
What V3 does NOT improve: Claude's processing speed, token costs, or task quality. Those depend on Claude, not the executor.
Architecture Comparison
V2: Subprocess Model
```text
Python Process
    |
    +-- spawn subprocess --> Claude CLI Process
    |                        |
    |                        +-- stdin: prompt text
    |                        +-- stdout/stderr: raw output
    |                        |
    <-- regex parsing ------ output text
```
V3: Native SDK Model
```text
Python Process
    |
    +-- import claude_agent_sdk
    |
    +-- async generator --> streaming messages
    |                       |
    +-- typed objects <---- tool uses / text / results
    |
    +-- hooks --------------> quality gates
    |
    +-- direct access ------> files / actions / metrics
```
Real Benchmarks
These numbers come from running scripts/benchmark_v3.py in the Sugar repository.
Startup Overhead
| Metric | V2 (Subprocess) | V3 (SDK) | Improvement |
|---|---|---|---|
| Initialization time | 300-500ms | ~27ms | ~11-18x faster |
| Memory overhead | 50-100MB (separate process) | <1MB (shared process) | 50-100x less |
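You can get a rough, Sugar-independent feel for where startup overhead comes from by timing the spawn of a trivial Python child process against an in-process function call. This is a minimal sketch. It is not `scripts/benchmark_v3.py`, it does not launch the Claude CLI, and the helper names are hypothetical, so expect numbers that differ from the table. It only illustrates that creating a process costs far more than calling code that already lives in your interpreter.

```python
import subprocess
import sys
import time


def time_subprocess_spawn(runs: int = 5) -> float:
    """Average seconds to start and exit a trivial Python child process."""
    start = time.perf_counter()
    for _ in range(runs):
        # Each run pays for process creation plus interpreter startup,
        # loosely analogous to V2 launching a separate CLI process.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / runs


def time_in_process_call(runs: int = 5) -> float:
    """Average seconds for a plain in-process function call."""

    def noop() -> None:
        pass

    start = time.perf_counter()
    for _ in range(runs):
        noop()
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    print(f"subprocess spawn: {time_subprocess_spawn() * 1000:.1f} ms")
    print(f"in-process call:  {time_in_process_call() * 1000:.4f} ms")
```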
Per-Operation Latency
| Operation | V3 Latency |
|---|---|
| Security check (per tool use) | 0.002ms |
| Response serialization | <0.001ms |
| Transient error detection | <0.001ms |
Context: What This Actually Means
A typical task breakdown:
V2: 400ms overhead + 60s Claude API = 60.4s total
V3: 27ms overhead + 60s Claude API = 60.03s total
Improvement: ~370ms (less than 1% of total time)
The overhead improvement is real but small compared to Claude's processing time. The real value is in the new capabilities.
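Here is the same arithmetic as a quick script. The 60s of API time is the illustrative figure from the breakdown above, and 400ms is the midpoint of the 300-500ms V2 range. The `overhead_share` helper is just for this example.

```python
API_TIME_S = 60.0      # illustrative Claude API time from the breakdown
V2_OVERHEAD_S = 0.400  # midpoint of the 300-500ms V2 range
V3_OVERHEAD_S = 0.027  # ~27ms V3 initialization


def overhead_share(overhead_s: float, api_s: float = API_TIME_S) -> float:
    """Fraction of total task time spent in executor overhead."""
    return overhead_s / (overhead_s + api_s)


v2_total = V2_OVERHEAD_S + API_TIME_S  # 60.4 s
v3_total = V3_OVERHEAD_S + API_TIME_S  # 60.027 s
saved_ms = (v2_total - v3_total) * 1000

print(f"time saved: {saved_ms:.0f} ms")                           # time saved: 373 ms
print(f"saved / V2 total: {saved_ms / 1000 / v2_total:.2%}")      # saved / V2 total: 0.62%
print(f"V2 overhead share: {overhead_share(V2_OVERHEAD_S):.2%}")  # V2 overhead share: 0.66%
print(f"V3 overhead share: {overhead_share(V3_OVERHEAD_S):.2%}")  # V3 overhead share: 0.04%
```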
What V3 Actually Provides
1. Security Gates (New Capability)
V3 can block dangerous operations before execution. V2 had no mechanism for this.
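The check such a gate runs before a tool executes can be written as a plain function. This is a minimal sketch under several assumptions. The rule lists mirror the example `quality_gates` settings in this section. Tool inputs are assumed to carry `file_path` or `command` keys, as in the V3 response example later on this page. The function names are hypothetical. This is not Sugar's implementation, and it does not show how a hook is registered with the Agent SDK. The returned dict uses the same two decision fields as the hook output shown in this section.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical rule lists mirroring the example quality_gates settings.
PROTECTED_FILES = [".env", "*.pem", "credentials.json", "secrets/*"]
BLOCKED_COMMANDS = ["rm -rf /", "sudo", "chmod -R 777 /", "> /dev/sda"]


def deny(reason: str) -> dict:
    """Build a deny decision with the fields shown in this section."""
    return {"permissionDecision": "deny", "permissionDecisionReason": reason}


def is_protected(file_path: str) -> bool:
    """True if the path, or its final component, matches a protected pattern."""
    name = PurePosixPath(file_path).name
    return any(
        fnmatch(name, pattern)
        or fnmatch(file_path, pattern)
        or fnmatch(file_path, f"*/{pattern}")
        for pattern in PROTECTED_FILES
    )


def check_tool_use(tool_name: str, tool_input: dict) -> Optional[dict]:
    """Return a deny decision for a dangerous tool call, or None to allow it."""
    file_path = tool_input.get("file_path")
    if file_path and is_protected(file_path):
        return deny(f"Protected file access blocked: {PurePosixPath(file_path).name}")
    if tool_name == "Bash":
        command = tool_input.get("command", "")
        for blocked in BLOCKED_COMMANDS:
            # Naive substring test: it over-blocks (e.g. "rm -rf /tmp/x")
            # and is easy to evade, so treat it as illustration only.
            if blocked in command:
                return deny(f"Blocked command: {blocked}")
    return None


print(check_tool_use("Bash", {"command": "pytest"}))  # None
```

Because the check runs before the tool executes, a denied call never touches the file system. V2 had no equivalent point at which to intervene.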
```yaml
# V3 blocks access to protected files
quality_gates:
  protected_files:
    - ".env"
    - "*.pem"
    - "credentials.json"
    - "secrets/*"

  # V3 blocks dangerous commands
  blocked_commands:
    - "rm -rf /"
    - "sudo"
    - "chmod -R 777 /"
    - "> /dev/sda"
```
When a blocked operation is attempted, the PreToolUse hook intercepts the call and denies it:
```json
{
  "permissionDecision": "deny",
  "permissionDecisionReason": "Protected file access blocked: .env"
}
```
2. Observability (Improved Accuracy)
| Capability | V2 | V3 |
|---|---|---|
| Tool use tracking | Regex parsing of stdout | Direct from message stream |
| File modifications | Pattern matching (unreliable) | Hook-based (accurate) |
| Command execution | Inferred from output | Captured in real-time |
| Execution history | Limited session state | Full audit trail |
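The V3 column of the table above works because each tool use arrives as a structured event. Tracking then becomes bookkeeping rather than text parsing. The sketch below is a minimal illustration under two assumptions. First, `Write` and `Edit` are treated as the file-modifying tools. Second, `ToolUseTracker` and its methods are hypothetical names. How Sugar actually receives the events (message stream or hooks) is not shown. The summary it builds has the same `tool_uses` and `files_modified` keys as the V3 response example that follows.

```python
from dataclasses import dataclass, field
from typing import ClassVar


@dataclass
class ToolUseTracker:
    """Hypothetical recorder fed one structured tool-use event at a time."""

    # Assumption: these tools modify the file named in their "file_path" input.
    FILE_WRITING_TOOLS: ClassVar[tuple] = ("Write", "Edit")

    tool_uses: list = field(default_factory=list)
    files_modified: list = field(default_factory=list)

    def record(self, tool_name: str, tool_input: dict) -> None:
        """Append the event and note any file it writes, with no output parsing."""
        self.tool_uses.append({"tool": tool_name, "input": tool_input})
        path = tool_input.get("file_path")
        if tool_name in self.FILE_WRITING_TOOLS and path and path not in self.files_modified:
            self.files_modified.append(path)

    def summary(self) -> dict:
        """Return the tool_uses / files_modified fields of a V3-style response."""
        return {"tool_uses": self.tool_uses, "files_modified": self.files_modified}


tracker = ToolUseTracker()
tracker.record("Write", {"file_path": "/src/main.py"})
tracker.record("Bash", {"command": "pytest"})
tracker.record("Edit", {"file_path": "/src/main.py"})
print(tracker.summary()["files_modified"])  # ['/src/main.py']
```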
A V3 response includes structured data, for example:
```json
{
  "success": true,
  "tool_uses": [
    {"tool": "Write", "input": {"file_path": "/src/main.py"}},
    {"tool": "Bash", "input": {"command": "pytest"}}
  ],
  "files_modified": ["/src/main.py", "/tests/test_main.py"],
  "execution_time": 45.2,
  "quality_gate_results": {
    "total_tool_executions": 5,
    "blocked_operations": 0,
    "security_violations": 0
  }
}
```
3. Reliability (Automatic Recovery)
V3 automatically retries transient API errors with exponential backoff:
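The general shape of such a retry loop can be sketched as follows. This is a minimal version under stated assumptions:

- Errors are classified by substring-matching their message against the `TRANSIENT_ERRORS` markers listed in this section.
- `max_retries` counts retries after the first attempt.
- `call_with_retry`, `backoff_delay`, and `is_transient` are hypothetical names, not Sugar's API.

The delay parameters mirror the `max_retries`, `retry_base_delay`, and `retry_max_delay` settings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Error markers taken from the TRANSIENT_ERRORS list in this section.
TRANSIENT_ERRORS = ["rate_limit", "timeout", "connection", "overloaded", "503"]


def is_transient(error: Exception) -> bool:
    """Treat an error as retryable if its message contains a known marker."""
    message = str(error).lower()
    return any(marker in message for marker in TRANSIENT_ERRORS)


def backoff_delay(attempt: int, base_delay: float = 1.0, max_delay: float = 30.0) -> float:
    """Delay before retry `attempt` (0-based): base, 2*base, 4*base, ... capped."""
    return min(base_delay * 2 ** attempt, max_delay)


def call_with_retry(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as error:
            # Non-transient errors and the final failed attempt propagate.
            if attempt == max_retries or not is_transient(error):
                raise
            sleep(backoff_delay(attempt, base_delay, max_delay))
    raise RuntimeError("max_retries must be >= 0")
```

`sleep` is injectable so the loop can be tested without real waiting. Many implementations also add random jitter to the delay. The sketch omits jitter to match the 1s, 2s, 4s schedule described here.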
```python
# V2: Immediate failure on any error
try:
    result = wrapper.execute(prompt)
except Exception:
    # Task fails; the user must retry manually
    raise

# V3: Automatic retry for transient errors
TRANSIENT_ERRORS = [
    "rate_limit",  # 429
    "timeout",
    "connection",
    "overloaded",
    "503",  # Service unavailable
]
# Retries with: 1s, 2s, 4s, 8s... up to max_delay
```
Configure retry behavior:
```yaml
sugar:
  agent:
    max_retries: 3
    retry_base_delay: 1.0  # seconds
    retry_max_delay: 30.0  # max backoff
```
Comparison Summary
| Aspect | V2 (ClaudeWrapper) | V3 (SugarAgent) |
|---|---|---|
| Execution model | Subprocess spawn | Native SDK call |
| Startup overhead | 300-500ms | ~27ms |
| Memory overhead | 50-100MB | <1MB |
| Tool tracking | Regex parsing | Structured stream |
| Security gates | None | PreToolUse hooks |
| Retry logic | None | Exponential backoff |
| File tracking | Pattern matching | Hook-based |
| Session history | Partial | Full audit trail |
When to Use Legacy Mode
V3 maintains backwards compatibility. Use legacy mode if:
- You have custom integrations depending on subprocess behavior
- You need to debug SDK vs CLI differences
- You're migrating gradually
```yaml
sugar:
  claude:
    executor: legacy  # Use V2 subprocess wrapper
```
Note: Legacy mode is deprecated and will be removed in a future release.
Running the Benchmarks
To run these benchmarks yourself:
```bash
# Clone the Sugar repository
git clone https://github.com/roboticforce/sugar.git
cd sugar

# Install dependencies
pip install -e .

# Run benchmarks
python scripts/benchmark_v3.py
```
Next Steps
- Agent SDK - Configure the new SDK executor
- Migration Guide - Upgrade from V2
- Configuration - Full config reference