Why Sugar V3?

An honest look at what V3 improves, backed by real benchmarks. No marketing hype.

The Short Version

Sugar V3 replaces subprocess calls to the Claude CLI with native Claude Agent SDK integration. This provides:

  • Security gates - Block dangerous operations before they execute
  • Observability - Structured tool tracking instead of regex parsing
  • Reliability - Automatic retry on transient API errors

What V3 does NOT improve: Claude's processing speed, token costs, or task quality. Those depend on Claude, not on the executor.

Architecture Comparison

V2: Subprocess Model

Python Process
    |
    +-- spawn subprocess --> Claude CLI Process
                                |
                                +-- stdin: prompt text
                                +-- stdout/stderr: raw output
                                |
    <-- regex parsing --------- output text

V3: Native SDK Model

Python Process
    |
    +-- import claude_agent_sdk
    |
    +-- async generator --> streaming messages
    |                          |
    +-- typed objects <------- tool uses / text / results
    |
    +-- hooks --------------> quality gates
    |
    +-- direct access ------> files / actions / metrics
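The core difference is how output is consumed. The minimal Python sketch below shows the V3 pattern: iterate an async generator and dispatch on message type instead of regex-parsing stdout. The classes TextMessage, ToolUseMessage, and ResultMessage, and the fake_stream() generator, are hypothetical stand-ins. They are not the claude_agent_sdk API, whose real message types and fields may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Union

# Hypothetical stand-ins for typed SDK messages; the real
# claude_agent_sdk classes and field names may differ.
@dataclass
class TextMessage:
    text: str

@dataclass
class ToolUseMessage:
    tool: str
    input: dict

@dataclass
class ResultMessage:
    success: bool

Message = Union[TextMessage, ToolUseMessage, ResultMessage]

async def fake_stream() -> AsyncIterator[Message]:
    """Simulates the streaming message generator from the diagram."""
    yield TextMessage("Writing the fix now.")
    yield ToolUseMessage("Write", {"file_path": "/src/main.py"})
    yield ToolUseMessage("Bash", {"command": "pytest"})
    yield ResultMessage(success=True)

async def collect(stream: AsyncIterator[Message]) -> dict:
    """Dispatch on message type; no text parsing is needed."""
    tool_uses, text, success = [], [], False
    async for msg in stream:
        if isinstance(msg, ToolUseMessage):
            tool_uses.append({"tool": msg.tool, "input": msg.input})
        elif isinstance(msg, TextMessage):
            text.append(msg.text)
        elif isinstance(msg, ResultMessage):
            success = msg.success
    return {"success": success, "tool_uses": tool_uses, "text": text}

if __name__ == "__main__":
    print(asyncio.run(collect(fake_stream())))
```

Because each message arrives as a typed object, a tool use can never be missed or misread the way a regex over free-form output can miss it.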

Real Benchmarks

These numbers come from running scripts/benchmark_v3.py in the Sugar repository.

Startup Overhead

| Metric | V2 (Subprocess) | V3 (SDK) | Improvement |
|---|---|---|---|
| Initialization time | 300-500ms | ~27ms | ~11-18x faster |
| Memory overhead | 50-100MB (separate process) | <1MB (shared process) | 50-100x less |
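For a rough local feel of why process startup dominates, this sketch times spawning a fresh Python interpreter against an in-process function call. It is only an illustration under stated assumptions: it spawns "python -c pass", not the Claude CLI, so its numbers are not comparable to the table above, which comes from scripts/benchmark_v3.py. The helper names (in_process_call, time_ms, compare) are hypothetical.

```python
import subprocess
import sys
import time

def in_process_call() -> int:
    # Stand-in for work done inside the already-running Python process.
    return sum(range(100))

def time_ms(fn, repeats: int = 5) -> float:
    """Best-of-N wall-clock time of fn() in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, (time.perf_counter() - start) * 1000)
    return best

def compare(repeats: int = 5) -> tuple:
    """Return (subprocess spawn ms, in-process call ms)."""
    # A bare interpreter spawn approximates per-call subprocess cost only;
    # the Claude CLI does considerably more work at startup.
    spawn = time_ms(
        lambda: subprocess.run([sys.executable, "-c", "pass"], check=True),
        repeats,
    )
    inline = time_ms(in_process_call, repeats)
    return spawn, inline

if __name__ == "__main__":
    spawn, inline = compare()
    print(f"subprocess spawn: {spawn:.1f} ms, in-process call: {inline:.4f} ms")
```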

Per-Operation Latency

| Operation | V3 Latency |
|---|---|
| Security check (per tool use) | 0.002ms |
| Response serialization | <0.001ms |
| Transient error detection | <0.001ms |

Context: What This Actually Means

A typical task breakdown:

V2: 400ms overhead + 60s Claude API = 60.4s total
V3:  27ms overhead + 60s Claude API = 60.03s total

Improvement: ~370ms (less than 1% of total time)
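The same arithmetic as a small Python helper, using the illustrative 400ms (V2), 27ms (V3), and 60s (Claude API) figures above. The function name overhead_share is hypothetical.

```python
def overhead_share(overhead_ms: float, api_seconds: float) -> float:
    """Fraction of total task time spent in executor overhead."""
    total_ms = overhead_ms + api_seconds * 1000
    return overhead_ms / total_ms

v2 = overhead_share(400, 60)   # about 0.66% of the task
v3 = overhead_share(27, 60)    # about 0.045% of the task
saved_ms = 400 - 27            # 373 ms saved per task
saved_fraction = saved_ms / (400 + 60 * 1000)  # relative to the V2 total

if __name__ == "__main__":
    print(f"V2: {v2:.2%}  V3: {v3:.3%}  saved: {saved_ms} ms ({saved_fraction:.2%})")
```

Even in V2, overhead is well under 1% of a 60-second task, which is why the speedup barely changes end-to-end time.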

The overhead reduction is real but small compared with Claude's processing time. The main value of V3 lies in the new capabilities described below.

What V3 Actually Provides

1. Security Gates (New Capability)

V3 can block dangerous operations before execution. V2 had no mechanism for this.

# V3 blocks access to protected files
quality_gates:
  protected_files:
    - ".env"
    - "*.pem"
    - "credentials.json"
    - "secrets/*"

  # V3 blocks dangerous commands
  blocked_commands:
    - "rm -rf /"
    - "sudo"
    - "chmod -R 777 /"
    - "> /dev/sda"

When a blocked operation is attempted:

# PreToolUse hook intercepts and denies
{
  "permissionDecision": "deny",
  "permissionDecisionReason": "Protected file access blocked: .env"
}
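A minimal sketch of how a gate like this could evaluate a tool call against the quality_gates lists above and produce the deny decision shown. This is not Sugar's implementation: pre_tool_use_check and is_protected are hypothetical names, file patterns use shell-style fnmatch matching, and commands use naive substring matching. The real Agent SDK hook callback signature and return envelope are not reproduced here.

```python
import fnmatch
import os
from typing import Optional

# Values copied from the quality_gates example above.
PROTECTED_FILES = [".env", "*.pem", "credentials.json", "secrets/*"]
BLOCKED_COMMANDS = ["rm -rf /", "sudo", "chmod -R 777 /", "> /dev/sda"]

def is_protected(path: str) -> bool:
    """Match the full path, its basename, or any '*/pattern' suffix."""
    name = os.path.basename(path)
    return any(
        fnmatch.fnmatchcase(path, pat)
        or fnmatch.fnmatchcase(name, pat)
        or fnmatch.fnmatchcase(path, "*/" + pat)
        for pat in PROTECTED_FILES
    )

def pre_tool_use_check(tool: str, tool_input: dict) -> Optional[dict]:
    """Return a deny decision shaped like the example above, or None to allow."""
    path = tool_input.get("file_path")
    if path and is_protected(path):
        return {
            "permissionDecision": "deny",
            "permissionDecisionReason": f"Protected file access blocked: {path}",
        }
    if tool == "Bash":
        command = tool_input.get("command", "")
        # Naive substring match; a real gate would parse the command.
        for blocked in BLOCKED_COMMANDS:
            if blocked in command:
                return {
                    "permissionDecision": "deny",
                    "permissionDecisionReason": f"Blocked command: {blocked}",
                }
    return None

if __name__ == "__main__":
    print(pre_tool_use_check("Read", {"file_path": ".env"}))
```

Because the check runs before the tool executes, a denied call never touches the file system, which a post-hoc regex over V2 output could not guarantee.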

2. Observability (Improved Accuracy)

| Capability | V2 | V3 |
|---|---|---|
| Tool use tracking | Regex parsing of stdout | Direct from message stream |
| File modifications | Pattern matching (unreliable) | Hook-based (accurate) |
| Command execution | Inferred from output | Captured in real time |
| Execution history | Limited session state | Full audit trail |

A V3 response includes structured data:

{
  "success": true,
  "tool_uses": [
    {"tool": "Write", "input": {"file_path": "/src/main.py"}},
    {"tool": "Bash", "input": {"command": "pytest"}}
  ],
  "files_modified": ["/src/main.py", "/tests/test_main.py"],
  "execution_time": 45.2,
  "quality_gate_results": {
    "total_tool_executions": 5,
    "blocked_operations": 0,
    "security_violations": 0
  }
}
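One way such a response could be assembled is to record each tool use as hooks report it and summarize at the end. The sketch below is illustrative only: ExecutionTracker and FILE_WRITING_TOOLS are hypothetical, and it assumes that only Write and Edit calls count as file modifications, that blocked attempts count toward total_tool_executions, and that every blocked operation counts as a security violation.

```python
import time
from dataclasses import dataclass, field

# Assumption: these tools are the ones that modify files.
FILE_WRITING_TOOLS = {"Write", "Edit"}

@dataclass
class ExecutionTracker:
    """Accumulates tool uses as hooks report them (illustrative only)."""
    tool_uses: list = field(default_factory=list)
    files_modified: list = field(default_factory=list)
    blocked_operations: int = 0
    started: float = field(default_factory=time.monotonic)

    def record(self, tool: str, tool_input: dict, blocked: bool = False) -> None:
        if blocked:
            self.blocked_operations += 1
            return
        self.tool_uses.append({"tool": tool, "input": tool_input})
        path = tool_input.get("file_path")
        if tool in FILE_WRITING_TOOLS and path and path not in self.files_modified:
            self.files_modified.append(path)

    def summary(self, success: bool) -> dict:
        """Build a dict shaped like the example response above."""
        return {
            "success": success,
            "tool_uses": self.tool_uses,
            "files_modified": self.files_modified,
            "execution_time": round(time.monotonic() - self.started, 1),
            "quality_gate_results": {
                # Assumption: blocked attempts still count as executions.
                "total_tool_executions": len(self.tool_uses) + self.blocked_operations,
                "blocked_operations": self.blocked_operations,
                # Assumption: every blocked operation is a security violation.
                "security_violations": self.blocked_operations,
            },
        }

if __name__ == "__main__":
    t = ExecutionTracker()
    t.record("Write", {"file_path": "/src/main.py"})
    t.record("Bash", {"command": "pytest"})
    print(t.summary(success=True))
```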

3. Reliability (Automatic Recovery)

V3 automatically retries transient API errors with exponential backoff:

# V2: Immediate failure on any error
try:
    result = wrapper.execute(prompt)
except Exception:
    # Task fails; the user must retry manually
    raise

# V3: Automatic retry for transient errors
TRANSIENT_ERRORS = [
    "rate_limit",  # HTTP 429
    "timeout",
    "connection",
    "overloaded",
    "503",         # Service unavailable
]

# Retries with: 1s, 2s, 4s, 8s... up to retry_max_delay

Configure retry behavior:

sugar:
  agent:
    max_retries: 3
    retry_base_delay: 1.0   # seconds
    retry_max_delay: 30.0   # max backoff
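A minimal sketch of the retry behavior these settings describe, assuming errors are classified by matching their message against the TRANSIENT_ERRORS markers and that the delay before retry n (0-based) is retry_base_delay * 2**n, capped at retry_max_delay. The function names (is_transient, backoff_delay, run_with_retry) are hypothetical; only the keyword names mirror the config keys above. This version adds no jitter, which many production retry loops include.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Markers from the TRANSIENT_ERRORS list above.
TRANSIENT_ERRORS = ["rate_limit", "timeout", "connection", "overloaded", "503"]

def is_transient(exc: Exception) -> bool:
    """Classify an error as retryable by its message (case-insensitive)."""
    message = str(exc).lower()
    return any(marker in message for marker in TRANSIENT_ERRORS)

def backoff_delay(attempt: int, base: float = 1.0, max_delay: float = 30.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(base * (2 ** attempt), max_delay)

def run_with_retry(
    fn: Callable[[], T],
    max_retries: int = 3,
    retry_base_delay: float = 1.0,
    retry_max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures up to max_retries times."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception as exc:
            if attempt >= max_retries or not is_transient(exc):
                raise  # non-transient error, or retries exhausted
            sleep(backoff_delay(attempt, retry_base_delay, retry_max_delay))
            attempt += 1
```

Injecting sleep as a parameter keeps the loop testable without real waiting; with max_retries: 3 the waits are 1s, 2s, and 4s before the error is finally raised.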

Comparison Summary

| Aspect | V2 (ClaudeWrapper) | V3 (SugarAgent) |
|---|---|---|
| Execution model | Subprocess spawn | Native SDK call |
| Startup overhead | 300-500ms | ~27ms |
| Memory overhead | 50-100MB | <1MB |
| Tool tracking | Regex parsing | Structured stream |
| Security gates | None | PreToolUse hooks |
| Retry logic | None | Exponential backoff |
| File tracking | Pattern matching | Hook-based |
| Session history | Partial | Full audit trail |

When to Use Legacy Mode

V3 maintains backward compatibility. Use legacy mode if:

  • You have custom integrations that depend on subprocess behavior
  • You need to debug differences between the SDK and the CLI
  • You're migrating gradually

sugar:
  claude:
    executor: legacy  # Use V2 subprocess wrapper

Note: Legacy mode is deprecated and will be removed in a future release.

Running the Benchmarks

To run these benchmarks yourself:

# Clone the Sugar repository
git clone https://github.com/roboticforce/sugar.git
cd sugar

# Install dependencies
pip install -e .

# Run benchmarks
python scripts/benchmark_v3.py

Next Steps