Why Sugar V3?

An honest look at what V3 improves, backed by real benchmarks. No marketing hype.

The Short Version

Sugar V3 replaces subprocess calls to the Claude CLI with native Claude Agent SDK integration. This provides:

Security gates - Block dangerous operations before they execute
Observability - Structured tool tracking instead of regex parsing
Reliability - Automatic retry on transient API errors

What V3 does NOT improve: Claude's processing speed, token costs, or task quality. Those depend on Claude, not the executor.

Architecture Comparison

V2: Subprocess Model

Python Process
    |
    +-- spawn subprocess --> Claude CLI Process
                                |
                                +-- stdin: prompt text
                                +-- stdout/stderr: raw output
                                |
    <-- regex parsing --------- output text
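To make the V2 flow concrete, here is a minimal Python sketch of the subprocess-plus-regex pattern. It is not Sugar's ClaudeWrapper: a child Python process stands in for the Claude CLI, and the `Tool: <name> <path>` output format is invented for illustration.

```python
import re
import subprocess
import sys

# Stand-in for the Claude CLI: a child Python process that prints free-form
# text. The "Tool:" line format is invented for this sketch.
FAKE_CLI = [sys.executable, "-c", "print('Thinking...'); print('Tool: Write /src/main.py')"]

# The regex must track whatever text format the CLI happens to emit.
TOOL_LINE = re.compile(r"^Tool: (\w+) (\S+)$", re.MULTILINE)


def run_v2_style(prompt: str) -> list:
    """Spawn a subprocess, send the prompt on stdin, regex-parse stdout."""
    proc = subprocess.run(
        FAKE_CLI,
        input=prompt,
        capture_output=True,
        text=True,
        check=True,
    )
    # Any change in the output wording silently breaks this match.
    return TOOL_LINE.findall(proc.stdout)


print(run_v2_style("Fix the bug in main.py"))  # [('Write', '/src/main.py')]
```

The fragility is the point: if the CLI rewords its output, `findall` quietly returns an empty list instead of failing.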

V3: Native SDK Model

Python Process
    |
    +-- import claude_agent_sdk
    |
    +-- async generator --> streaming messages
    |                          |
    +-- typed objects <------- tool uses / text / results
    |
    +-- hooks --------------> quality gates
    |
    +-- direct access ------> files / actions / metrics
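The V3 side can be sketched the same way. The message classes below (`TextMessage`, `ToolUseMessage`, `ResultMessage`) and the `fake_stream` generator are hypothetical stand-ins, not the Agent SDK's own types; the sketch only shows how consuming an async generator of typed objects replaces text parsing.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Union


# Hypothetical stand-ins for typed SDK messages (not the SDK's real classes).
@dataclass
class TextMessage:
    text: str


@dataclass
class ToolUseMessage:
    tool: str
    tool_input: dict


@dataclass
class ResultMessage:
    success: bool


Message = Union[TextMessage, ToolUseMessage, ResultMessage]


async def fake_stream() -> AsyncIterator[Message]:
    """Simulates an async generator of streaming messages."""
    yield TextMessage("Editing main.py")
    yield ToolUseMessage("Write", {"file_path": "/src/main.py"})
    yield ToolUseMessage("Bash", {"command": "pytest"})
    yield ResultMessage(success=True)


async def collect(stream: AsyncIterator[Message]) -> dict:
    """Dispatch on message type instead of regex-matching text."""
    summary = {"tool_uses": [], "success": False}
    async for msg in stream:
        if isinstance(msg, ToolUseMessage):
            summary["tool_uses"].append({"tool": msg.tool, "input": msg.tool_input})
        elif isinstance(msg, ResultMessage):
            summary["success"] = msg.success
    return summary


print(asyncio.run(collect(fake_stream())))
```

Because each message already carries its type and fields, an unrecognized message is simply skipped rather than breaking a pattern match.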

Real Benchmarks

These numbers come from running scripts/benchmark_v3.py in the Sugar repository.

Startup Overhead

Metric	V2 (Subprocess)	V3 (SDK)	Improvement
Initialization time	300-500ms	~27ms	~11-18x faster
Memory overhead	50-100MB (separate process)	<1MB (shared process)	>50x less
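The startup gap comes largely from crossing a process boundary. The sketch below is not `scripts/benchmark_v3.py`; it is an assumed, simplified way to see the effect locally by timing a trivial child Python process against an in-process no-op call with `time.perf_counter`. Absolute numbers depend on the machine and on what the child process loads, so they will not match the table.

```python
import subprocess
import sys
import time


def time_subprocess_spawn(runs: int = 5) -> float:
    """Average seconds to start and finish a trivial child Python process."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / runs


def time_in_process_call(runs: int = 5) -> float:
    """Average seconds for an equivalent no-op call inside this process."""
    def noop() -> None:
        pass

    start = time.perf_counter()
    for _ in range(runs):
        noop()
    return (time.perf_counter() - start) / runs


spawn = time_subprocess_spawn()
inline = time_in_process_call()
print(f"spawn: {spawn * 1000:.1f}ms, in-process: {inline * 1000:.4f}ms")
```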

Per-Operation Latency

Operation	V3 Latency
Security check (per tool use)	0.002ms
Response serialization	<0.001ms
Transient error detection	<0.001ms

Context: What This Actually Means

For a typical task that spends 60 seconds in the Claude API:

V2: 400ms overhead + 60s Claude API = 60.4s total
V3:  27ms overhead + 60s Claude API = 60.03s total

Improvement: ~370ms (less than 1% of total time)

The overhead reduction is genuine but negligible next to Claude's processing time. Most of V3's value lies in the new capabilities.
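The arithmetic behind these figures, using the 400ms midpoint of the V2 range, as a short Python check:

```python
V2_OVERHEAD_S = 0.400  # subprocess startup (midpoint of 300-500ms)
V3_OVERHEAD_S = 0.027  # SDK initialization
API_TIME_S = 60.0      # time spent waiting on Claude

v2_total = V2_OVERHEAD_S + API_TIME_S
v3_total = V3_OVERHEAD_S + API_TIME_S
saved_s = v2_total - v3_total

print(f"V2 total: {v2_total:.2f}s, V3 total: {v3_total:.2f}s")
print(f"Saved: {saved_s * 1000:.0f}ms ({saved_s / v2_total:.2%} of V2 total)")
# V2 total: 60.40s, V3 total: 60.03s
# Saved: 373ms (0.62% of V2 total)
```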

What V3 Actually Provides

1. Security Gates (New Capability)

V3 can block dangerous operations before execution. V2 had no mechanism for this.

# V3 blocks access to protected files
quality_gates:
  protected_files:
    - ".env"
    - "*.pem"
    - "credentials.json"
    - "secrets/*"

  # V3 blocks dangerous commands
  blocked_commands:
    - "rm -rf /"
    - "sudo"
    - "chmod -R 777 /"
    - "> /dev/sda"
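One way the two lists could be enforced is sketched below. The matching rules are assumptions, not Sugar's documented semantics: file patterns are treated as shell-style globs (via `fnmatch.fnmatchcase`) matched against the whole path or its trailing components, and blocked commands are matched as plain substrings.

```python
import fnmatch
from typing import Optional

# Patterns copied from the quality_gates example above.
PROTECTED_FILES = [".env", "*.pem", "credentials.json", "secrets/*"]
BLOCKED_COMMANDS = ["rm -rf /", "sudo", "chmod -R 777 /", "> /dev/sda"]


def protected_file_reason(file_path: str) -> Optional[str]:
    """Return a denial reason if file_path matches a protected pattern."""
    for pattern in PROTECTED_FILES:
        # Match the path itself, or any path ending in "/<pattern>".
        if fnmatch.fnmatchcase(file_path, pattern) or fnmatch.fnmatchcase(
            file_path, "*/" + pattern
        ):
            return f"Protected file access blocked: {pattern}"
    return None


def blocked_command_reason(command: str) -> Optional[str]:
    """Return a denial reason if command contains a blocked fragment."""
    for fragment in BLOCKED_COMMANDS:
        # Naive substring test (assumption): see the note below.
        if fragment in command:
            return f"Blocked command: {fragment}"
    return None


print(protected_file_reason("/app/.env"))  # Protected file access blocked: .env
```

Plain substring matching is deliberately blunt: `rm -rf /tmp/build` also contains `rm -rf /` and would be denied, while a reordered command such as `rm -fr /` slips through. A production gate would need tokenized command parsing.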

When a blocked operation is attempted, the PreToolUse hook intercepts the call and denies it:

{
  "permissionDecision": "deny",
  "permissionDecisionReason": "Protected file access blocked: .env"
}
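In the Claude Agent SDK's Python hook examples, a PreToolUse callback takes `(input_data, tool_use_id, context)` and returns the decision fields above wrapped in a `hookSpecificOutput` object with `hookEventName` set. The sketch below follows that shape; the callback name and blocked list are illustrative, and whether Sugar's own hook is written this way is an assumption.

```python
import asyncio
from typing import Any, Dict, Optional

# Illustrative list; a real gate would load this from quality_gates config.
BLOCKED_COMMANDS = ["rm -rf /", "sudo", "chmod -R 777 /", "> /dev/sda"]


async def block_dangerous_bash(
    input_data: Dict[str, Any], tool_use_id: Optional[str], context: Any
) -> Dict[str, Any]:
    """PreToolUse-style callback: deny Bash commands with blocked fragments."""
    if input_data.get("tool_name") != "Bash":
        return {}  # no decision: normal permission handling applies
    command = input_data.get("tool_input", {}).get("command", "")
    for fragment in BLOCKED_COMMANDS:
        if fragment in command:
            return {
                "hookSpecificOutput": {
                    "hookEventName": "PreToolUse",
                    "permissionDecision": "deny",
                    "permissionDecisionReason": f"Blocked command: {fragment}",
                }
            }
    return {}


decision = asyncio.run(
    block_dangerous_bash(
        {"tool_name": "Bash", "tool_input": {"command": "sudo reboot"}}, None, None
    )
)
print(decision["hookSpecificOutput"]["permissionDecision"])  # deny
```

The SDK examples register such a callback through `ClaudeAgentOptions(hooks={"PreToolUse": [HookMatcher(matcher="Bash", hooks=[...])]})`; check this against the SDK version you have installed.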

2. Observability (Improved Accuracy)

Capability	V2	V3
Tool use tracking	Regex parsing of stdout	Direct from message stream
File modifications	Pattern matching (unreliable)	Hook-based (accurate)
Command execution	Inferred from output	Captured in real-time
Execution history	Limited session state	Full audit trail

A V3 response includes structured data (the tool_uses list below is abbreviated):

{
  "success": true,
  "tool_uses": [
    {"tool": "Write", "input": {"file_path": "/src/main.py"}},
    {"tool": "Bash", "input": {"command": "pytest"}}
  ],
  "files_modified": ["/src/main.py", "/tests/test_main.py"],
  "execution_time": 45.2,
  "quality_gate_results": {
    "total_tool_executions": 5,
    "blocked_operations": 0,
    "security_violations": 0
  }
}
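Because these fields arrive as structured data, downstream code can read them directly. The `summarize` helper below is hypothetical and works on a trimmed copy of the example response.

```python
import json

# Trimmed copy of the example response above.
response = json.loads(
    """{"success": true,
        "tool_uses": [{"tool": "Write", "input": {"file_path": "/src/main.py"}},
                      {"tool": "Bash", "input": {"command": "pytest"}}],
        "files_modified": ["/src/main.py", "/tests/test_main.py"],
        "quality_gate_results": {"blocked_operations": 0, "security_violations": 0}}"""
)


def summarize(resp: dict) -> str:
    """Build a one-line summary from structured fields, no regex needed."""
    gates = resp["quality_gate_results"]
    commands = [u["input"]["command"] for u in resp["tool_uses"] if u["tool"] == "Bash"]
    ok = resp["success"] and gates["security_violations"] == 0
    status = "ok" if ok else "needs review"
    return (
        f"{status}: {len(resp['files_modified'])} files modified, "
        f"commands={commands}, blocked={gates['blocked_operations']}"
    )


print(summarize(response))
# ok: 2 files modified, commands=['pytest'], blocked=0
```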

3. Reliability (Automatic Recovery)

V3 automatically retries transient API errors with exponential backoff:

# V2: Immediate failure on any error
# (wrapper is a V2 ClaudeWrapper instance)
try:
    result = wrapper.execute(prompt)
except Exception:
    # The task fails; the user must retry manually
    raise

# V3: Automatic retry for transient errors
TRANSIENT_ERRORS = [
    "rate_limit",    # 429
    "timeout",
    "connection",
    "overloaded",
    "503",           # Service unavailable
]

# Retry delays: 1s, 2s, 4s, 8s, ... capped at retry_max_delay
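A plausible way to apply `TRANSIENT_ERRORS` is case-insensitive substring matching on the exception's type name and message. That matching rule is an assumption for this sketch; note that an error mentioning only `429` would not match unless `"429"` were added to the list.

```python
TRANSIENT_ERRORS = ["rate_limit", "timeout", "connection", "overloaded", "503"]


def is_transient(error: BaseException) -> bool:
    """True if the error's type name or message contains a transient marker."""
    # e.g. TimeoutError -> "timeouterror: ..." contains "timeout"
    text = f"{type(error).__name__}: {error}".lower()
    return any(marker in text for marker in TRANSIENT_ERRORS)


print(is_transient(RuntimeError("HTTP 503 Service Unavailable")))  # True
print(is_transient(ValueError("invalid prompt")))                  # False
```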

Configure retry behavior:

sugar:
  agent:
    max_retries: 3
    retry_base_delay: 1.0   # seconds
    retry_max_delay: 30.0   # max backoff
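The backoff behavior these settings describe can be sketched as follows. `with_retries` and `backoff_delays` are hypothetical helpers, not Sugar's API; the parameters mirror `max_retries`, `retry_base_delay`, and `retry_max_delay`, and `sleep` is injectable so the example runs instantly.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int, base_delay: float, max_delay: float) -> List[float]:
    """Delay before each retry: base, 2*base, 4*base, ... capped at max_delay."""
    return [min(base_delay * 2 ** attempt, max_delay) for attempt in range(max_retries)]


async def with_retries(
    operation: Callable[[], Awaitable[T]],
    is_retryable: Callable[[BaseException], bool],
    max_retries: int = 3,
    retry_base_delay: float = 1.0,
    retry_max_delay: float = 30.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Run operation, retrying retryable errors with exponential backoff."""
    delays = backoff_delays(max_retries, retry_base_delay, retry_max_delay)
    attempt = 0
    while True:
        try:
            return await operation()
        except Exception as exc:
            if attempt >= max_retries or not is_retryable(exc):
                raise  # out of retries, or not a transient error
            await sleep(delays[attempt])
            attempt += 1


async def demo() -> tuple:
    calls = 0
    slept: List[float] = []

    async def flaky() -> str:
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ConnectionError("connection reset")
        return "done"

    async def fake_sleep(seconds: float) -> None:
        slept.append(seconds)  # record the delay instead of waiting

    result = await with_retries(flaky, lambda e: isinstance(e, ConnectionError), sleep=fake_sleep)
    return result, slept


print(asyncio.run(demo()))  # ('done', [1.0, 2.0])
```

With the settings above, the delays are 1s, 2s, and 4s; the cap only takes effect once `retry_base_delay * 2**attempt` exceeds `retry_max_delay`, which here would be from the sixth retry onward.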

Comparison Summary

Aspect	V2 (ClaudeWrapper)	V3 (SugarAgent)
Execution model	Subprocess spawn	Native SDK call
Startup overhead	300-500ms	~27ms
Memory overhead	50-100MB	<1MB
Tool tracking	Regex parsing	Structured stream
Security gates	None	PreToolUse hooks
Retry logic	None	Exponential backoff
File tracking	Pattern matching	Hook-based
Session history	Partial	Full audit trail

When to Use Legacy Mode

V3 maintains backwards compatibility. Use legacy mode if:

You have custom integrations depending on subprocess behavior
You need to debug SDK vs CLI differences
You're migrating gradually

sugar:
  claude:
    executor: legacy  # Use V2 subprocess wrapper

Note: Legacy mode is deprecated and will be removed in a future release.
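How the `executor` setting might be resolved at startup is sketched below. `select_executor` is a hypothetical helper; it returns the class names from the comparison table as strings and treats any value other than `legacy` as V3, which is an assumption about the valid values.

```python
import warnings
from typing import Any, Dict


def select_executor(config: Dict[str, Any]) -> str:
    """Pick an executor class name from a parsed config (hypothetical helper)."""
    executor = config.get("sugar", {}).get("claude", {}).get("executor")
    if executor == "legacy":
        # Legacy mode is deprecated, so surface that to the caller.
        warnings.warn(
            "executor: legacy (V2 subprocess wrapper) is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
        return "ClaudeWrapper"
    return "SugarAgent"


print(select_executor({"sugar": {"claude": {"executor": "legacy"}}}))  # ClaudeWrapper
```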

Running the Benchmarks

To run these benchmarks yourself:

# Clone the Sugar repository
git clone https://github.com/roboticforce/sugar.git
cd sugar

# Install Sugar and its dependencies (editable mode)
pip install -e .

# Run benchmarks
python scripts/benchmark_v3.py

Next Steps

Agent SDK - Configure the new SDK executor
Migration Guide - Upgrade from V2
Configuration - Full config reference