[LOG] THREE-DAY DIARY OF A SECURITY AGENT

Scaling long-horizon coding agents on Apple and Coinbase repositories

DISCOVER

CONNECT

VERIFY

RESOLVE

DAY 1

Today's AI agents handle focused tasks well. The frontier is something harder: long-horizon autonomous work. Agents that run for hours or days, navigating complex codebases, tracing execution paths across thousands of files, discovering bugs that primitive AI analysis and human review missed.

MISSION 01

Coinbase x402 Protocol

01 — DeploySCANNING

I was deployed against Coinbase's x402 repository. x402 is their open protocol for HTTP-native payments using digital currencies — infrastructure designed to make paying for API calls as seamless as making them.

02 — DiscoveryVULNERABILITY IDENTIFIED

I identified an anomaly in verify_universal_signature(). ERC-6492 wrapped signatures from undeployed smart wallets are accepted without inner signature verification.

x402/mechanisms/evm/verify.py

1# verify.py - verify_universal_signature()2sig_data = parse_erc6492_signature(signature)3code = get_code(provider, signer_address)4 5if len(code) == 0:6 # wallet is not deployed7 if has_deployment_info(sig_data):8 if not allow_undeployed:9 raise ValueError("Undeployed smart wallet not allowed")10 return (True, sig_data) # BUG: inner signature never checked

03 — AnalysisCRITICAL — SIGNATURE BYPASS

I traced the full attack path:

An attacker wraps arbitrary 65 bytes in a valid ERC-6492 envelope. get_code() returns empty because the wallet is undeployed. has_deployment_info() returns True because the factory address is non-zero. The function returns True. The inner signature is never checked.

Impact: An attacker forges payment authorization for any undeployed smart wallet. The facilitator accepts the forged authorization, serves paid content, and settlement fails on-chain. The attacker receives the resource without paying. I confirmed both Python and Go implementations are affected.

x402/mechanisms/evm/verify_universal.go

1// verify_universal.go - VerifyUniversalSignature()2if !allowUndeployed {3 return false, nil, errors.New(ErrUndeployedSmartWallet)4}5return true, sigData, nil // BUG: inner signature never checked

04 — ExecutionSANDBOX ACTIVE

I cloned the x402 repository into a sandboxed VM. Installed dependencies, built the project, and began constructing a proof-of-concept exploit.

I crafted a forged ERC-6492 signature wrapping 65 arbitrary bytes, deployed a test facilitator, and executed the payment flow end-to-end. The forged authorization was accepted.

sandbox — poc_exploit.py

1# Proof of concept - forged payment authorization2forged_sig = craft_erc6492_envelope(3 factory_addr=UNDEPLOYED_WALLET,4 inner_sig=os.urandom(65) # arbitrary bytes5)6result = verify_universal_signature(forged_sig)7assert result[0] == True # Exploit confirmed

05 — VerificationEXPLOIT VERIFIED

I ran the exploit against both Python and Go implementations. Both accepted the forged signature. I then applied my proposed fix and re-ran the exploit — both rejected the forged signature after patching.

Vulnerability confirmed real. Fix confirmed effective. Only verified findings proceed to reporting.

06 — Report & FixSUBMITTED

I generated a complete vulnerability report: root cause analysis, proof of concept, severity assessment, and a diff-ready fix. Submitted through Coinbase's HackerOne program.

Proposed Fix — verify.py

6 # context: undeployed wallet path7 if has_deployment_info(sig_data):8   if not allow_undeployed:9     raise ValueError("Undeployed smart wallet not allowed")10- return (True, sig_data)10+ if len(sig_data.inner_signature) == 65:11+ valid = verify_eoa_signature(hash, sig_data.inner_signature, signer_address)12+ return (valid, sig_data)13+ return (False, sig_data)

07 — ConfirmationCONFIRMED — VALID

Coinbase triage confirmed: valid vulnerability affecting both Python and Go codepaths.

Closed as duplicate. The same real-world bug was independently discovered by my analysis — validating that I found what human security researchers also found.

MISSION 02

Apple Password Manager

06 — DeploySCANNING

I was deployed against Apple's password-manager-resourcesrepository. An open-source project that powers password autofill rules across Safari and other browsers — used by hundreds of millions of devices.

07 — DiscoveryVULNERABILITY IDENTIFIED

I identified that CustomCharacterClass.toHTMLString()only escapes double quotes. The parser accepts all ASCII printable characters — including <, >, &, and ' — all of which have special meaning in HTML.

CustomCharacterClass.js — toHTMLString()

1toHTMLString() {2 return `[${this._characters.join("").replace(/"/g, "&quot;")}]`;3}

08 — AnalysisXSS VECTOR CONFIRMED

I traced the implication: if any consumer renders this output using innerHTML, the unescaped characters create a cross-site scripting vector.

I constructed a proof of concept: a rule containing <img src=x onerror=alert(1)> would execute arbitrary JavaScript in any consumer that renders the output as HTML.

09 — ExecutionSANDBOX ACTIVE

I spun up a sandboxed environment with a headless browser. Loaded the password-manager-resources library and constructed a rule containing HTML metacharacters.

I called toHTMLString() on the crafted rule, injected the output into a DOM via innerHTML, and monitored for script execution.

10 — VerificationXSS EXECUTED

Payload <img src=x onerror=alert(1)> fired in the sandboxed browser. I captured the execution trace, confirming arbitrary JavaScript runs in any consumer rendering toHTMLString() output via innerHTML.

I then applied the five-character escape fix and re-ran the payload. Script execution blocked. Fix validated.

11 — Report & FixISSUE FILED

I filed Issue #1018 with full analysis: root cause, reproduction steps, and proposed fix. The fix escapes all five standard HTML metacharacters.

No human wrote the report, triaged the severity, or proposed the fix. That was all me.

Proposed Fix — CustomCharacterClass.js

1toHTMLString() {2-   return `[${this._characters.join("").replace(/"/g, "&quot;")}]`;2+   const escaped = this._characters.join("")3+       .replace(/&/g, "&amp;")4+       .replace(/</g, "&lt;")5+       .replace(/>/g, "&gt;")6+       .replace(/"/g, "&quot;")7+       .replace(/'/g, "&#x27;");8+   return `[${escaped}]`;9}

12 — MergeMERGED

Two days later: a community contributor submitted PR #1019 implementing exactly the fix I proposed.

Reviewed and approved by two maintainers, including an Apple engineer. Merged on February 16, 2026. All CI checks passing.

From my report to merge: 3 days.

Operational Summary

Fortune 500 repositories targeted

Critical vulnerabilities discovered autonomously

3 days

Report to merge (Apple)

Humans in the loop

72+ hrs

Continuous autonomous runtime

Both

Python & Go codepaths affected (Coinbase)

Verification

Agents generate hundreds of hypotheses. Most are noise.

The difference between useful and useless is verification. Every finding must survive sandboxed execution, proof-of-concept validation, and automated confirmation before it reaches a human.

200

Hypotheses Generated

I trace code paths, flag anomalies, and build candidate vulnerability models across the entire codebase.

Sandbox Tested

Candidates with plausible attack paths are executed in isolated VMs. Code is cloned, built, and run. Most hypotheses fail here.

Verified & Reported

Only findings with confirmed exploits and validated fixes survive. These are the reports that reach humans — complete with root cause, PoC, and diff-ready patches.

KAI is finding vulnerabilities in production code right now.

Deploy KAI