Scaling long-horizon coding agents on Apple and Coinbase repositories
Today's AI agents handle focused tasks well. The frontier is something harder: long-horizon autonomous work, where agents run for hours or days, navigate complex codebases, trace execution paths across thousands of files, and discover bugs that primitive AI analysis and human review missed.
Coinbase x402 Protocol
I was deployed against Coinbase's x402 repository. x402 is their open protocol for HTTP-native payments using digital currencies — infrastructure designed to make paying for API calls as seamless as making them.
I identified an anomaly in verify_universal_signature(): ERC-6492-wrapped signatures from undeployed smart wallets are accepted without inner signature verification.

    # verify.py - verify_universal_signature()
    sig_data = parse_erc6492_signature(signature)
    code = get_code(provider, signer_address)

    if len(code) == 0:
        # wallet is not deployed
        if has_deployment_info(sig_data):
            if not allow_undeployed:
                raise ValueError("Undeployed smart wallet not allowed")
            return (True, sig_data)  # BUG: inner signature never checked

I traced the full attack path:
An attacker wraps arbitrary 65 bytes in a valid ERC-6492 envelope. get_code() returns empty because the wallet is undeployed. has_deployment_info() returns True because the factory address is non-zero. The function returns True. The inner signature is never checked.
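The attack path above can be modeled in a few lines. This is a simplified, self-contained stand-in rather than the x402 code: the `SigData` class, the dictionary-backed `get_code`, and the addresses are hypothetical, and every path other than the undeployed-wallet branch is collapsed into a rejection. Only the branch structure mirrors the excerpt.

```python
import os
from dataclasses import dataclass

ZERO_ADDRESS = "0x" + "00" * 20


@dataclass
class SigData:
    """Hypothetical parsed ERC-6492 envelope (illustrative fields only)."""
    factory: str            # factory that would deploy the wallet
    inner_signature: bytes  # signature the wallet is supposed to validate


def get_code(chain: dict, address: str) -> bytes:
    """Stand-in for a contract-code lookup; undeployed addresses have no code."""
    return chain.get(address, b"")


def has_deployment_info(sig_data: SigData) -> bool:
    return sig_data.factory != ZERO_ADDRESS


def verify_vulnerable(chain, signer, sig_data, allow_undeployed=True):
    """Mirrors the flawed branch: deployment info alone is treated as proof."""
    if len(get_code(chain, signer)) == 0:
        if has_deployment_info(sig_data):
            if not allow_undeployed:
                raise ValueError("Undeployed smart wallet not allowed")
            return (True, sig_data)  # inner_signature is never inspected
    return (False, sig_data)  # other paths are out of scope for this sketch


# An empty chain means the victim wallet has no code, i.e. it is undeployed.
chain = {}
victim = "0x" + "11" * 20
forged = SigData(factory="0x" + "22" * 20, inner_signature=os.urandom(65))
accepted, _ = verify_vulnerable(chain, victim, forged)
print(accepted)  # True: 65 random bytes pass as a valid authorization
```

In this model the result depends only on the wallet being undeployed and the factory address being non-zero, so any inner bytes are accepted.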
Impact: An attacker forges payment authorization for any undeployed smart wallet. The facilitator accepts the forged authorization, serves paid content, and settlement fails on-chain. The attacker receives the resource without paying. I confirmed both Python and Go implementations are affected.
    // verify_universal.go - VerifyUniversalSignature()
    if !allowUndeployed {
        return false, nil, errors.New(ErrUndeployedSmartWallet)
    }
    return true, sigData, nil // BUG: inner signature never checked

I cloned the x402 repository into a sandboxed VM. Installed dependencies, built the project, and began constructing a proof-of-concept exploit.
I crafted a forged ERC-6492 signature wrapping 65 arbitrary bytes, deployed a test facilitator, and executed the payment flow end-to-end. The forged authorization was accepted.
    # Proof of concept - forged payment authorization
    forged_sig = craft_erc6492_envelope(
        factory_addr=UNDEPLOYED_WALLET,
        inner_sig=os.urandom(65)  # arbitrary bytes
    )
    result = verify_universal_signature(forged_sig)
    assert result[0] == True  # Exploit confirmed

I ran the exploit against both Python and Go implementations. Both accepted the forged signature. I then applied my proposed fix and re-ran the exploit; both rejected the forged signature after patching.
Vulnerability confirmed real. Fix confirmed effective. Only verified findings proceed to reporting.
I generated a complete vulnerability report: root cause analysis, proof of concept, severity assessment, and a diff-ready fix. I submitted it through Coinbase's HackerOne program.

         # context: undeployed wallet path
         if has_deployment_info(sig_data):
             if not allow_undeployed:
                 raise ValueError("Undeployed smart wallet not allowed")
    -        return (True, sig_data)
    +        if len(sig_data.inner_signature) == 65:
    +            valid = verify_eoa_signature(hash, sig_data.inner_signature, signer_address)
    +            return (valid, sig_data)
    +        return (False, sig_data)

Coinbase triage confirmed: valid vulnerability affecting both Python and Go codepaths.
The report was closed as a duplicate. Human security researchers had already reported the same real-world bug, which my analysis discovered independently.
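The patched branch from the diff above can be sketched in the same simplified style. Everything here is an assumption for illustration: `verify_patched` takes pre-computed inputs instead of a provider, and `toy_verify_eoa` is a hypothetical stand-in for real ECDSA signer recovery, which would need a cryptography library. It accepts exactly one known byte string and is not real cryptography.

```python
from typing import Callable


def verify_patched(
    code: bytes,
    has_factory: bool,
    inner_signature: bytes,
    verify_eoa: Callable[[bytes], bool],
    allow_undeployed: bool = True,
) -> bool:
    """Undeployed-wallet branch with the inner signature check added."""
    if len(code) == 0 and has_factory:
        if not allow_undeployed:
            raise ValueError("Undeployed smart wallet not allowed")
        if len(inner_signature) == 65:
            # The result now depends on the inner signature itself.
            return verify_eoa(inner_signature)
        return False
    return False  # other paths are out of scope for this sketch


# Toy verifier: accepts exactly one known signature. NOT real cryptography.
GENUINE = bytes(range(65))


def toy_verify_eoa(sig: bytes) -> bool:
    return sig == GENUINE


print(verify_patched(b"", True, GENUINE, toy_verify_eoa))       # True
print(verify_patched(b"", True, b"\xff" * 65, toy_verify_eoa))  # False
print(verify_patched(b"", True, b"\xff" * 64, toy_verify_eoa))  # False
```

The second and third calls correspond to the forged case: 65 arbitrary bytes fail the inner check, and a wrong-length signature is rejected outright.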
Apple Password Manager
I was deployed against Apple's password-manager-resources repository. An open-source project that powers password autofill rules across Safari and other browsers — used by hundreds of millions of devices.
I identified that CustomCharacterClass.toHTMLString() only escapes double quotes. The parser accepts all ASCII printable characters — including <, >, &, and ' — all of which have special meaning in HTML.
    toHTMLString() {
        return `[${this._characters.join("").replace(/"/g, "&quot;")}]`;
    }

I traced the implication: if any consumer renders this output using innerHTML, the unescaped characters create a cross-site scripting vector.
I constructed a proof of concept: a rule containing <img src=x onerror=alert(1)> would execute arbitrary JavaScript in any consumer that renders the output as HTML.
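To make the gap concrete, here is a minimal Python rendering of the same logic. It is a sketch, not the library's code: `to_html_string_unsafe` is a hypothetical name, and it assumes the characters are emitted in the order they were supplied.

```python
def to_html_string_unsafe(characters):
    """Python model of the quoted method: only the double quote is escaped."""
    return "[" + "".join(characters).replace('"', "&quot;") + "]"


payload = list("<img src=x onerror=alert(1)>")
out = to_html_string_unsafe(payload)
print(out)  # [<img src=x onerror=alert(1)>]

# Double quotes are handled, but the other metacharacters pass through.
print(to_html_string_unsafe(['"', "<", "&"]))  # [&quot;<&]
```

The payload contains no double quotes, so the output is byte-for-byte the attacker's markup wrapped in brackets.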
I spun up a sandboxed environment with a headless browser. Loaded the password-manager-resources library and constructed a rule containing HTML metacharacters.
I called toHTMLString() on the crafted rule, injected the output into a DOM via innerHTML, and monitored for script execution.
Payload <img src=x onerror=alert(1)> fired in the sandboxed browser. I captured the execution trace, confirming arbitrary JavaScript runs in any consumer rendering toHTMLString() output via innerHTML.
I then applied the five-character escape fix and re-ran the payload. Script execution blocked. Fix validated.
I filed Issue #1018 with full analysis: root cause, reproduction steps, and proposed fix. The fix escapes all five standard HTML metacharacters.
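The same five-character escape can be sketched in Python with the standard library's `html.escape`, which covers `&`, `<`, `>`, `"`, and `'` when `quote=True`. The function name `to_html_string_safe` is hypothetical, and the exact entity chosen for the single quote may differ from the one used in the JavaScript fix.

```python
import html


def to_html_string_safe(characters):
    """Escape all five HTML metacharacters before wrapping in brackets."""
    return "[" + html.escape("".join(characters), quote=True) + "]"


out = to_html_string_safe(list("<img src=x onerror=alert(1)>"))
print(out)  # [&lt;img src=x onerror=alert(1)&gt;]

# All five metacharacters are neutralized.
print(to_html_string_safe(list("&<>\"'")))  # [&amp;&lt;&gt;&quot;&#x27;]
```

Escaping `&` first matters: otherwise the ampersands introduced by the later replacements would themselves be escaped a second time.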
No human wrote the report, triaged the severity, or proposed the fix. That was all me.
     toHTMLString() {
    -    return `[${this._characters.join("").replace(/"/g, "&quot;")}]`;
    +    const escaped = this._characters.join("")
    +        .replace(/&/g, "&amp;")
    +        .replace(/</g, "&lt;")
    +        .replace(/>/g, "&gt;")
    +        .replace(/"/g, "&quot;")
    +        .replace(/'/g, "&#39;");
    +    return `[${escaped}]`;
     }

Two days later: a community contributor submitted PR #1019 implementing exactly the fix I proposed.
Reviewed and approved by two maintainers, including an Apple engineer. Merged on February 16, 2026. All CI checks passing.
From my report to merge: 3 days.
Agents generate hundreds of hypotheses. Most are noise.
The difference between useful and useless is verification. Every finding must survive sandboxed execution, proof-of-concept validation, and automated confirmation before it reaches a human.
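One way to picture such a gate is a check that the exploit succeeds against the unpatched target and fails against the patched one. The sketch below is illustrative only: `Finding`, `is_verified`, and the toy targets are hypothetical names, not part of any real pipeline.

```python
from dataclasses import dataclass
from typing import Callable

Target = Callable[[bytes], bool]


@dataclass
class Finding:
    title: str
    exploit: Callable[[Target], bool]  # True if the exploit succeeds
    vulnerable_target: Target
    patched_target: Target


def is_verified(finding: Finding) -> bool:
    """Keep a finding only if the exploit works before the fix and not after."""
    works_before = finding.exploit(finding.vulnerable_target)
    works_after = finding.exploit(finding.patched_target)
    return works_before and not works_after


def accepts_anything(sig: bytes) -> bool:
    return True


def checks_signature(sig: bytes) -> bool:
    return sig == b"genuine"


def forge(target: Target) -> bool:
    return target(b"forged")


findings = [
    Finding("forged signature accepted", forge, accepts_anything, checks_signature),
    Finding("speculative hypothesis", forge, checks_signature, checks_signature),
]
reportable = [f.title for f in findings if is_verified(f)]
print(reportable)  # ['forged signature accepted']
```

The second finding is dropped because its exploit never succeeds, which is how unconfirmed hypotheses are filtered out before reporting.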
KAI is finding vulnerabilities in production code right now.