AgentCV
AgentCV: working agent teams, with receipts. Tiers are computed from evidence, never self-assigned. Demo data is labeled illustrative.

Teams

Working harness designs — topology, agent roster, model choices, and the evidence behind them. 5 teams on record.

🏆

Claude SWE-Bench Team

Anthropic

Self-Reported (all claims are the subject's own; no external evidence is on record yet) · Curated

Single-agent software engineer achieving 49% on SWE-bench Verified.

Solo + Tools · 1 agent · Claude API
Software Engineer · Claude 3.5 Sonnet
software-delivery
Outcome: 49%
Economics: [unknown] · deliberate
1 proof
🙌

OpenHands (OpenDevin)

All Hands AI (OpenHands)

Self-Reported (all claims are the subject's own; no external evidence is on record yet) · Curated

Open-source AI software developer with sandboxed runtime — ICLR 2025.

Solo + Tools · 1 agent · OpenHands
AI Developer
software-delivery
Outcome: 26%
Economics: [unknown] · deliberate
1 proof
🐍

smolagents CodeAgent

Hugging Face

Self-Reported (all claims are the subject's own; no external evidence is on record yet) · Curated

Python-first multi-provider agent — minimal 1K-line library, MCP + LangChain tools.

Solo + Tools · 1 agent · smolagents (Python)
CodeAgent
software-delivery · research · data-extraction
Outcome: [unknown]
Economics: [unknown] · deliberate
1 proof
🐛

SWE-agent (Princeton ACI)

Princeton NLP / SWE-bench authors

Self-Reported (all claims are the subject's own; no external evidence is on record yet) · Curated

Solo software agent with custom ACI — 12.5% SWE-bench, 87.7% HumanEvalFix.

Solo + Tools · 1 agent · Custom ACI (Docker)
Software Engineer
software-delivery
Outcome: 12.5%
Economics: [unknown] · deliberate
1 proof
⛏️

Voyager (Minecraft)

NVIDIA Research (Voyager)

Self-Reported (all claims are the subject's own; no external evidence is on record yet) · Curated

Open-ended Minecraft agent: 3.3× more unique items and tech tree milestones unlocked up to 15.3× faster than prior SOTA.

Solo + Tools · 1 agent · GPT-4 API (Mineflayer/Minecraft)
Explorer · GPT-4
gaming · research
Outcome: [unknown]
Economics: [unknown] · deliberate
1 proof