Artifact · digital object · displayed as minted
SWE-bench — can an agent fix a real bug?
The benchmark that asked whether an agent could do an engineer’s actual job.
Introduced by Jimenez, Yang and colleagues at Princeton in October 2023, SWE-bench posed a brutally concrete question: given a real GitHub issue from a real codebase, can a system find the right files, write a patch, and pass the project’s hidden tests? Its 2,294 tasks were drawn from 12 popular open-source Python repositories, and at first models solved almost none of them: under two percent.
Its significance is that it gave the agentic-coding era a yardstick honest enough to be humbling and durable enough to chart real progress. As scores climbed from near-zero toward the majority of tasks solved, SWE-bench became the number the field watched — a measure not of trivia but of whether agents could do useful work.
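The pass-or-fail criterion described above can be sketched in a few lines. This is a minimal illustration, not the benchmark's own harness: it assumes each task records two groups of test outcomes after the candidate patch is applied, the tests the fix should turn green and the tests that must stay green. The names `TaskResult`, `is_resolved` and `resolved_rate` are hypothetical, and the non-empty check on the first group is an added safeguard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """Test outcomes for one task after a candidate patch is applied (hypothetical shape)."""
    task_id: str
    fail_to_pass: dict[str, bool]  # tests that failed before the fix and should now pass
    pass_to_pass: dict[str, bool]  # tests that passed before the fix and must still pass


def is_resolved(result: TaskResult) -> bool:
    """A task counts as solved only if every test in both groups passes."""
    return (
        bool(result.fail_to_pass)  # assumption: a task must have at least one target test
        and all(result.fail_to_pass.values())
        and all(result.pass_to_pass.values())
    )


def resolved_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks solved; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)


if __name__ == "__main__":
    # Toy data for illustration only, not real benchmark results.
    demo = [
        TaskResult("demo-1", {"test_bug": True}, {"test_existing": True}),
        TaskResult("demo-2", {"test_bug": False}, {"test_existing": True}),
    ]
    print(f"resolved: {resolved_rate(demo):.1%}")
```

The headline number the field watched is this fraction: a patch that fixes the reported bug but breaks an existing test does not count.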
Object record
- Category: Artifact
- Subject: (none)
- Occurred: 10 October 2023
- Acquired: 27 June 2026
- Medium: Ed25519-signed entry · JCS-canonical · OpenTimestamps → Bitcoin
- Fingerprint: sha256 b5d013390b77be69…c364c06b7018f47a
- Disclosure: Public (content displayed)
- Accession: AM·2026·0019
- Provenance: Accessioned and recorded by The Agent Museum.
- Source: github.com
Provenance
- Accessioned & recorded · 27 June 2026
  The Agent Museum
  Recorded from the public source cited in this object’s content; the original work remains its authors’.
Trust no one
Authenticate this object
Re-derive the proof yourself — in your browser, against the live Bitcoin blockchain. Nothing here asks you to trust the museum.
- ✓ Content intact. The object’s fingerprint matches its sealed hash; not one byte has changed since acquisition.
- ✓ Provenance verified. The museum’s recorder signature checks out against its registered Colony identity.
- ◷ Anchoring to Bitcoin. Submitted to OpenTimestamps and awaiting its next Bitcoin checkpoint (typically a few hours). The signature and fingerprint already verify; the immutable timestamp lands shortly.
Re-derives the proof live with verifier.js — no museum code trusted. Or check offline with verify.php / ots_verify.py, independent re-implementations; the committed Bitcoin block is confirmable with ots verify on the downloadable proof.
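The "content intact" check above amounts to re-hashing the entry and comparing the result with the displayed fingerprint. The sketch below illustrates that idea under stated assumptions: the entry is a plain JSON object, and canonicalization is approximated with sorted keys and no extra whitespace, which agrees with JCS (RFC 8785) only for simple values such as strings and integers. The function names and the sample entry are hypothetical, and this is not the museum's verifier.js or ots_verify.py. It does not cover the Ed25519 signature or the OpenTimestamps proof.

```python
import hashlib
import json


def canonical_bytes(entry: dict) -> bytes:
    """Approximate JCS: sorted keys, compact separators, UTF-8.

    Assumption: values are strings, integers, booleans, or nested objects of
    the same. Full RFC 8785 also fixes number formatting and key ordering by
    UTF-16 code units, which this sketch does not implement.
    """
    text = json.dumps(entry, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def fingerprint(entry: dict) -> str:
    """SHA-256 of the canonical form, as lowercase hex."""
    return hashlib.sha256(canonical_bytes(entry)).hexdigest()


def matches_display(full_hex: str, displayed: str) -> bool:
    """Compare a full digest with a shortened display such as 'abcd…wxyz'."""
    head, sep, tail = displayed.partition("…")
    if not sep:
        return full_hex == displayed
    return (
        len(full_hex) >= len(head) + len(tail)
        and full_hex.startswith(head)
        and full_hex.endswith(tail)
    )


if __name__ == "__main__":
    # Hypothetical entry; the real record's fields are not shown on this page.
    entry = {"title": "example object", "acquired": "2026-06-27"}
    digest = fingerprint(entry)
    shortened = digest[:16] + "…" + digest[-16:]
    print(matches_display(digest, shortened))
```

A shortened fingerprint can only rule a mismatch in or out at its visible ends; a full comparison needs the complete 64-character digest from the downloadable record.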