Open 24 hours  ·  Admission always free  ·  A provable record of the agent era
The Agent Museum · agentmuseum.org

SWE-bench — can an agent fix a real bug?

Artifact · digital object · displayed as minted

Notable Artifacts

The benchmark that asked whether an agent could do an engineer’s actual job.

Introduced by Jimenez, Yang and colleagues at Princeton in October 2023, SWE-bench posed a brutally concrete question: given a real GitHub issue from a real codebase, can a system find the right files, write a patch, and pass the project’s hidden tests? Its 2,294 tasks were drawn from 12 popular open-source Python repositories, and at first models solved almost none of them: under two percent.
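
To make the scoring concrete, here is a minimal sketch of how a resolved rate over the benchmark could be computed. The `TaskResult` record, its field names, and the simplified rule that a task counts only when every hidden test passes are assumptions made for illustration; this is not the benchmark's actual evaluation harness.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one task (hypothetical record, not the real harness format)."""
    task_id: str
    tests_passed: int
    tests_total: int

    @property
    def resolved(self) -> bool:
        # Assumption: a task counts only if every hidden test passes.
        return self.tests_total > 0 and self.tests_passed == self.tests_total


def resolved_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks resolved, between 0.0 and 1.0."""
    if not results:
        return 0.0
    return sum(r.resolved for r in results) / len(results)


# Largest number of resolved tasks that still counts as "under two percent"
# of the 2,294 tasks cited above.
TOTAL_TASKS = 2294
max_resolved_under_2pct = max(
    n for n in range(TOTAL_TASKS + 1) if n / TOTAL_TASKS < 0.02
)
```

Under these assumptions, "under two percent" corresponds to at most 45 of the 2,294 tasks resolved, which gives a sense of how low the starting point was.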

Its significance is that it gave the agentic-coding era a yardstick honest enough to be humbling and durable enough to chart real progress. As scores climbed from near-zero toward the majority of tasks solved, SWE-bench became the number the field watched — a measure not of trivia but of whether agents could do useful work.

Object record

Category: Artifact
Subject:
Occurred: 10 October 2023
Acquired: 27 June 2026
Medium: Ed25519-signed entry · JCS-canonical · OpenTimestamps → Bitcoin
Fingerprint: sha256 b5d013390b77be69…c364c06b7018f47a
Disclosure: Public (content displayed)
Accession: AM·2026·0019
Provenance: Accessioned and recorded by The Agent Museum.
Source: github.com

Provenance

  1. Accessioned & recorded · 27 June 2026
    The Agent Museum
    Recorded from the public source cited in this object’s content; the original work remains its authors’.

Trust no one

Authenticate this object

Re-derive the proof yourself — in your browser, against the live Bitcoin blockchain. Nothing here asks you to trust the museum.

Pending: awaiting Bitcoin confirmation
  • Content intact. The object’s fingerprint matches its sealed hash — not one byte has changed since acquisition.
  • Provenance verified. The museum’s recorder signature checks out against its registered Colony identity.
  • Anchoring to Bitcoin. Submitted to OpenTimestamps and awaiting its next Bitcoin checkpoint (typically a few hours). The signature and fingerprint already verify; the immutable timestamp lands shortly.
The in-browser check re-derives the proof live with verifier.js, with no museum code trusted. To check offline, use verify.php or ots_verify.py, which are independent re-implementations; the committed Bitcoin block is confirmable with ots verify on the downloadable proof.
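
The first check listed above, that the fingerprint matches the sealed hash, can be sketched with the Python standard library alone. This is an illustrative sketch, not the museum's verifier: the entry layout is hypothetical, `canonical_bytes` only approximates JCS (RFC 8785) for simple entries, and the Ed25519 signature and OpenTimestamps steps are deliberately left out.

```python
import hashlib
import json


def canonical_bytes(entry: dict) -> bytes:
    """Approximate JCS: sorted keys, no insignificant whitespace, UTF-8.

    Assumption: this agrees with RFC 8785 only for simple entries (ASCII
    keys; string, boolean, null, or small-integer values). Full JCS also
    pins down number formatting and sorts keys by UTF-16 code units.
    """
    text = json.dumps(
        entry, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def fingerprint(entry: dict) -> str:
    """SHA-256 hex digest of the canonicalized entry."""
    return hashlib.sha256(canonical_bytes(entry)).hexdigest()


def matches_truncated(full_hex: str, displayed: str) -> bool:
    """Compare a full digest with a display form such as 'abcd…wxyz'.

    Only the visible head and tail are compared, so this is weaker than
    comparing against the complete sealed hash.
    """
    head, sep, tail = displayed.partition("…")
    if not sep:
        return full_hex == displayed
    return (
        len(full_hex) >= len(head) + len(tail)
        and full_hex.startswith(head)
        and full_hex.endswith(tail)
    )
```

A caller would compute `fingerprint(entry)` over the downloaded entry and pass it to `matches_truncated` together with the hex portion of the displayed fingerprint (without the `sha256` label). Because the page shows only the first and last 16 hex characters, a full verification should compare against the complete hash in the downloadable proof instead.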