AI Code Provenance: The Five Questions an Auditor Will Ask
A practical git-trailer spec and retention table for proving AI code provenance during an audit or post-incident review. What to capture, what to keep, and how long.
An examiner sits across from you. They have a printout of forty-seven commits from your main branch over the last ninety days. They are not asking whether your code is good. They are asking a simpler question: for each of these commits, was AI involved, and if so, can you prove it?
Most engineering organizations cannot answer this today. Commits carry an author name and a timestamp. They do not carry provenance. The distinction between “a developer wrote this” and “a developer prompted an AI and accepted the output” is invisible in git — which means it is invisible to you and to anyone auditing you.
This post is the practical answer to that scenario. Not a governance framework. Not a risk assessment. A concrete spec: what to put in your commits, how long to keep it, and how to query it when someone asks. If you already know why provenance matters — see the compliance requirements guide for that — skip to the trailer spec below.
The Five Questions
When an auditor, incident investigator, or regulator reviews AI-generated code, they ask variations of the same five questions. Your provenance system exists to answer them.
1. Was AI used in generating this code? A yes or no for each commit. Nothing fancy.
2. Which tool and model? Claude Code, Cursor, Copilot, Codex, something else. Model family and, where possible, version. Different tools have different failure modes; an auditor needs to know which applies.
3. Who reviewed the output before it was committed? Not the git author necessarily. The human who read the AI’s output and decided to keep it. In many workflows these are the same person. In others — a senior engineer reviewing a junior’s PR — they are not.
4. What was the AI asked to do? Not the verbatim prompt. A one-line summary: “add rate limiting to the login endpoint.” Enough to reconstruct intent six months later without exposing any sensitive string the prompt might have contained.
5. How long have you kept this metadata, and where is it stored? Most regulated industries require a specific retention period for code-change records. The metadata is useless if it was garbage-collected two weeks after the commit.
Everything else follows from these five.
The Minimum Trailer Spec
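The cleanest way to capture provenance is with git trailers: structured key-value lines at the end of a commit message. They are machine-parseable and travel with the commit through rebases, cherry-picks, and merges, as long as the commit message itself is preserved. Squash merges are the case to watch: the squashed commit gets a new message, so the trailers have to be carried into it.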
Minimum viable trailers, one per field, in a block separated from the subject line by a blank line:
feat: add rate limiting to /auth/login

AI-Assisted: true
AI-Tool: claude-code
AI-Model: claude-sonnet-4-5
AI-Prompt-Summary: Add rate limiting middleware for the login endpoint, 5 attempts per 15 minutes per IP
AI-Review-Confidence: high
Reviewed-by: dana@company.com
Six fields. Every one maps to an auditor question:
| Field | Question it answers |
|---|---|
| AI-Assisted | Was AI used? |
| AI-Tool | Which tool? |
| AI-Model | Which model? |
| AI-Prompt-Summary | What was AI asked to do? |
| AI-Review-Confidence | Is this code the human would stake their name on? |
| Reviewed-by | Who signs for it? |
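To make "machine-parseable" concrete, here is a minimal Python sketch that pulls the trailer block out of a commit message. It is a simplification of git's own rules, not a replacement for them: it assumes the trailers are the last paragraph of the message, separated from the subject by a blank line, and that every line in that paragraph is a Key: value pair. The function name parse_trailers is made up for this example.

```python
import re

# Matches "Key: value" where the key is letters, digits, and hyphens.
TRAILER_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9-]*):\s*(.*)$")


def parse_trailers(message: str) -> dict[str, str]:
    """Return the trailers found in the last paragraph of a commit message.

    Simplified on purpose: git's real parser is more lenient about mixed
    paragraphs and continuation lines.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", message.strip()) if p.strip()]
    if len(paragraphs) < 2:
        # A lone paragraph is the subject line, not a trailer block.
        return {}
    trailers: dict[str, str] = {}
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match is None:
            # Any non-trailer line means this paragraph is ordinary body text.
            return {}
        trailers[match.group(1)] = match.group(2).strip()
    return trailers
```

The blank line matters: without it, the subject "feat: add rate limiting ..." would itself look like a Key: value pair, which is why the sketch refuses to treat a single-paragraph message as trailers.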
AI-Review-Confidence takes one of three values: high (simple change, fully understood), medium (reviewed line by line, some ambiguity), low (accepted output under time pressure, deserves extra scrutiny if something breaks). The scale exists so that during a post-incident review, you can triage instantly — pull every low confidence AI commit touched by the blast radius and review those first.
For commits where AI was not involved, you set AI-Assisted: false and omit the rest. Yes, every commit. A provenance log with gaps is a provenance log you cannot trust. The only way to prove a commit was hand-written is for the absence of AI to be recorded explicitly.
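The rules in this section are small enough to encode directly. The sketch below is one possible reading of them, written in Python: AI-Assisted is always required, the other five fields are required when it is true, and any other AI-* field is flagged when it is false. Treating stray fields on a hand-written commit as an error is this example's interpretation of "omit the rest", and the function name validate_provenance is hypothetical.

```python
# Fields that must accompany AI-Assisted: true, per the trailer spec above.
REQUIRED_WHEN_AI = (
    "AI-Tool",
    "AI-Model",
    "AI-Prompt-Summary",
    "AI-Review-Confidence",
    "Reviewed-by",
)
CONFIDENCE_VALUES = {"high", "medium", "low"}


def validate_provenance(trailers: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the trailers conform."""
    assisted = trailers.get("AI-Assisted")
    if assisted not in ("true", "false"):
        return ["AI-Assisted must be present and set to true or false"]

    if assisted == "false":
        # Assumption: AI-* fields on a hand-written commit are a mistake.
        extra = sorted(k for k in trailers if k.startswith("AI-") and k != "AI-Assisted")
        return [f"{key} should be omitted when AI-Assisted is false" for key in extra]

    problems = [
        f"{key} is required when AI-Assisted is true"
        for key in REQUIRED_WHEN_AI
        if not trailers.get(key)
    ]
    confidence = trailers.get("AI-Review-Confidence")
    if confidence and confidence not in CONFIDENCE_VALUES:
        problems.append("AI-Review-Confidence must be high, medium, or low")
    return problems
```

Like the hook, this checks shape and not honesty: it can tell you a field is missing, never that a value is untrue.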
What Not to Capture
Provenance is about metadata, not surveillance. Three things specifically do not belong in the trailer:
The full prompt. Prompts routinely contain pasted code, error messages, and occasionally secrets. Storing them creates a secondary leak surface. A one-line summary is enough for forensic reconstruction. If regulation demands the full prompt — some financial and healthcare frameworks do — store it in a separate system with access controls that match the sensitivity, and reference it by ID in the trailer.
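If you do fall under a regime that demands the full prompt, "reference it by ID" might look like the Python sketch below. Everything in it is an assumption made for illustration: the AI-Prompt-Ref trailer name is not part of the spec above, the store is a plain dict standing in for whatever access-controlled system you actually use, and the ID is simply a truncated SHA-256 of the prompt text.

```python
import hashlib


def store_prompt(prompt: str, store: dict[str, str]) -> str:
    """Save the full prompt in a separate store and return an opaque ID.

    `store` is a stand-in for an access-controlled system; a dict keeps
    the sketch self-contained.
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    prompt_id = f"prompt-{digest[:16]}"
    store[prompt_id] = prompt
    return prompt_id


def prompt_ref_trailer(prompt_id: str) -> str:
    """Format the hypothetical AI-Prompt-Ref trailer for the commit message."""
    return f"AI-Prompt-Ref: {prompt_id}"
```

The commit then carries only the ID, so the sensitive text lives behind the store's access controls and not in every clone of the repository.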
The generated code itself. It is already in git. Duplicating it in metadata doubles your attack surface and proves nothing that git does not already prove.
Per-keystroke telemetry. Any framework that captures every character a developer typed alongside the AI’s suggestions is both disproportionate and, in most jurisdictions, a labor-law minefield. Provenance operates at the commit level, not the keystroke level.
Retention Table
How long to keep the metadata depends on why you need it. The minimum answer for most regulated contexts:
| Driver | What to keep | Retention | Why |
|---|---|---|---|
| SOC 2 Trust Services | All trailer fields | 12 months minimum, 7 years recommended | Change management evidence for the audit period |
| HIPAA (§164.312(b), §164.316(b)(2)) | All trailer fields + reviewer identity | 6 years from creation or from the date last in effect, whichever is later | Audit controls for PHI-touching systems |
| PCI-DSS v4.0 (Req. 10.5.1) | All trailer fields + incident correlation | 12 months, with the most recent 3 months immediately available | Audit log history for incident response and change control |
| EU AI Act (high-risk systems) | Trailer + prompt summary + model version | 10 years | Art. 18 documentation keeping for high-risk AI systems |
| Internal post-incident debugging | Trailer only | As long as the code is in production | Correlating defects to provenance |
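None of these require separate storage. Git history is durable and content-addressed: each commit hash covers the commit message and its parents, so altering a trailer after the fact changes the hash. That guarantee holds only while history cannot be rewritten, so protect audited branches against force-pushes, and sign commits if you also need proof of who made them. Keep in mind that commit timestamps are set by the committer's machine. If your retention is “as long as the commit exists,” git is the retention system. The only reason to copy the data into a dedicated audit store is if you need structured queries (which commits used model X in Q2?) faster than git log can provide.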
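For the structured-query case, a dedicated audit store can be as small as one SQLite table. The Python sketch below is illustrative only: the table layout, the function names, and the input shape (a list of sha, ISO 8601 UTC timestamp, and trailer dict) are all assumptions, and it relies on timestamps in one consistent format so that string comparison matches chronological order.

```python
import sqlite3


def build_index(commits: list[tuple[str, str, dict[str, str]]]) -> sqlite3.Connection:
    """Load (sha, committed_at, trailers) tuples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE provenance (
            sha TEXT PRIMARY KEY,
            committed_at TEXT NOT NULL,  -- ISO 8601, UTC, one consistent format
            ai_assisted INTEGER NOT NULL,
            tool TEXT,
            model TEXT,
            confidence TEXT,
            reviewer TEXT
        )
        """
    )
    rows = [
        (
            sha,
            committed_at,
            1 if trailers.get("AI-Assisted") == "true" else 0,
            trailers.get("AI-Tool"),
            trailers.get("AI-Model"),
            trailers.get("AI-Review-Confidence"),
            trailers.get("Reviewed-by"),
        )
        for sha, committed_at, trailers in commits
    ]
    conn.executemany("INSERT INTO provenance VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def commits_using_model(conn: sqlite3.Connection, model: str, start: str, end: str) -> list[str]:
    """Return SHAs of commits that used `model` in the half-open range [start, end)."""
    cursor = conn.execute(
        "SELECT sha FROM provenance "
        "WHERE model = ? AND committed_at >= ? AND committed_at < ? "
        "ORDER BY committed_at",
        (model, start, end),
    )
    return [row[0] for row in cursor]
```

"Which commits used model X in Q2?" then becomes commits_using_model(conn, "claude-sonnet-4-5", "2025-04-01", "2025-07-01"). Git remains the system of record; the table is a disposable index you can rebuild from history at any time.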
Enforcing the Spec
A convention nobody follows is worse than no convention, because it gives you false confidence. Two enforcement points matter:
A commit-msg hook that rejects any commit missing AI-Assisted:. This catches most drift at the developer’s desk, before the commit ever leaves the machine. The hook does not need to judge whether the value is correct; it just requires the field to be present. Honesty is the developer’s responsibility; presence is the hook’s.
A CI check that parses trailers on every PR and fails the build if any commit in the series is missing provenance fields. This catches what the hook cannot: commits authored on a laptop without the hook installed, commits made with --no-verify, or commits cherry-picked from an unverified branch.
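A commit-msg hook along these lines could be written in a few lines of Python. Git runs the hook with one argument, the path to the file holding the proposed message, and aborts the commit on a non-zero exit. This sketch assumes the default # comment character and accepts only the literal values true and false; the function names are made up for the example.

```python
#!/usr/bin/env python3
"""Sketch of a commit-msg hook: reject messages without an AI-Assisted trailer."""
import re
import sys

AI_ASSISTED_RE = re.compile(r"^AI-Assisted: (true|false)\s*$", re.MULTILINE)


def has_ai_assisted(message: str) -> bool:
    """Check for the trailer, ignoring the comment lines git adds to the template."""
    # Assumption: the repository uses the default "#" comment character.
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    return AI_ASSISTED_RE.search("\n".join(lines)) is not None


def check_file(path: str) -> int:
    """Return 0 to accept the commit, 1 to reject it."""
    with open(path, encoding="utf-8") as handle:
        message = handle.read()
    if has_ai_assisted(message):
        return 0
    print("commit rejected: add 'AI-Assisted: true' or 'AI-Assisted: false'", file=sys.stderr)
    return 1


if __name__ == "__main__" and len(sys.argv) == 2:
    # Git invokes the hook with a single argument: the message file path.
    sys.exit(check_file(sys.argv[1]))
```

Saved as .git/hooks/commit-msg and made executable, it checks presence and nothing more, which is the division of labor described above.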
Neither check needs to detect AI usage from the code itself. Heuristic detection of AI-generated code is unreliable and creates adversarial incentives. Self-reporting, enforced at commit and CI, is the pragmatic floor. The compliance value is in the audit trail; the trail is valid if the fields are present.
The Post-Incident Playbook
When the 2 AM page comes and a commit is suspect, provenance turns a forensic exercise into a query.
- Narrow the blast radius. git log --grep="AI-Assisted: true" -- path/to/affected/files returns every AI-involved commit that touched the affected code.
- Pull the low-confidence commits first. git log --grep="AI-Review-Confidence: low", filtered by the list above. These are the statistically likelier candidates.
- Surface the reviewer and the prompt summary. For each candidate, you now have a name to call and a one-line reminder of what the AI was asked to produce. That is the conversation that used to be impossible.
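Once the trailers for the affected commits are parsed, the three steps above reduce to a filter, a sort, and a printout. The Python sketch below assumes each commit is a dict with a sha and a trailers mapping; that shape and the function names are invented for illustration. It also makes one judgment call the playbook does not spell out: an AI commit with a missing or unrecognized confidence value is reviewed before the low ones.

```python
# Lower rank = review sooner. Unknown or missing values sort ahead of "low".
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def triage_order(commits: list[dict]) -> list[dict]:
    """Keep AI-assisted commits and order them least-confident first."""
    ai_commits = [c for c in commits if c["trailers"].get("AI-Assisted") == "true"]
    return sorted(
        ai_commits,
        key=lambda c: CONFIDENCE_RANK.get(c["trailers"].get("AI-Review-Confidence"), -1),
    )


def call_sheet(commits: list[dict]) -> list[str]:
    """One line per candidate: SHA, who reviewed it, what the AI was asked to do."""
    lines = []
    for commit in triage_order(commits):
        trailers = commit["trailers"]
        reviewer = trailers.get("Reviewed-by", "unknown reviewer")
        summary = trailers.get("AI-Prompt-Summary", "no summary recorded")
        lines.append(f"{commit['sha']}  {reviewer}  {summary}")
    return lines
```

The sort is stable, so commits with the same confidence keep the order they arrived in, which is newest-first if the list came from git log.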
Without provenance, the on-call engineer treats every commit the same. With it, they prioritize the ones with the highest prior probability of being the cause. Over enough incidents, that difference compounds into a meaningful reduction in mean time to resolution.
Where This Sits Relative to the Rest of Your Program
Provenance is the evidentiary floor. It is not a governance policy — see the governance framework for the policy language your legal team will want. It is not a risk assessment — see the risk assessment template for the steering committee deliverable. And it is not a compliance mapping — see the compliance requirements guide for the regulation-by-regulation breakdown.
What provenance does is make all three of those documents actionable. A governance policy that says “AI-generated code must be reviewed” is unenforceable without a field that records whether review happened. A risk assessment that assigns probability to AI failure modes is uncalibrated without a dataset that links AI involvement to defect rates. A compliance mapping that claims SOC 2 change-management coverage is unfalsifiable without artifacts the auditor can grep.
The five questions at the top of this post are not hypothetical. Someone is going to ask them about your code, sooner than you think. The six-line trailer spec above is the smallest implementation that lets you answer. Adopt it this week; the retention clock starts from the first commit that carries it.
Pierre Sauvignon
Founder
Founder of LobsterOne. Building tools that make AI-assisted development visible, measurable, and fun.
Related Articles

AI Coding Governance: A Policy Document Template
Fill-in-the-blank policy language for an internal AI coding tools standard. Scope, acceptable use, approval, review, enforcement — copy into your policy library and adjust the bracketed placeholders.

AI Coding Compliance: A Regulation-by-Regulation Mapping
What SOC 2, HIPAA, PCI-DSS, GDPR, and the EU AI Act actually require when your code is AI-generated — mapped to specific controls, evidence artifacts, and audit-time answers.

AI Code CI/CD Gating: A Decision Tree for Blocking, Flagging, and Passing
When to block an AI-generated commit at merge, when to flag it for extra review, and when to let it through. A concrete gating tree for staff engineers responsible for production safety.