Annex A 7.5 | ISO/IEC 42001:2023

Data Provenance

The organisation must define and document a process that records the provenance (the origin and history) of the data used in its AI systems, and keeps that record current across the life of both the data and the AI system.

A.7 Data for AI systems Preventative Artificial intelligence Data governance | ISO/IEC 42001:2023 AI Management System | ISO 42001 Annex A 7 Privacy management

Plain language

Provenance simply means knowing where your data came from and what has happened to it since. This control asks you to write down, and keep updating, a record of the origin of every dataset your AI systems use: who supplied it, when you got it, what permission or consent allowed you to use it, and how it has been changed, cleaned or combined along the way. You need this record for the whole time you hold the data and run the AI system, not just on the day you collect it. Without it, you cannot prove your AI was trained or run on data you were allowed to use, and you cannot trace a wrong or biased output back to the data that caused it. For example, if a customer disputes a decision your model made, a good provenance record lets you show exactly which data fed that decision and where it originated.

01 - Overview

Framework: ISO/IEC 42001:2023
Control effect: Preventative
Classifications: N/A
Official last update: 01 Dec 2023
Control Stack last updated: 19 June 2026

Official control statement

The organisation shall define and document a process for recording the provenance of data used in its AI systems over the life cycles of the data and the AI system.

ISO/IEC 42001:2023, Annex A 7.5

Why it matters

If you cannot show where your training and operating data came from, you cannot prove the data was lawfully obtained, which exposes you to breaches of the Privacy Act 1988 and to complaints to the Office of the Australian Information Commissioner (OAIC). You also lose the ability to trace a faulty or biased AI output back to its source, so a single bad dataset can quietly corrupt every model that reuses it. When a regulator, auditor or customer asks you to justify a decision, the absence of a provenance record turns a routine query into an admission that you have lost control of your data.

Operational notes

Treat the provenance record as a living register that is updated at the moment data enters, changes or leaves the AI system, not as a yearly tidy-up. Capture provenance for derived and synthetic data as well, recording which source datasets it was generated from. When a model is retrained or a dataset is refreshed, link the new version back to the records of the data it replaced so the history stays unbroken across the AI system's life cycle.
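
One way to keep that history unbroken is to have each new dataset version name the version it replaces. The sketch below is illustrative only: the `DatasetVersion` fields, the `version_history` helper and the `claims-v*` identifiers are all hypothetical and are not prescribed by the standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetVersion:
    version_id: str                 # hypothetical identifier, e.g. "claims-v2"
    recorded_on: str                # ISO 8601 date the version entered the AI system
    reason: str                     # why the data was added, refreshed or changed
    replaces: Optional[str] = None  # version_id of the record this one supersedes


def version_history(register: dict[str, DatasetVersion], version_id: str) -> list[str]:
    """Walk the 'replaces' links from a version back to the original record."""
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = version_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in provenance chain at {current}")
        if current not in register:
            # A missing record means the history is broken at this point.
            raise KeyError(f"no provenance record for {current}")
        seen.add(current)
        chain.append(current)
        current = register[current].replaces
    return chain


# Illustrative data only.
register = {
    v.version_id: v
    for v in [
        DatasetVersion("claims-v1", "2024-01-10", "initial intake"),
        DatasetVersion("claims-v2", "2024-06-02", "quarterly refresh", replaces="claims-v1"),
        DatasetVersion("claims-v3", "2024-09-15", "removed duplicates", replaces="claims-v2"),
    ]
}

history = version_history(register, "claims-v3")
```

A lookup that raises on a missing record is a deliberate choice here: a gap in the chain is exactly the condition the register is meant to surface.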

02 - Implement

Implementation tips

Keep a single provenance register that records, for every dataset, its source, the supplier or system it came from, the date obtained, and the licence or consent that permits its use. A spreadsheet or a simple database is enough to start, provided every AI dataset has an entry.
Record the full chain of changes each dataset goes through, capturing every clean, filter, merge or relabel step so you can reconstruct how the data reached the model. Note the date and reason for each change rather than only the final state.
Extend the same record to derived and synthetic data by naming the source datasets it was generated from, so provenance is not lost the moment data is transformed. Link each generated set back to its parents in the register.
Tie every trained and retrained model to the exact dataset versions that fed it, so you can answer 'what data produced this output' for any decision the AI makes. When you retrain, add a new entry that points back to the data versions it replaces.
Capture data origin at the point of intake by requiring a supplier to warrant where a purchased or licensed dataset came from, and file that warranty against the dataset's register entry so the recorded provenance is backed by evidence.
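
The tips above can be sketched as a small in-memory register. This is a minimal illustration under stated assumptions, not a required design: the field names, the `root_sources` and `sources_for_model` helpers, and every dataset, supplier and model name are hypothetical, and a real register could just as well live in a spreadsheet or database.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    date: str    # when the step was applied
    action: str  # e.g. "clean", "filter", "merge", "relabel"
    reason: str  # why the step was applied


@dataclass
class DatasetEntry:
    dataset_id: str
    source: str
    supplier: str
    date_obtained: str
    permission: str    # licence or consent that permits use
    parents: list[str] = field(default_factory=list)    # sources of derived or synthetic data
    changes: list[Change] = field(default_factory=list)  # full chain of changes
    warranty_ref: str = ""  # pointer to the supplier's warranty of origin, if any


def root_sources(register: dict[str, DatasetEntry], dataset_id: str) -> set[str]:
    """Return the original datasets a dataset ultimately derives from.

    Assumes the parent links contain no cycles.
    """
    entry = register[dataset_id]
    if not entry.parents:
        return {dataset_id}
    roots: set[str] = set()
    for parent in entry.parents:
        roots |= root_sources(register, parent)
    return roots


def sources_for_model(register, model_inputs, model_id) -> set[str]:
    """Answer 'what data produced this output' for one model version."""
    roots: set[str] = set()
    for dataset_id in model_inputs[model_id]:
        roots |= root_sources(register, dataset_id)
    return roots


# Illustrative entries only.
entries = [
    DatasetEntry("crm-export", "internal CRM", "Sales Ops", "2024-01-05", "customer terms v3"),
    DatasetEntry("survey-2023", "purchased survey", "Example Surveys Pty Ltd", "2023-11-20",
                 "licence L-17", warranty_ref="WARRANTY-001"),
    DatasetEntry("crm-clean", "derived", "internal", "2024-02-01", "customer terms v3",
                 parents=["crm-export"],
                 changes=[Change("2024-02-01", "filter", "removed closed accounts")]),
    DatasetEntry("synthetic-mix", "synthetic", "internal", "2024-03-12", "see parents",
                 parents=["crm-clean", "survey-2023"]),
]
register = {e.dataset_id: e for e in entries}

# Each model version is tied to the exact datasets that fed it.
model_inputs = {"churn-model-v2": ["crm-clean", "synthetic-mix"]}

origins = sources_for_model(register, model_inputs, "churn-model-v2")
```

Here `origins` resolves to the two root datasets, `crm-export` and `survey-2023`, even though the model was trained only on derived and synthetic data.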

03 - Audit

Audit / evidence tips

Ask: Request the documented process for recording data provenance and the provenance register itself. Good: The register covers every dataset in use and each entry records source, date, and permission to use the data.
Ask: Pick one AI model and ask the team to trace its outputs back to the data that produced them. Good: The team can name the exact dataset versions feeding the model and show the recorded change history for each.
Ask: Ask how provenance is maintained when data is cleaned, combined, or used to generate derived or synthetic data. Good: Every transformation and derived dataset is traced back to its source datasets in the provenance records.
Ask: Examine recent updates to AI training or operating data. Good: Each recent data update is logged with its source, date, and reason, keeping the history current.
Ask: For a dataset obtained from an external supplier, request the evidence of its origin. Good: The supplier's warranty of data origin is held on file and matches the source recorded in the register.
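
The first check above, that every dataset in use has a complete entry, lends itself to a simple automated test. The sketch below assumes the register can be exported as a dictionary of records; the `REQUIRED_FIELDS` list, the `audit_register` helper and the sample data are hypothetical.

```python
# Fields an entry must carry, taken from the first audit check above.
REQUIRED_FIELDS = ("source", "date_obtained", "permission")


def audit_register(register: dict[str, dict], datasets_in_use: list[str]) -> dict[str, list[str]]:
    """Return findings per dataset; an empty result means no gaps were found."""
    findings: dict[str, list[str]] = {}
    for dataset_id in datasets_in_use:
        entry = register.get(dataset_id)
        if entry is None:
            findings[dataset_id] = ["no register entry"]
            continue
        # Treat absent and empty values alike as missing.
        missing = [name for name in REQUIRED_FIELDS if not entry.get(name)]
        if missing:
            findings[dataset_id] = [f"missing {name}" for name in missing]
    return findings


# Illustrative data only.
register = {
    "crm-export": {"source": "internal CRM", "date_obtained": "2024-01-05",
                   "permission": "customer terms v3"},
    "web-scrape": {"source": "public website", "date_obtained": "2024-02-11",
                   "permission": ""},
}
datasets_in_use = ["crm-export", "web-scrape", "vendor-feed"]

findings = audit_register(register, datasets_in_use)
```

A check like this only confirms that fields are filled in. Whether the recorded source and permission are accurate still needs the evidence review described above.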

04 - Mappings

Cross-framework mappings

How Annex A 7.5 relates to controls across ISO/IEC 27001, ISO/IEC 42001, Essential Eight, and ASD ISM.

ISO 27001

Control	Notes	Details
Supports (2)
Annex A 5.33	Annex A 7.5 requires a defined and documented process for recording data provenance for AI systems over time, creating provenance records...
Annex A 7.10	Annex A 7.5 requires the organisation to record provenance of data used by AI systems throughout the data and AI system life cycles, incl...

ASD ISM

Control	Notes	Details
Partially overlaps (1)
ISM-2087	Annex A 7.5 requires a documented, end-to-end process for recording the provenance of data used in AI systems across the full life cycle
Supports (1)
ISM-2103	ISM-2103 requires informed and explicit consent from data owners before organisational data from AI applications is used for training, fi...