Data Provenance
Organisations must have a process to document the source and usage of data in AI systems over time.
Plain language
This control means your business should keep track of where all the data used in your AI systems comes from and how it's used over time. It's important because if you don't know the history of your data, your AI may make incorrect decisions or breach privacy laws. For example, it might recommend the wrong products to customers because the data is outdated or incorrect.
Framework
ISO/IEC 42001:2023
Control effect
Preventative
Classifications
N/A
Official last update
01 Dec 2023
Control Stack last updated
19 May 2026
Maturity levels
N/A
Official control statement
The organisation shall define and document a process for recording the provenance of data used in its AI systems over the life cycles of the data and the AI system.
Why it matters
If you don't track where your AI's data comes from, the system may make bad decisions, such as recommending products based on incorrect data, which can lead to unhappy customers and legal issues.
Operational notes
Keep your data provenance log current: update it whenever you acquire new data or modify existing data, not just once a year.
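A provenance log does not need special tooling. The Python sketch below shows one way to structure it, assuming an append-only, in-memory log with hypothetical field names (`dataset_id`, `source`, `action`, `reason`, `sha256`, `recorded_at`). A spreadsheet or database table with the same columns would serve the same purpose. Each time data is ingested or modified, a new entry is appended rather than an old one being overwritten, and a SHA-256 fingerprint of the content makes it possible to confirm later which version of the data was in use.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEntry:
    """One row of a hypothetical data provenance log (frozen: cannot be edited)."""
    dataset_id: str   # internal name for the dataset
    source: str       # where the data came from (vendor, system, report...)
    action: str       # e.g. "ingested" or "modified"
    reason: str       # why the data was added or changed
    sha256: str       # fingerprint of the data content at this point
    recorded_at: str  # UTC timestamp in ISO 8601 format


class ProvenanceLog:
    """Append-only log: entries are added, never changed or deleted."""

    def __init__(self) -> None:
        self._entries: list[ProvenanceEntry] = []

    def record(self, dataset_id: str, source: str, action: str,
               reason: str, content: bytes) -> ProvenanceEntry:
        """Append a new entry describing where the data came from and why it changed."""
        entry = ProvenanceEntry(
            dataset_id=dataset_id,
            source=source,
            action=action,
            reason=reason,
            sha256=hashlib.sha256(content).hexdigest(),
            recorded_at=datetime.now(timezone.utc).isoformat(),
        )
        self._entries.append(entry)
        return entry

    def history(self, dataset_id: str) -> list[ProvenanceEntry]:
        """All entries for one dataset, oldest first."""
        return [e for e in self._entries if e.dataset_id == dataset_id]

    def to_jsonl(self) -> str:
        """Export every entry as JSON Lines, e.g. for a spreadsheet import or an auditor."""
        return "\n".join(json.dumps(asdict(e)) for e in self._entries)
```

Calling `record()` when a dataset is first loaded and again after each cleaning or update step produces the per-dataset history that `history()` returns.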
Implementation tips
- The data steward should set up a simple system, such as a spreadsheet or a basic database, to log where all AI data comes from and any changes made to it. Recording details such as the original source and how the data was altered helps the team quickly trace issues if something goes wrong.
- The AI lead should train the team on the importance of data history (provenance) and its impact on AI system decisions. Use an example, such as outdated data causing wrong customer recommendations, to illustrate the benefits of knowing data history.
- When purchasing data sets, procurement should ensure that every vendor contract requires data provenance information. For example, a clause can require the vendor to provide a data origin report, so the business always knows where its AI training data originates and how reliable it is.
- The product owner should regularly verify that the AI model isn't using outdated or invalid data. This can involve alerts or checks that run whenever data is updated or new data arrives, to prevent unexpected AI behaviour.
- The head of risk should assess the potential risks of using data of unidentified origin in AI systems. This involves building a risk profile for each data source and identifying potential legal or ethical issues, so that problems are caught before they affect customers.
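The product owner's freshness checks and the head of risk's concern about data of unknown origin can be combined into one routine check. The Python sketch below is an illustration under assumptions, not a prescribed method: the `DatasetInUse` fields and the 90-day default threshold are hypothetical and should be replaced with whatever freshness requirement fits each dataset.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DatasetInUse:
    """A dataset an AI model currently depends on (hypothetical fields)."""
    name: str
    source: Optional[str]  # None or "" means the origin was never recorded
    last_refreshed: date   # when the data was last updated from its source


def provenance_alerts(datasets: list[DatasetInUse], today: date,
                      max_age: timedelta = timedelta(days=90)) -> list[str]:
    """Return alerts for datasets with no recorded source or stale data."""
    alerts = []
    for ds in datasets:
        # Unidentified origin: flag for risk review before the data is used.
        if not ds.source:
            alerts.append(f"{ds.name}: source unknown, review before use")
        # Stale data: flag for the product owner to refresh or retire.
        age = today - ds.last_refreshed
        if age > max_age:
            alerts.append(f"{ds.name}: last refreshed {age.days} days ago")
    return alerts
```

Running this on every data update, or on a schedule, and sending any non-empty result to the product owner is one way to implement the alerts described above.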
Audit / evidence tips
- Ask: Request the organisation's data provenance log for AI systems. Good: The log is up to date, with clear details of data origin and modifications.
- Ask: Interview the AI development lead about their process for tracking data sources. Good: The lead describes a clear process that matches the documented procedures.
- Ask: Check procurement contracts for data sets used by AI systems. Good: Contracts contain clear data provenance clauses signed by both parties.
- Ask: Examine recent changes to AI training data. Good: Each change is documented in the system log, listing its source and the reason for it.
- Ask: Review the minutes of recent risk management meetings. Good: The minutes show regular discussion of data-related AI risks, with action items recorded.
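Where the provenance log is kept as a spreadsheet, an auditor can test the "each change is documented, listing source and reason" criterion mechanically on a CSV export. This Python sketch assumes hypothetical column names (`dataset_id`, `source`, `reason`, `recorded_at`); `REQUIRED_FIELDS` should be adjusted to match the organisation's actual log.

```python
import csv
import io

# Hypothetical column names; align these with the real log's headers.
REQUIRED_FIELDS = ("dataset_id", "source", "reason", "recorded_at")


def find_incomplete_entries(csv_text: str) -> list[tuple[int, list[str]]]:
    """Return (row number, missing fields) for log rows an auditor should query.

    Row numbers count the header as row 1, matching what a spreadsheet shows.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Reading fieldnames consumes the header row.
    missing_columns = [f for f in REQUIRED_FIELDS if f not in (reader.fieldnames or [])]
    if missing_columns:
        raise ValueError(f"log export lacks columns: {missing_columns}")
    problems = []
    for row_number, row in enumerate(reader, start=2):
        # Short rows yield None for absent cells, so treat None like an empty string.
        missing = [f for f in REQUIRED_FIELDS if not (row[f] or "").strip()]
        if missing:
            problems.append((row_number, missing))
    return problems
```

An empty result shows only that the recorded entries are filled in, not that the log is complete; it still needs to be compared against the actual changes made to training data.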
Cross-framework mappings
How Annex A 7.5 relates to controls across ISO/IEC 27001, ISO/IEC 42001, Essential Eight, and ASD ISM.
ISO 27001
| Control | Relationship | Notes |
|---|---|---|
| Annex A 5.33 | Supports | Annex A 7.5 requires a defined and documented process for recording data provenance for AI systems over time, creating provenance records... |
| Annex A 7.10 | Supports | Annex A 7.5 requires the organisation to record provenance of data used by AI systems throughout the data and AI system life cycles, incl... |
ASD ISM
| Control | Relationship | Notes |
|---|---|---|
| ISM-2087 | Partially overlaps | Annex A 7.5 requires a documented, end-to-end process for recording the provenance of data used in AI systems across the full life cycle |
| ISM-2103 | Supports | ISM-2103 requires informed and explicit consent from data owners before organisational data from AI applications is used for training, fi... |
These mappings show relationships between controls across frameworks. They do not imply full equivalence or certification.