Ensuring Integrity of AI Model Training Data
Verify the source and integrity of data used to train AI models to prevent poisoning.
Plain language
This control is about making sure the data used to train AI models comes from a trustworthy source and hasn't been tampered with. If someone tampers with this data, they can teach the AI to make bad decisions, leading to wrong insights or faulty automation that could harm your business, clients, or students.
Framework
ASD Information Security Manual (ISM)
Control effect
Preventative
Classifications
NC, OS, P, S, TS
ISM last updated
Nov 2025
Control Stack last updated
19 Mar 2026
E8 maturity levels
N/A
Guideline
Guidelines for software development
Official control statement
The source and integrity of training data for artificial intelligence models is verified.
Why it matters
If training data sources or integrity aren’t verified, poisoned or altered data can skew model outputs, causing unsafe decisions and loss of trust.
Operational notes
Maintain a verified data provenance record, validate hashes/signatures on datasets, and periodically re-check sources to detect tampering before retraining.
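The hash-validation step above can be sketched in a few lines. This is a minimal illustration, not part of the ISM control text: the function names are made up, and it assumes the data provider publishes a SHA-256 digest alongside each dataset file.

```python
import hashlib
from pathlib import Path

def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large datasets never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_dataset(path: Path, expected_sha256: str) -> bool:
    """Return True only if the dataset on disk matches the published digest."""
    return sha256_of_file(path) == expected_sha256.lower()
```

Running the same check before every retraining run (rather than only at receipt) is what catches tampering that happens while the dataset sits in storage.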
Implementation tips
- Business management should ensure that data sources are thoroughly vetted before use. This involves verifying the origin of the data and making sure it's provided by a reputable entity. Check reviews, perform background checks, and consult with your IT team to confirm legitimacy.
- The IT team should establish processes to regularly check data integrity. Use tools to test the data for anomalies and unexpected changes, ensuring it's consistent with what was initially received. This might involve automated scripts that alert you to alterations.
- Procurement staff should involve the legal team when contracts for data sources are made. Make sure there are clear terms about the data's origin and the actions taken if there’s any breach in these terms. Have a legal review to ensure the contract offers protections against data poisoning risks.
- System owners should coordinate with AI specialists to implement a trial run of the model using a small, controlled batch of the dataset. Observe how the model behaves and whether it performs as expected, helping to spot any data inconsistencies early on.
- Managers should maintain a log of all data received for training AI, noting details about its source, verification steps taken, and any known limitations. Regularly review this log with the IT team to ensure continuous integrity of AI model data.
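The data log described in the last tip can be as simple as an append-only file with one record per dataset. The sketch below is one hypothetical shape for such a record; the field names are illustrative, and a spreadsheet or ticketing system would serve the same purpose.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone

@dataclass
class ProvenanceEntry:
    """One record per dataset received for AI training."""
    dataset: str                        # file or dataset name
    source: str                         # who supplied the data
    sha256: str                         # hash recorded at receipt
    verified_by: str                    # reviewer who vetted the source
    limitations: list = field(default_factory=list)  # known caveats
    received_at: str = ""               # filled in automatically below

    def __post_init__(self):
        if not self.received_at:
            self.received_at = datetime.now(timezone.utc).isoformat()

def append_entry(log_path: str, entry: ProvenanceEntry) -> None:
    """Append one JSON line per dataset so the log is easy to diff and review."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(entry)) + "\n")
```

A line-per-record format keeps the log auditable: reviewers can see exactly when each dataset arrived, who verified it, and what limitations were noted.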
Audit / evidence tips
- Ask: data source verification documents. Request records showing how each data source was vetted.
  Good: a detailed log signed by the reviewer showing a credible and approved source.
- Ask: data integrity check logs. Request to see logs of ongoing integrity checks.
  Good: logs that show regular, systematic tests and no unexplained discrepancies.
- Ask: contract agreements with data providers. Request the agreements that outline provider responsibilities.
  Good: clear terms demonstrating robust protections and regular quality confirmations.
- Ask: trial run results. Request documentation of AI model trial runs.
  Good: reports that show consistent performance without any unusual deviations.
- Ask: the data management log. Request the maintained log of all data used in training.
  Good: comprehensive entries with clear verification notes and resolutions for any data limitations.
Cross-framework mappings
How ISM-2087 relates to controls across ISO/IEC 27001, Essential Eight, and ASD ISM.
ISO 27001
| Control | Notes | Details |
|---|---|---|
| Partially overlaps (2) | | |
| Annex A 5.21 | ISM-2087 requires the organisation to verify the source and integrity of training data used for AI models to prevent data poisoning | |
| Annex A 8.30 | ISM-2087 requires the organisation to verify the source and integrity of training data used to build AI models | |
These mappings show relationships between controls across frameworks. They do not imply full equivalence or certification.