ISM-2087 | ASD Information Security Manual (ISM)

Verify the Source and Integrity of AI Training Data

Check where the data used to train artificial intelligence (AI) models comes from and confirm it has not been tampered with before it is used.

Preventative, NC, OS, P, ASD Information Security Manual, Guidelines for software development, Data governance, Assurance, Supplier agreements, Artificial intelligence

Plain language

Artificial intelligence (AI) models learn from large sets of data, and the quality of that data shapes how the model behaves. This control means you must know exactly where your training data came from (its source) and confirm it has not been changed, corrupted, or poisoned along the way (its integrity). If you train a model on data of unknown origin or data that someone has secretly altered, the model can make wrong, biased, or unsafe decisions, and you may not realise why.

01 - Overview

Framework

ASD Information Security Manual (ISM)

Control effect

Preventative

Classifications

NC, OS, P, S, TS

ISM last updated

Dec 2025

Control Stack last updated

18 June 2026

E8 maturity levels

N/A

Guideline

Guidelines for software development

Section

Artificial Intelligence Application Development

Topic

Artificial Intelligence Model Poisoning

Official control statement

The source and integrity of training data for AI models is verified.

Why it matters

If an AI model is trained on data of unknown origin, or on data that has been secretly altered, it can produce wrong, biased, or manipulated outputs that the organisation trusts and acts on without realising the cause.

Operational notes

Re-verify the source and integrity of a dataset every time it is refreshed or re-supplied, not just on first use, and keep the provenance and checksum records for as long as the model is in service.
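As a minimal sketch of this record-and-re-verify cycle, assuming datasets arrive as files and that a JSON manifest is an acceptable place to keep provenance and checksum records, the Python below hashes each file with SHA-256 when it is received or re-supplied, appends a provenance entry, and re-checks the hash before use. The function names, manifest layout and supplier name are hypothetical. The manifest needs the same access controls as the data itself, since anyone who can alter both could make tampered data appear verified.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_manifest(manifest: Path) -> list:
    """Read the (hypothetical) JSON manifest, or start an empty one."""
    return json.loads(manifest.read_text()) if manifest.exists() else []


def record_on_receipt(dataset: Path, source: str, manifest: Path) -> dict:
    """Append a provenance and checksum entry each time a dataset is received or re-supplied."""
    entry = {
        "file": dataset.name,
        "source": source,  # supplier, internal system, or public repository
        "sha256": sha256_of(dataset),
        "received_utc": datetime.now(timezone.utc).isoformat(),
    }
    records = load_manifest(manifest)
    records.append(entry)  # append, never overwrite, so the full history is retained
    manifest.write_text(json.dumps(records, indent=2))
    return entry


def verify_before_use(dataset: Path, manifest: Path) -> bool:
    """True only if the file still matches the hash recorded at its latest receipt."""
    matching = [r for r in load_manifest(manifest) if r["file"] == dataset.name]
    if not matching:
        return False  # no provenance record at all: treat as unverified
    return sha256_of(dataset) == matching[-1]["sha256"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "train.csv"
        manifest = Path(tmp) / "manifest.json"
        data.write_text("id,label\n1,cat\n")
        record_on_receipt(data, "example-supplier (hypothetical)", manifest)
        print(verify_before_use(data, manifest))  # True: unchanged since receipt
        data.write_text("id,label\n1,dog\n")      # simulate tampering after receipt
        print(verify_before_use(data, manifest))  # False: hash no longer matches
```

Because each re-supply appends a new entry instead of overwriting the previous one, the manifest also serves as the provenance history to be kept for the model's service life.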

02 - Implement

Implementation tips

The data team should record where every training dataset comes from (the supplier, the internal system, or the public source) and keep this provenance record alongside each dataset so its origin can always be traced back.
An engineer should generate a checksum (a digital fingerprint produced by a cryptographic hash function such as SHA-256) for each dataset when it is received, and recheck that fingerprint before the data is used for training, so that any change to the file is detected.
The procurement or vendor manager should require data and AI suppliers to provide written confirmation of where their data originated and that it has not been altered, and capture this in the supply contract.
The AI project lead should run basic quality and sanity checks on incoming data (looking for unexpected entries, duplicated records, or signs of deliberate poisoning) and document the checks before approving the data for training.
The IT manager should store approved training datasets in a controlled location with restricted access and version control, so only verified data is used and any later change is logged and reviewable.
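The incoming-data checks described in these tips can be sketched in Python as follows, assuming tabular data with a `label` column. The function name, the allowed label set, the optional baseline of label shares taken from the last approved version, and the 10% threshold are all hypothetical choices for illustration. Checks like these catch duplicates, blanks, unexpected values and crude label flipping; they are not a reliable detector of careful, targeted poisoning, which is why source and integrity verification are still needed.

```python
import csv
import io
from collections import Counter


def sanity_check(rows, allowed_labels, label_field="label", baseline=None, max_share_change=0.10):
    """Return a list of findings for incoming rows; an empty list means nothing was flagged.

    rows: dicts such as those produced by csv.DictReader.
    allowed_labels: label values the data team expects (hypothetical).
    baseline: optional {label: share} from the last approved dataset version.
    """
    rows = list(rows)
    findings = []
    seen = set()
    for i, row in enumerate(rows, start=1):
        # Exact duplicates: every field identical to an earlier row.
        key = tuple((k, str(v)) for k, v in row.items())
        if key in seen:
            findings.append(f"row {i}: duplicate record")
        seen.add(key)
        # Missing or blank values.
        empty = [k for k, v in row.items() if v is None or str(v).strip() == ""]
        if empty:
            findings.append(f"row {i}: empty field(s) {empty}")
        # Label values outside the expected set.
        if row.get(label_field) not in allowed_labels:
            findings.append(f"row {i}: unexpected label {row.get(label_field)!r}")
    # Crude poisoning signal: a class's share moved sharply away from the approved baseline.
    if baseline and rows:
        counts = Counter(row.get(label_field) for row in rows)
        for label in sorted(allowed_labels):
            share = counts[label] / len(rows)
            expected = baseline.get(label, 0.0)
            if abs(share - expected) > max_share_change:
                findings.append(f"label {label!r}: share {share:.2f} vs baseline {expected:.2f}")
    return findings


if __name__ == "__main__":
    sample = "id,text,label\n1,hello,ham\n2,win cash,spam\n2,win cash,spam\n3,,ham\n4,hi,eggs\n"
    rows = list(csv.DictReader(io.StringIO(sample)))
    for finding in sanity_check(rows, {"ham", "spam"}):
        print(finding)
```

Run against the sample, the script prints `row 3: duplicate record`, `row 4: empty field(s) ['text']` and `row 5: unexpected label 'eggs'`. Returning every finding as a list, rather than stopping at the first problem, makes it straightforward to save the full result alongside the approval or rejection decision as the documented record of the checks.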

03 - Audit

Audit / evidence tips

Ask	Look at	Good
The provenance records for a sample of training datasets	A clear, named source for each one (vendor, internal system, or public repository)	Traces every dataset back to a documented, legitimate origin with no gaps
How the organisation confirms training data has not been altered	Checksums or hash values recorded on receipt and re-verified before use	Shows the fingerprint was recorded and matched, with evidence of what happens when it does not match
The contracts or agreements with data and AI suppliers	Clauses requiring the supplier to confirm data source and integrity	Shows these obligations are written into the agreement, not just assumed
Evidence of the quality and integrity checks run on incoming data	Documented sanity checks and their results, including any datasets that were rejected	Shows checks are routine and that bad data is actually turned away
Who can access and change the approved training datasets	Restricted access, version control, and change logs	Shows only authorised staff can touch the data and every change is recorded

04 - Mappings

Cross-framework mappings

How ISM-2087 relates to controls across ISO/IEC 27001, ISO/IEC 42001, Essential Eight, and ASD ISM.

ISO 27001

Control	Notes
Partially overlaps (2)
Annex A 5.21	ISM-2087 requires the organisation to verify the source and integrity of training data used for AI models to prevent data poisoning
Annex A 8.30	ISM-2087 requires the organisation to verify the source and integrity of training data used to build AI models

ISO 42001

Control	Notes
Partially overlaps (3)
Annex A 4.3	Annex A 4.3 requires documenting the data resources utilised for the AI system
Annex A 6.2.7	Annex A 6.2.7 requires the organisation to determine and provide AI system technical documentation appropriate to different interested pa...
Annex A 7.5	Annex A 7.5 requires a documented, end-to-end process for recording the provenance of data used in AI systems across the full life cycle