ASD Information Security Manual (ISM)

ASD ISM-2114: Monitor Baselines for AI Application Performance

Establish baselines of expected behaviour and performance for AI applications and continuously monitor them to detect unexpected deviations.

Tags: Preventative, NC, OS, P, Artificial intelligence, Application security, ASD Information Security Manual, Guidelines for system hardening, Monitoring, Performance

Plain language

This control means recording what "normal" looks like for an AI application (both how it behaves and how it performs) and then watching live activity against those recorded baselines. When the application starts producing different outputs, slows down, or makes more errors than the baseline expects, the deviation is flagged so it can be investigated. The aim is to catch model drift, degraded accuracy and abnormal behaviour before they cause harm.

01 - Overview

Framework

ASD Information Security Manual (ISM)

Control effect

Preventative

Classifications

NC, OS, P, S, TS

ISM last updated

June 2026

Control Stack last updated

19 June 2026

E8 maturity levels

N/A

Guideline

Guidelines for system hardening

Section

User Application Hardening

Topic

Artificial Intelligence Applications

Official control statement

Baselines of expected behaviour and performance for AI applications are established and monitored for unexpected deviations.

ASD Information Security Manual (ISM), ISM-2114

Why it matters

Without established and monitored baselines, gradual model drift, accuracy degradation, prompt-injection effects or resource abuse in an AI application go unnoticed because there is no "known good" reference to compare against. Deviations such as rising hallucination rates, climbing token consumption or latency spikes are only discovered after they have already produced wrong outputs, exposed data or disrupted dependent business processes.

Operational notes

Baselines are not set-and-forget: recalibrate them after every model retraining, version upgrade, prompt or system-message change, or confirmed data-drift event, since the previous "normal" no longer applies. Keep a short window of pre-change and post-change metrics so you can distinguish an intended shift from an unexpected one. Tune alert thresholds periodically to reduce false positives while still catching genuine deviations, and review flagged deviations on a defined cadence so trends (not just single spikes) are caught. Retain the raw behaviour and performance telemetry long enough to reconstruct what normal looked like before a suspected deviation.
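The advice to review trends rather than single spikes, and to rebuild "normal" from post-change data, can be sketched in Python as below. This is a minimal illustration under assumed parameters: the `DeviationTracker` class, the `recalibrate` helper, the "k of the last n observations" rule and the three-standard-deviation tolerance are hypothetical choices, not values the ISM prescribes.

```python
# Minimal sketch (hypothetical helpers, not an ISM-defined method): separate a one-off
# spike from a sustained deviation, and derive a new band after an intended change.
from __future__ import annotations

from collections import deque
from statistics import mean, stdev


class DeviationTracker:
    """Flag a trend only when at least k of the last n observations leave the band."""

    def __init__(self, low: float, high: float, n: int = 10, k: int = 6) -> None:
        if not 0 < k <= n:
            raise ValueError("k must be between 1 and n")
        self.low, self.high, self.k = low, high, k
        # True marks an observation that fell outside the baseline band.
        self.recent: deque[bool] = deque(maxlen=n)

    def observe(self, value: float) -> str:
        breach = not (self.low <= value <= self.high)
        self.recent.append(breach)
        if sum(self.recent) >= self.k:
            return "sustained"  # enough recent breaches to treat as a trend
        return "spike" if breach else "normal"


def recalibrate(post_change: list[float], tolerance: float = 3.0) -> tuple[float, float]:
    """New band of mean ± tolerance * stdev from a post-change window of normal operation."""
    mu, sigma = mean(post_change), stdev(post_change)
    return mu - tolerance * sigma, mu + tolerance * sigma


# Usage: an isolated spike versus a sustained shift (assumed latency values in ms).
tracker = DeviationTracker(low=100.0, high=200.0, n=5, k=3)
for latency in [150.0, 450.0, 160.0, 420.0, 430.0]:
    print(latency, tracker.observe(latency))
```

Requiring several breaches within a short window is one way to cut false positives from isolated spikes; `n` and `k` would need tuning per metric, and the old and new bands from `recalibrate` would be recorded so an intended shift can be told apart from an unexpected one.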

02 - Implement

Implementation tips

For each AI application, define the specific metrics to baseline: behaviour metrics such as output/embedding drift, response accuracy or quality score, error rate and refusal/safety-trigger rate, and performance metrics such as latency, throughput and token (or compute) usage per request.
Capture each metric over a representative period of normal operation and record a baseline as a range or statistical band (for example the mean plus or minus a tolerance, or expected percentiles) rather than a single fixed number, so normal variation is not flagged.
Feed the live metrics into a monitoring or observability tool and configure it to compare current values against each baseline continuously, with one view per AI application.
Set concrete alert thresholds tied to each baseline (for example trigger when accuracy or quality drops more than a set amount, latency exceeds the baseline percentile, or token usage or error rate breaches the baseline band) and route alerts to the team that owns the application.
Recalibrate the affected baselines whenever the model is retrained, the version or prompt/system message changes, or a data-drift event is confirmed, and record the old and new values so intended shifts are distinguished from unexpected ones.
When an alert fires, run a root-cause investigation (model, data, infrastructure or external interference) and record the deviation and corrective action in a register so repeat deviations are tracked.
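The baselining and alerting steps above might look like the following minimal Python sketch for a single metric. It assumes a percentile band (roughly the 5th to 95th percentile of samples captured during normal operation); the `Baseline` class, the `build_baseline` and `check_metric` helpers, the application name `chat-assistant` and the choice of percentiles are hypothetical illustrations. Routing the alert to the owning team is left to whatever monitoring tool is in use.

```python
# Minimal sketch (hypothetical names): record a percentile band for one metric from a
# period of normal operation, then compare live values against it.
from __future__ import annotations

from dataclasses import dataclass
from statistics import quantiles
from typing import Optional


@dataclass(frozen=True)
class Baseline:
    app: str
    metric: str
    low: float   # about the 5th percentile of normal-operation samples
    high: float  # about the 95th percentile of normal-operation samples


def build_baseline(app: str, metric: str, samples: list[float]) -> Baseline:
    # quantiles(n=20) returns 19 cut points: index 0 is ~P5, index 18 is ~P95.
    cuts = quantiles(samples, n=20)
    return Baseline(app, metric, low=cuts[0], high=cuts[18])


def check_metric(baseline: Baseline, value: float) -> Optional[str]:
    """Return an alert message when the live value leaves the band, otherwise None."""
    if baseline.low <= value <= baseline.high:
        return None
    direction = "below" if value < baseline.low else "above"
    return (f"[{baseline.app}] {baseline.metric}={value} is {direction} "
            f"baseline band {baseline.low:.2f}-{baseline.high:.2f}")


# Usage with assumed data: latency samples (ms) from a normal period, then a live value.
latency = build_baseline("chat-assistant", "latency_ms", [float(x) for x in range(100, 200)])
alert = check_metric(latency, 350.0)
if alert:
    print(alert)  # in practice, route this to the team that owns the application
```

A two-sided band suits metrics such as accuracy or refusal rate, where both a drop and a jump can signal a problem; for latency or token usage an upper bound alone may be enough.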

03 - Audit

Audit / evidence tips

For a sample AI application, request the baseline definition and confirm it covers both behaviour metrics (such as output drift, error or refusal rate) and performance metrics (such as latency, throughput and token usage), each with a documented normal range rather than just a single nominal value.
Inspect the monitoring configuration and confirm the live metrics for that application are actually compared against its baseline, not merely collected; trace one metric from capture through to its baseline comparison.
Examine the alert rules and confirm concrete deviation thresholds are configured (for example accuracy below a set percentage, or latency or token usage above the baseline band), then check the logs for an alert that fired on a real deviation and was actioned.
Confirm baselines were recalibrated after the most recent model retrain, version upgrade or data-drift event by comparing the baseline change date and values against the application's change/release history.
Review the deviation register for a recent flagged deviation and trace it to its investigation and corrective action to confirm monitoring leads to a response, not just a log entry.
Check that baselines and monitoring exist for each AI application in scope, not only one flagship system, by reconciling the AI application inventory against the monitored set.
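Two of the audit checks above, reconciling the AI application inventory against the monitored set and comparing baseline recalibration dates with the change/release history, reduce to simple set and date comparisons. The Python sketch below assumes the evidence has already been exported into plain sets and dictionaries; the function names, application names and data shapes are hypothetical.

```python
# Audit sketch (hypothetical data shapes): reconcile the AI inventory with the monitored
# set, and find baselines not recalibrated since the application's latest change.
from __future__ import annotations

from datetime import date


def unmonitored_apps(inventory: set[str], monitored: set[str]) -> set[str]:
    """In-scope AI applications with no baseline monitoring configured."""
    return inventory - monitored


def stale_baselines(last_change: dict[str, date],
                    last_recalibration: dict[str, date]) -> list[str]:
    """Apps whose latest retrain/version/prompt change post-dates their last recalibration."""
    stale = []
    for app, changed in sorted(last_change.items()):
        recalibrated = last_recalibration.get(app)
        if recalibrated is None or recalibrated < changed:
            stale.append(app)
    return stale


# Usage with assumed evidence exports.
inventory = {"chat-assistant", "doc-summariser", "triage-classifier"}
monitored = {"chat-assistant", "doc-summariser"}
changes = {"chat-assistant": date(2026, 5, 2), "doc-summariser": date(2026, 3, 1)}
recalibrations = {"chat-assistant": date(2026, 4, 20), "doc-summariser": date(2026, 3, 4)}
print(sorted(unmonitored_apps(inventory, monitored)))
print(stale_baselines(changes, recalibrations))
```

Any application these checks return is a candidate finding that still needs confirming against the underlying records, since an export may be incomplete.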

04 - Mappings

Cross-framework mappings

How ISM-2114 relates to controls across ISO/IEC 27001, ISO/IEC 42001, Essential Eight, and ASD ISM.

ISO 27001

Control	Relationship	Notes
Annex A 8.16 (Monitoring activities)	Partially meets	ISM-2114 requires establishing baselines for AI application behaviour and performance and monitoring for deviations
Annex A 8.6 (Capacity management)	Partially overlaps	ISM-2114 involves establishing baselines for AI performance and behaviour to monitor deviations