Monitor Baselines for AI Application Performance
Establish baselines of expected behaviour and performance for AI applications and continuously monitor them to detect unexpected deviations.
Plain language
This control means recording what "normal" looks like for an AI application (both how it behaves and how it performs) and then watching live activity against those recorded baselines. When the application starts producing different outputs, slows down, or makes more errors than the baseline expects, the deviation is flagged so it can be investigated. Flagging deviations early helps catch model drift, degraded accuracy and abnormal behaviour before they cause harm.
Framework
ASD Information Security Manual (ISM)
Control effect
Preventative
Classifications
NC, OS, P, S, TS
ISM last updated
June 2026
Control Stack last updated
19 June 2026
E8 maturity levels
N/A
Guideline
Guidelines for system hardening
Section
User Application Hardening
Official control statement
Baselines of expected behaviour and performance for AI applications are established and monitored for unexpected deviations.
Why it matters
Without established and monitored baselines, gradual model drift, accuracy degradation, prompt-injection effects or resource abuse in an AI application can go unnoticed because there is no "known good" reference to compare against. Deviations such as rising hallucination rates, climbing token consumption or latency spikes are then discovered only after they have produced wrong outputs, exposed data or disrupted dependent business processes.
Operational notes
Baselines are not set-and-forget: recalibrate them after every model retraining, version upgrade, prompt or system-message change, or confirmed data-drift event, since the previous "normal" no longer applies. Keep a short window of pre-change and post-change metrics so you can distinguish an intended shift from an unexpected one. Tune alert thresholds periodically to reduce false positives while still catching genuine deviations, and review flagged deviations on a defined cadence so trends (not just single spikes) are caught. Retain the raw behaviour and performance telemetry long enough to reconstruct what normal looked like before a suspected deviation.
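As a rough illustration of separating an intended shift from an unexpected one, the sketch below compares a pre-change and a post-change window of a single metric. The function names, the `change_planned` flag, the result labels and the three-sigma cut-off are all assumptions made for this example; the ISM does not prescribe any of them.

```python
from statistics import mean, stdev


def shift_in_sigmas(pre_change, post_change):
    """How far the post-change mean moved, in pre-change standard deviations."""
    sigma = stdev(pre_change)
    if sigma == 0:
        raise ValueError("pre-change window shows no variation; widen the window")
    return (mean(post_change) - mean(pre_change)) / sigma


def classify_shift(pre_change, post_change, change_planned, max_sigmas=3.0):
    """Label a shift using the two windows and the change record (hypothetical labels)."""
    if abs(shift_in_sigmas(pre_change, post_change)) <= max_sigmas:
        return "within-normal"
    # A large shift right after a recorded change calls for recalibration;
    # the same shift with no recorded change calls for investigation.
    return "intended-shift" if change_planned else "unexpected-deviation"


# Illustrative latency samples in milliseconds (made-up numbers).
before = [100, 102, 98, 101, 99]
after = [120, 121, 119]
print(classify_shift(before, after, change_planned=True))
print(classify_shift(before, after, change_planned=False))
```

A mean comparison like this only detects level shifts. Changes in spread or in tail latency would need a different test, and the window lengths should be chosen to suit how noisy the metric is.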
Implementation tips
- For each AI application, define the specific metrics to baseline: behaviour metrics such as output/embedding drift, response accuracy or quality score, error rate and refusal/safety-trigger rate, and performance metrics such as latency, throughput and token (or compute) usage per request.
- Capture each metric over a representative period of normal operation and record a baseline as a range or statistical band (for example mean plus tolerance, or expected percentiles) rather than a single fixed number, so normal variation is not flagged.
- Feed the live metrics into a monitoring or observability tool and configure it to compare current values against each baseline continuously, with one view per AI application.
- Set concrete alert thresholds tied to each baseline (for example trigger when accuracy or quality drops more than a set amount, latency exceeds the baseline percentile, or token usage or error rate breaches the baseline band) and route alerts to the team that owns the application.
- Recalibrate the affected baselines whenever the model is retrained, the version or prompt/system message changes, or a data-drift event is confirmed, and record the old and new values so intended shifts are distinguished from unexpected ones.
- When an alert fires, run a root-cause investigation (model, data, infrastructure or external interference) and record the deviation and corrective action in a register so repeat deviations are tracked.
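The tips above on recording a baseline as a band and alerting on breaches could be sketched as follows. This is a minimal example under stated assumptions: the `Baseline` class, the `build_baseline` and `evaluate` helpers, the application name `support-bot`, the sample values and the choice of mean plus or minus three standard deviations are hypothetical. A real deployment would more likely express the same comparison as rules in its monitoring or observability tool.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Baseline:
    app: str
    metric: str
    lower: float
    upper: float


def build_baseline(app, metric, samples, k=3.0):
    """Record a band of mean +/- k standard deviations from normal-operation samples."""
    mu, sigma = mean(samples), stdev(samples)
    return Baseline(app, metric, mu - k * sigma, mu + k * sigma)


def evaluate(baselines, live_values):
    """Compare live values with their bands and return one alert per breach."""
    alerts = []
    for b in baselines:
        value = live_values.get((b.app, b.metric))
        if value is None:
            continue  # no live reading yet; a real system may want to flag this too
        if value < b.lower or value > b.upper:
            alerts.append({"app": b.app, "metric": b.metric, "value": value,
                           "band": (round(b.lower, 2), round(b.upper, 2))})
    return alerts


# Made-up samples from a period of normal operation.
latency = build_baseline("support-bot", "latency_ms", [200, 210, 190, 205, 195])
errors = build_baseline("support-bot", "error_rate", [0.010, 0.012, 0.008, 0.011, 0.009])

live = {("support-bot", "latency_ms"): 300, ("support-bot", "error_rate"): 0.010}
alerts = evaluate([latency, errors], live)
print(alerts)
```

Each alert carries the breached band so that the person investigating can see what "normal" was at the time. Percentile bands may suit skewed metrics such as latency better than a symmetric standard-deviation band.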
Audit / evidence tips
- For a sample AI application, request the baseline definition and confirm it covers both behaviour metrics (such as output drift, error or refusal rate) and performance metrics (such as latency, throughput and token usage), each with a documented normal range rather than just a single nominal value.
- Inspect the monitoring configuration and confirm the live metrics for that application are actually compared against its baseline, not merely collected; trace one metric from capture through to its baseline comparison.
- Examine the alert rules and confirm concrete deviation thresholds are configured (for example accuracy below a set percentage, latency or token usage above the baseline band) and that an alert fired and was actioned for a real deviation in the logs.
- Confirm baselines were recalibrated after the most recent model retrain, version upgrade or data-drift event by comparing the baseline change date and values against the application's change/release history.
- Review the deviation register for a recent flagged deviation and trace it to its investigation and corrective action to confirm monitoring leads to a response, not just a log entry.
- Check that baselines and monitoring exist for each AI application in scope, not only one flagship system, by reconciling the AI application inventory against the monitored set.
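The reconciliation in the last check can be done with a simple set comparison. The sketch below assumes the auditor has exported the AI application inventory as a list of names, and the monitored set as a mapping from application name to the kinds of metric that have a baseline. The function name, the two kind labels and the application names are hypothetical.

```python
def coverage_gaps(inventory, monitored):
    """Return in-scope applications with no baseline or with a missing metric kind.

    inventory: iterable of application names in scope.
    monitored: dict mapping application name -> set of baselined metric kinds.
    """
    required = {"behaviour", "performance"}
    gaps = {}
    for app in sorted(set(inventory)):
        missing = required - set(monitored.get(app, ()))
        if missing:
            gaps[app] = sorted(missing)
    return gaps


# Made-up inventory and monitoring export.
inventory = ["support-bot", "doc-summariser", "code-assistant"]
monitored = {
    "support-bot": {"behaviour", "performance"},
    "doc-summariser": {"performance"},
}
gaps = coverage_gaps(inventory, monitored)
print(gaps)
```

An empty result means every application in the inventory has at least one behaviour baseline and one performance baseline. It does not show that the baselines are current or that alerts are actioned, which the earlier checks cover.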
Cross-framework mappings
How ISM-2114 relates to controls across ISO/IEC 27001, ISO/IEC 42001, Essential Eight, and ASD ISM.
ISO 27001
| Control | Notes | Details |
|---|---|---|
| Partially meets (1) | | |
| Annex A 8.16 | ISM-2114 requires establishing baselines for AI application behaviour and performance and monitoring for deviations | |
| Partially overlaps (1) | | |
| Annex A 8.6 | ISM-2114 involves establishing baselines for AI performance and behaviour to monitor deviations | |
ISO 42001
| Control | Notes | Details |
|---|---|---|
| Supports (1) | | |
| Annex A 6.2.6 | ISM-2114 necessitates defining and monitoring AI application baselines for unexpected deviations | |
These mappings show relationships between controls across frameworks. They do not imply full equivalence or certification.