Rate Limiting for AI Model Inference Queries
Limit how often AI queries are run to prevent system overuse and improve efficiency.
🏛️ Framework
ASD Information Security Manual (ISM)
🧭 Control effect
Preventative
🔐 Classifications
NC, OS, P, S, TS
🗓️ ISM last updated
Nov 2025
✏️ Control Stack last updated
22 Feb 2026
🎯 E8 maturity levels
N/A
Guideline
Guidelines for software development
Topic
Unbounded Consumption
Control
Rate limiting is applied to inference queries for artificial intelligence models.
Source: ASD Information Security Manual (ISM)
Plain language
Rate limiting means setting limits on how often AI systems are allowed to process requests. This is important because without limits, the system could become overloaded, slow down, or even crash, causing disruptions and potentially leading to mistakes in important tasks.
Why it matters
Without rate limiting, AI inference APIs can be abused, driving up compute costs and causing service degradation or denial of service for legitimate users.
Operational notes
Monitor inference request rates and tune limits per client/model; log and review HTTP 429 events to detect abuse and adjust thresholds without blocking legitimate use.
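As a sketch of the log-review step above, the snippet below counts HTTP 429 responses per client from simple access-log lines, so thresholds can be tuned for the noisiest callers. The log format and client identifiers are hypothetical, not part of the ISM control.

```python
from collections import Counter


def summarise_429s(log_lines):
    """Count HTTP 429 (rate-limited) responses per client.

    Assumes a hypothetical log format: "<client_id> <path> <status>".
    """
    counts = Counter()
    for line in log_lines:
        client, _path, status = line.split()
        if status == "429":
            counts[client] += 1
    return counts


logs = [
    "clientA /v1/infer 200",
    "clientA /v1/infer 429",
    "clientB /v1/infer 429",
    "clientA /v1/infer 429",
]
print(summarise_429s(logs))  # Counter({'clientA': 2, 'clientB': 1})
```

A periodic review of this kind of summary helps distinguish abusive clients from legitimate users who simply need a higher quota.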
Implementation tips
- IT team should establish query limits: Determine the maximum number of requests that the AI system can handle without slowdowns. Analyse system performance data to set these limits effectively.
- Procurement should coordinate with service providers: Engage with AI service suppliers to ensure they support rate limiting features. Confirm these features are included in contracts and service agreements.
- Developers should integrate rate limiting in application code: Include features that monitor and cap requests. Use well-understood techniques, such as fixed time windows or token buckets, to prevent the system from processing too many requests at once.
- System owners should monitor usage patterns: Continuously observe how often and for what purposes the AI is accessed. Adjust limits based on these insights to balance efficiency and accessibility.
- Managers should educate staff: Conduct training sessions on why rate limits are in place. Use relatable examples to explain how these limits maintain system reliability and ensure fairness in access.
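The developer tip above (capping requests per time interval) can be sketched with a token bucket, a common rate-limiting technique: each client gets a bucket that refills at a steady rate and allows short bursts up to a capacity. The rate and capacity values here are illustrative, not ISM-mandated thresholds.

```python
import time


class TokenBucket:
    """Per-client token bucket: allows bursts up to `capacity`,
    refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill based on elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond with HTTP 429


bucket = TokenBucket(rate=1.0, capacity=3)
results = [bucket.allow() for _ in range(5)]
print(results)  # burst of 3 allowed, then immediate requests rejected
```

In an inference API, one bucket would typically be keyed per client (or per API key) and checked before the model is invoked, so rejected requests never consume compute.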
Audit / evidence tips
- Ask: system logs showing query volume. Request records that display the number of AI queries over a specific period. Good evidence shows consistent adherence to set limits.
- Ask: documentation on rate limit settings. Request the file or record detailing the current rate limits in use. Good evidence includes both the limits themselves and the rationale for setting them.
- Ask: contracts with service providers. Request copies of agreements with AI service suppliers. Good evidence is a legally binding document with the relevant terms clearly outlined.
- Ask: incident reports related to overload. Request any logs detailing system overload or failures due to excessive queries. Good evidence shows prompt response and adjustments where necessary.
- Ask: training materials given to staff. Request copies of any presentations or guides on rate limiting. Good evidence includes clear explanations and examples that relate to everyday tasks.
Cross-framework mappings
How ISM-2090 relates to controls across ISO/IEC 27001, Essential Eight, and ASD ISM.
These mappings show relationships between controls across frameworks. They do not imply full equivalence or certification.
ISO 27001
| Control | Relationship | Notes |
|---|---|---|
| Annex A 8.6 (Capacity management) | Partially meets | ISM-2090 requires rate limiting to be applied to AI model inference queries to prevent overuse and manage service availability |