Rate Limiting for AI Model Inference Queries
Limit how often AI queries are run to prevent system overuse and improve efficiency.
Plain language
Rate limiting means setting limits on how often AI systems are allowed to process requests. This is important because without limits, the system could become overloaded, slow down, or even crash, causing disruptions and potentially leading to mistakes in important tasks.
Framework
ASD Information Security Manual (ISM)
Control effect
Preventative
Classifications
NC, OS, P, S, TS
ISM last updated
Nov 2025
Control Stack last updated
19 Mar 2026
E8 maturity levels
N/A
Guideline
Guidelines for software development
Topic
Unbounded Consumption
Official control statement
Rate limiting is applied to inference queries for artificial intelligence models.
Why it matters
Without rate limiting, AI inference APIs can be abused, driving up compute costs and causing service degradation or denial of service for legitimate users.
Operational notes
Monitor inference request rates and tune limits per client/model; log and review HTTP 429 events to detect abuse and adjust thresholds without blocking legitimate use.
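One way to make 429 events reviewable, as the note above suggests, is to count rejections per client over a sliding time window and flag clients that cross a threshold. A minimal sketch in Python; the class name, window length, and alert threshold are illustrative assumptions, not part of the ISM control:

```python
from collections import defaultdict, deque

class RejectionMonitor:
    """Track HTTP 429 rejections per client over a sliding time window."""

    def __init__(self, window_seconds: float = 300.0, alert_threshold: int = 3):
        self.window = window_seconds        # how far back to count rejections
        self.threshold = alert_threshold    # rejections in window that trigger review
        self.events: dict[str, deque] = defaultdict(deque)

    def record_429(self, client_id: str, now: float) -> bool:
        """Log one rejection; return True if the client needs review."""
        q = self.events[client_id]
        q.append(now)
        # Drop events older than the window so counts reflect recent behaviour.
        while q and now - q[0] > self.window:
            q.popleft()
        return len(q) >= self.threshold
```

In practice the timestamps would come from the API gateway's access logs, and the review step would feed back into per-client threshold tuning rather than automatic blocking.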
Implementation tips
- IT team should establish query limits: Determine the maximum number of requests that the AI system can handle without slowdowns. Analyse system performance data to set these limits effectively.
- Procurement should coordinate with service providers: Engage with AI service suppliers to ensure they support rate limiting features. Confirm these features are included in contracts and service agreements.
- Developers should integrate rate limiting into application code: include middleware or checks that monitor and cap requests, for example by counting requests per time interval or using a token bucket, so the system never processes too many requests at once.
- System owners should monitor usage patterns: Continuously observe how often and for what purposes the AI is accessed. Adjust limits based on these insights to balance efficiency and accessibility.
- Managers should educate staff: Conduct training sessions on why rate limits are in place. Use relatable examples to explain how these limits maintain system reliability and ensure fairness in access.
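The developer tip above can be sketched as a token-bucket limiter, one common way to cap request rates while still allowing short bursts. A minimal illustration in Python; the class name, rates, and injectable clock are assumptions made for the example:

```python
import time

class TokenBucket:
    """Allow up to `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate                  # tokens replenished per second
        self.capacity = capacity          # maximum burst size
        self.tokens = float(capacity)     # start with a full bucket
        self.clock = clock                # injectable for testing
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = self.clock()
        # Refill tokens for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A rejected request would typically be answered with HTTP 429 and a Retry-After header; per-client and per-model limits can be kept in separate bucket instances.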
Audit / evidence tips
- Ask: system logs showing query volume. Request records that display the number of AI queries over a specific period. Good evidence shows consistent adherence to the set limits.
- Ask: documentation of rate limit settings. Request the file or record detailing the current rate limits in use. Good evidence includes both the limits themselves and the rationale for setting them.
- Ask: contracts with service providers. Request copies of agreements with AI service suppliers. Good evidence is a legally binding document with the relevant terms clearly stated.
- Ask: incident reports related to overload. Request any logs detailing system overload or failures caused by excessive queries. Good evidence shows prompt response and adjustments where necessary.
- Ask: training materials given to staff. Request copies of any presentations or guides on rate limiting. Good evidence includes clear explanations and examples that relate to everyday tasks.
Cross-framework mappings
How ISM-2090 relates to controls across ISO/IEC 27001, Essential Eight, and ASD ISM.
ISO 27001
| Relationship | Control | Notes |
|---|---|---|
| Supports | ISO/IEC 27001 Annex A 8.6 (Capacity management) | Annex A 8.6 requires monitoring and adjustment of resource use to prevent performance degradation or failures due to capacity shortfalls. |
These mappings show relationships between controls across frameworks. They do not imply full equivalence or certification.