Rate Limits
Request rate limits for Copilot via API endpoints.
Rate limits are enforced using two windows:
RPS (requests per second) caps how many requests you can send in a short burst, while RPM (requests per minute) caps your sustained throughput.
If you exceed either limit, the API returns HTTP 429; see Handling Rate Limit Errors for retry guidance.
Answer Generation Endpoints
The following limits apply to answer-generation endpoints
(/run(/stream), /chat/completions(/stream)).
These endpoints share a single rate limit. Only these endpoints count toward your tier limits; other API calls, such as creating threads or listing messages, are not affected.
During the trial phase, the following limits apply:
| Tier | RPS | RPM |
|---|---|---|
| 0 | 15 | 60 |
Your current plan and its rate limits are visible in the customer portal. If you need higher limits or have any questions, contact your Haufe representative.
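To stay under both limits without relying on 429 responses, a client can throttle itself. The following is a minimal sketch that assumes both limits behave like sliding windows, which this page does not specify. The class name `DualWindowLimiter` and its parameters are illustrative and not part of the API. With the Tier 0 values, a burst of 15 requests in one second uses a quarter of the 60-request minute budget, so the sustained rate averages one request per second.

```python
import time
from collections import deque


class DualWindowLimiter:
    """Client-side throttle honoring a per-second and a per-minute cap.

    Hypothetical helper: models both limits as sliding windows, which is
    an assumption about how the server counts requests.
    """

    def __init__(self, rps, rpm, clock=time.monotonic, sleep=time.sleep):
        self.windows = [(1.0, rps), (60.0, rpm)]  # (window seconds, max requests)
        self.sent = deque()  # timestamps of requests sent in the last 60 s
        self.clock = clock
        self.sleep = sleep

    def _wait_time(self, now):
        """Seconds until one more request fits into both windows."""
        wait = 0.0
        for length, limit in self.windows:
            recent = [t for t in self.sent if now - t < length]
            if len(recent) >= limit:
                # The limit-th most recent request must leave the window first.
                wait = max(wait, recent[-limit] + length - now)
        return wait

    def acquire(self):
        """Block until a request may be sent, then record its timestamp."""
        while True:
            now = self.clock()
            # Drop timestamps that no longer fall into the longest window.
            while self.sent and now - self.sent[0] >= 60.0:
                self.sent.popleft()
            wait = self._wait_time(now)
            if wait <= 0:
                self.sent.append(now)
                return
            self.sleep(wait)
```

Call `limiter.acquire()` immediately before each request to an answer-generation endpoint, for example with `limiter = DualWindowLimiter(rps=15, rpm=60)` for Tier 0. Because the server may count windows differently, keep retry handling for 429 responses in place as well.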
Other Endpoints
All other endpoints (e.g. listing assistants, fetching messages) share a single rate limit that applies regardless of your tier:
| RPS | RPM |
|---|---|
| 60 | 180 |
Handling Rate Limit Errors
If you exceed these limits, the API returns HTTP 429 Too Many Requests.
The response includes headers to help you implement retry logic; see Error Handling for details.
Implement retry logic with exponential backoff where appropriate.
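The retry advice above can be sketched as follows. This example is deliberately independent of any HTTP client: `send` is a hypothetical zero-argument callable that returns `(status_code, headers, body)`. It also assumes the server might send the standard HTTP `Retry-After` header with a delay in seconds. This page does not name the actual headers, so check Error Handling before relying on that.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status_code, headers, body),
    where headers is a dict; wrap your HTTP client of choice to provide it.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # no point in sleeping after the final attempt
        # Assumption: a standard Retry-After header (in seconds) may be present.
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)
        else:
            # Exponential backoff: base_delay, 2x, 4x, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps many clients from retrying at the same instant.
            delay += random.uniform(0, delay / 2)
        sleep(delay)
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```

Only 429 responses are retried here; other error statuses are returned to the caller unchanged, since retrying them blindly is rarely useful.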
Feel free to reach out to your Haufe representative if you have any questions.