Haufe.ai | Rate Limits

Rate limits are enforced using two windows:

RPS (requests per second) caps how many requests you can send in a short burst, while RPM (requests per minute) caps your sustained throughput. Both limits apply at the same time.

If you exceed either limit, the API returns HTTP 429 — see Handling Rate Limit Errors for retry guidance.

Answer Generation Endpoints

The following limits apply to the answer-generation endpoints: /run, /run/stream, /chat/completions, and /chat/completions/stream. These endpoints share a single rate limit, and only they count toward your tier limits. Other API calls, such as creating threads or listing messages, are not affected.

During the trial phase, the following limits apply:

Tier	RPS	RPM
0	15	60

Your current plan and its rate limits are visible in the customer portal. If you need higher limits or have any questions, contact your Haufe representative.
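
As a rough illustration of how the two windows interact, the sketch below throttles requests on the client side before they are sent. It is not part of the Haufe.ai API: the class name DualWindowLimiter and its parameters are hypothetical, and it assumes both limits behave like sliding windows of 1 and 60 seconds, which this page does not specify.

```python
import time
from collections import deque


class DualWindowLimiter:
    """Hypothetical client-side throttle for a per-second and a per-minute limit."""

    def __init__(self, rps, rpm, clock=time.monotonic, sleep=time.sleep):
        # (window length in seconds, max requests in that window)
        self.limits = [(1.0, rps), (60.0, rpm)]
        self.horizon = 60.0  # longest window we need to remember
        self.sent = deque()  # timestamps of recently recorded requests
        self.clock = clock
        self.sleep = sleep

    def _wait_time(self, now):
        # Forget timestamps that have left the longest window.
        while self.sent and now - self.sent[0] >= self.horizon:
            self.sent.popleft()
        wait = 0.0
        for window, limit in self.limits:
            recent = [t for t in self.sent if now - t < window]
            if len(recent) >= limit:
                # A slot frees up when the limit-th most recent request ages out.
                wait = max(wait, recent[-limit] + window - now)
        return wait

    def acquire(self):
        """Block until one more request fits into both windows, then record it."""
        while True:
            now = self.clock()
            wait = self._wait_time(now)
            if wait <= 0:
                self.sent.append(now)
                return
            self.sleep(wait)


# Trial-tier values from the table above (assumed to be sliding windows).
limiter = DualWindowLimiter(rps=15, rpm=60)
```

Calling limiter.acquire() before each answer-generation request lets the first 15 calls through immediately, delays the 16th by about a second, and, once 60 requests have been recorded within a minute, blocks until the oldest one is 60 seconds old. Even with such a throttle in place, the server's 429 responses remain the authoritative signal.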

Other Endpoints

All other endpoints (e.g. listing assistants, fetching messages) share a single rate limit that applies regardless of your tier:

RPS	RPM
60	180
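
To keep separate counters for the two groups of endpoints, a client first has to decide which group a request belongs to. The following is a minimal sketch of that decision. The function name bucket_for, the bucket labels, and the example paths are assumptions made for illustration; only the four endpoint suffixes and the limit values come from this page.

```python
# Endpoint suffixes that count toward the tier limits, as listed on this page.
ANSWER_SUFFIXES = (
    "/run",
    "/run/stream",
    "/chat/completions",
    "/chat/completions/stream",
)

# (RPS, RPM) per bucket; the answer-generation values are the trial tier (tier 0).
LIMITS = {"answer_generation": (15, 60), "other": (60, 180)}


def bucket_for(path):
    """Return the rate-limit bucket (a hypothetical label) for a request path."""
    # Ignore any query string and trailing slash before matching the suffix.
    clean = path.split("?", 1)[0].rstrip("/")
    if clean.endswith(ANSWER_SUFFIXES):
        return "answer_generation"
    return "other"


# Example paths are made up; real URLs may carry a different prefix.
rps, rpm = LIMITS[bucket_for("/threads/123/run/stream")]
```

Matching on the path suffix is a simplification that keeps the sketch independent of the base URL and any resource identifiers in the path.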

Handling Rate Limit Errors

If you exceed either limit, the API returns HTTP 429 Too Many Requests. The response includes headers to help you implement retry logic. See Error Handling for details.

Retry requests that fail with 429 using exponential backoff: wait before retrying, and increase the wait after each consecutive failure rather than retrying immediately.
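
One possible shape for such retry logic is sketched below. It is an illustration under stated assumptions, not the documented client behavior: send is a hypothetical callable that performs one request and returns (status, headers, body), and the use of the standard HTTP Retry-After header is an assumption, since this page does not name the headers the API returns. Check Error Handling for the actual header names.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send is a hypothetical zero-argument callable returning
    (status, headers, body), with headers as a plain dict.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # no point in sleeping after the final attempt
        # Assumption: the server may send Retry-After as a number of seconds.
        retry_after = headers.get("Retry-After")
        if retry_after is not None and str(retry_after).isdigit():
            delay = float(retry_after)
        else:
            # Exponential backoff: the cap doubles per attempt, up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter spreads out retries from many clients.
            delay = random.uniform(0, cap)
        sleep(delay)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

The random jitter is a design choice: without it, clients that were throttled at the same moment would all retry at the same moment again.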

Feel free to reach out to your Haufe representative if you have any questions.

Next Steps

Error Handling

Handle 429 responses and other errors with proper retry logic.
