Increase the Number of Parallel Requests
This guide will show you how to configure Evalap to handle more parallel requests, which can significantly speed up evaluation experiments.
Understanding Parallel Processing in Evalap
Evalap can send multiple requests to models simultaneously, which is especially useful when evaluating large datasets. By default, Evalap uses a conservative number of parallel requests to avoid overwhelming the models or your system resources.
How Parallelism Works in Evalap
Evalap uses ZeroMQ (ZMQ) for task distribution and a thread pool for parallel processing. Here's how it works:
- The main process receives evaluation tasks and distributes them to worker threads
- A fixed pool of 8 worker threads processes these tasks concurrently
- Each worker thread handles one task at a time, allowing up to 8 tasks to be processed simultaneously
This architecture is implemented in the runner component, which creates the worker threads and manages the message queue.
Modifying Concurrency Settings
The current concurrency level is set to 8 parallel tasks by the MAX_CONCURRENT_TASKS
constant in evalap/api/config.py
. To modify this setting:
- Open the
evalap/api/config.py
file - Locate and change the
MAX_CONCURRENT_TASKS = 8
line - Restart the Evalap service for changes to take effect
When increasing this value, consider your system's available CPU cores and memory, as well as any rate limits imposed by the model APIs you're using.
Determining Optimal Parallelism
The optimal number of parallel requests depends on several factors:
- System Resources: Your server's CPU cores and memory will limit effective parallelism
- Model API Capacity: Check your model provider's documentation for rate limits
- Network Bandwidth: Higher parallelism requires more bandwidth
- Response Time: Models with longer inference times benefit more from parallelism
Monitoring and Troubleshooting
Signs of Too Much Parallelism
- Increased error rates from model APIs
- Timeouts or connection errors
- System resource exhaustion
- Throttling messages from the API provider