Welcome to EvalAP


EvalAP (Evaluation API and Platform) is a high-level service for running model evaluations. It provides a comprehensive API for Large Language Models (LLMs) and AI agents in general. It also features a platform to navigate, analyze, and share results.
Built for LLMs & AI Agents
Purpose-built evaluation framework for Large Language Models and AI agents, with specialized tooling for modern AI systems.
Rapid Experiment Design
Save time with intuitive APIs and workflows. Design and launch evaluation experiments in minutes, not hours.
Navigate Your Results
Explore experiment sets, datasets, and results through an intuitive interface. Find insights faster with powerful navigation tools.
Smart Resource Management
Optimize compute resources with our intelligent runner. Parallelize experiments and manage workloads efficiently.
Rich Metrics Library
Use evaluation libraries such as DeepEval alongside our built-in metrics. Access existing datasets and add custom metrics for your specific needs.
Custom Leaderboards
Build and share your own leaderboards. Track model performance and compare results across experiments.
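To make the leaderboard idea concrete, here is a minimal sketch of how results from several experiments could be aggregated into a ranking. It does not use the EvalAP API: the `results` records, their field names, and the `build_leaderboard` helper are all hypothetical, and averaging scores per model is only one possible ranking rule.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical experiment results; the field names are illustrative
# assumptions, not the EvalAP data model.
results = [
    {"model": "model-a", "experiment": "exp-1", "score": 0.80},
    {"model": "model-a", "experiment": "exp-2", "score": 0.70},
    {"model": "model-b", "experiment": "exp-1", "score": 0.90},
    {"model": "model-b", "experiment": "exp-2", "score": 0.80},
    {"model": "model-c", "experiment": "exp-1", "score": 0.60},
]


def build_leaderboard(rows):
    """Return (model, mean_score) pairs sorted from best to worst."""
    scores_by_model = defaultdict(list)
    for row in rows:
        scores_by_model[row["model"]].append(row["score"])
    averaged = [(model, mean(scores)) for model, scores in scores_by_model.items()]
    # Higher mean score ranks first.
    return sorted(averaged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for rank, (model, score) in enumerate(build_leaderboard(results), start=1):
        print(f"{rank}. {model}: {score:.2f}")
```

Here each model is ranked by its mean score across experiments; a real leaderboard might instead weight experiments, use a different metric, or break ties explicitly.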