Welcome to EvalAP

EvalAP (Evaluation API and Platform) is a high-level service for running model evaluations. It provides a comprehensive API built for Large Language Models (LLMs) and AI agents in general. It also features a platform to navigate, analyze, and share results.

Built for LLMs & AI Agents

Purpose-built evaluation framework for Large Language Models and AI agents, with specialized tooling for modern AI systems.

Rapid Experiment Design

Save time with intuitive APIs and workflows. Design and launch evaluation experiments in minutes, not hours.

Navigate Your Results

Explore experiment sets, datasets, and results through an intuitive interface. Find insights faster with powerful navigation tools.

Smart Resource Management

Optimize compute resources with our intelligent runner. Parallelize experiments and manage workloads efficiently.
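As a rough illustration of what parallelizing experiments means in practice, the sketch below runs several independent experiments concurrently with Python's standard library. It does not use the EvalAP runner or its API: `run_experiment`, `run_all`, and the experiment names are hypothetical stand-ins chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_experiment(name: str) -> dict:
    # Hypothetical stand-in: a real experiment would query a model and
    # score its outputs. Here the score is a placeholder derived from the name.
    return {"experiment": name, "score": len(name) / 10}


def run_all(names: list[str], max_workers: int = 4) -> list[dict]:
    # Run the experiments concurrently; pool.map returns results
    # in the same order as the input names.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_experiment, names))


results = run_all(["baseline", "prompt-v2", "rag"])
for row in results:
    print(row["experiment"], row["score"])
```

Threads suit this kind of workload when each experiment mostly waits on a remote model endpoint; CPU-bound scoring would more likely call for processes.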

Rich Metrics Library

Use evaluation libraries such as DeepEval alongside our built-in metrics. Access existing datasets and add custom metrics for your specific needs.
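To make the idea of a custom metric concrete, here is a minimal sketch of one written as a plain Python function and averaged over a small dataset. It is an assumption-laden illustration, not the EvalAP or DeepEval metric interface: `exact_match`, `mean_score`, and the sample pairs are all hypothetical.

```python
from typing import Callable


def exact_match(output: str, expected: str) -> float:
    # Hypothetical custom metric: 1.0 when the model output equals the
    # expected answer, ignoring case and surrounding whitespace.
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def mean_score(metric: Callable[[str, str], float],
               pairs: list[tuple[str, str]]) -> float:
    # Average a per-example metric over (output, expected) pairs.
    scores = [metric(output, expected) for output, expected in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative (output, expected) pairs, not real evaluation data.
pairs = [("Paris", "paris"), ("Lyon ", "Lyon"), ("Nice", "Marseille")]
score = mean_score(exact_match, pairs)
print(f"exact_match: {score:.2f}")
```

Keeping the metric as a pure function of one output and one expected value makes it easy to test in isolation before plugging it into a larger evaluation run.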

Custom Leaderboards

Build and share your own leaderboards. Track model performance and compare results across experiments.
