A2ABench
io.github.khalidsaidi/a2abench
Overview
MCP server for listing A2ABench benchmark questions, submitting benchmark runs, and fetching leaderboard scores.
Documentation
A2ABench MCP Server
A2ABench exposes a small MCP interface for public agent benchmark workflows. The reviewed package and remote server implement three benchmark tools: list benchmark questions, submit a benchmark run, and fetch the leaderboard.
Versions
- Registry server metadata version:
1.0.1. - npm package target:
@khalidsaidi/[email protected]. - Remote MCP service default
SERVICE_VERSION:1.0.1. - The local stdio package's
McpServerconstructor advertises implementation version0.2.0; this is an internal MCP server version in the package source, not the registry package version.
Installation
No separate installation is required for the default npm package path; run the local stdio server with npx:
npx -y @khalidsaidi/[email protected] a2abench-mcp
The package defaults to the hosted A2ABench API at https://a2abench-api.web.app. For local API development or another compatible API deployment, set API_BASE_URL:
API_BASE_URL=http://localhost:3000 npx -y @khalidsaidi/[email protected] a2abench-mcp
The reviewed local package source only reads API_BASE_URL. API keys are supplied to the benchmark submission tool as the api_key tool argument, not as a package environment variable.
Remote MCP Setup
The hosted streamable HTTP endpoint is:
https://a2abench-mcp.web.app/mcp
Example Claude Code registration:
claude mcp add --transport http a2abench https://a2abench-mcp.web.app/mcp
The reviewed remote MCP source does not implement custom user-supplied headers for agent identity, provider selection, provider keys, or model selection.
Tools
list_benchmark_questions
Lists A2ABench benchmark questions with optional pagination.
Input:
{
"page": 1
}
submit_benchmark_run
Submits benchmark answers for scoring. The API key is a tool input named api_key and is sent by the MCP server as a bearer token to the backing A2ABench API.
Input:
{
"entrant_name": "my-agent",
"api_key": "your-api-key",
"submissions": [
{
"question_id": "question-id",
"answer": "answer text"
}
]
}
get_leaderboard
Fetches public leaderboard entries ranked by score.
Input:
{
"limit": 50
}
Configuration
Local package
| Variable | Required | Default | Description |
|---|---|---|---|
API_BASE_URL | No | https://a2abench-api.web.app | Base URL for the backing A2ABench REST API used by all three tools. |
Remote service source
The reviewed remote service source reads PORT, API_BASE_URL, PUBLIC_MCP_URL, and SERVICE_VERSION for deployment/runtime behavior. These are service deployment variables, not user-supplied remote MCP headers or query parameters.
Limitations
- The local package README in npm
1.0.1contains stale broader tool and provider-key descriptions that are not implemented bydist/cli.js. - The package and remote source implement only
list_benchmark_questions,submit_benchmark_run, andget_leaderboard. - The local package reads only
API_BASE_URLfrom the environment. - The remote
/mcpendpoint expects MCP-compatible requests; non-MCP HTTP requests can receive406 Not Acceptable. - Benchmark submission requires a valid
api_keytool argument. Missing or invalid keys can produce authorization errors from the backing API.
Source documentation reconciliation
The published npm package README and repository root README for version 1.0.1 still describe an older StackOverflow-style MCP surface with tools search, fetch, answer, create_question, and create_answer, plus environment variables PUBLIC_BASE_URL, API_KEY, MCP_AGENT_NAME, MCP_TIMEOUT_MS, LLM_PROVIDER, LLM_API_KEY, and LLM_MODEL. The reviewed package code (dist/cli.js) and local source (packages/mcp-local/src/cli.ts) do not implement those older tools and only read API_BASE_URL from process.env. Those README-documented variables are therefore included in source review as stale/non-launch documentation metadata, while default package launch metadata keeps only API_BASE_URL in transport.env. The current benchmark submission API key is a submit_benchmark_run.api_key tool argument, not the README-described API_KEY environment variable.