Custom evals - Sapient

Custom evals are higher-level tasks that you want Sapient to test with coding agents. Use them for workflows that span multiple operations, such as installing an SDK, authenticating, and sending a first request.

Commands

Command	Description
`sapient api-performance custom-evals list`	List custom evals.
`sapient api-performance custom-evals create`	Create a custom eval.
`sapient api-performance custom-evals retrieve --custom-eval-id <custom_eval_id>`	Retrieve one custom eval.
`sapient api-performance custom-evals update --custom-eval-id <custom_eval_id>`	Update one custom eval.
`sapient api-performance custom-evals delete --custom-eval-id <custom_eval_id>`	Delete one custom eval.
`sapient api-performance custom-evals history list`	List custom eval history rows.

List custom evals

sapient api-performance custom-evals list --output-format json

Example response:

{
  "data": [
    {
      "id": "custom_eval_123",
      "prompt": "Install the SDK and crawl https://example.com",
      "prompt_template": null,
      "description": "Measures whether an agent can complete the first SDK request.",
      "competitor": null,
      "prompt_competitor": null,
      "prompt_competitor_group": null,
      "category": {
        "name": "Getting started",
        "slug": "getting-started"
      },
      "eval": {
        "id": "eval_123",
        "eval_type": "integration",
        "enabled": true
      }
    }
  ],
  "meta": {
    "count": 1
  }
}
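
To pull a few fields out of this response in a script, one option is to pipe the JSON through `jq`. The sketch below assumes `jq` is installed and that the response has the shape shown in the example; `list_enabled_evals` is a hypothetical helper name, not part of the Sapient CLI.

```shell
# Hypothetical helper: reads a custom-evals list response on stdin and prints
# one tab-separated line (custom eval ID, category slug) for each custom eval
# whose attached eval is enabled. Assumes the response shape shown above.
list_enabled_evals() {
  jq -r '.data[] | select(.eval.enabled == true) | [.id, .category.slug] | @tsv'
}
```

A possible invocation is `sapient api-performance custom-evals list --output-format json | list_enabled_evals`.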

Create a custom eval

sapient api-performance custom-evals create \
  --prompt "Install the SDK and crawl https://example.com" \
  --category-name "Getting started" \
  --description "Measures whether an agent can complete the first SDK request."

Configure the generated eval at creation time:

sapient api-performance custom-evals create \
  --prompt "Install the SDK and crawl https://example.com" \
  --description "The agent should complete the first working request." \
  --eval '{"docs_mode":"include","include_env_vars":true,"mcp_enabled":false}'
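
Hand-quoting JSON inside a shell command is easy to get wrong, so a script may prefer to assemble the `--eval` value in one place. The following is a minimal sketch, assuming only the three keys shown in the command above; `build_eval_json` is a hypothetical helper, and the set of values Sapient accepts for `docs_mode` is not confirmed here.

```shell
# Hypothetical helper: prints an eval definition as JSON for the --eval flag.
# $1 = docs_mode value, $2 = include_env_vars, $3 = mcp_enabled (true or false).
build_eval_json() (
  for flag in "$2" "$3"; do
    case $flag in
      true|false) ;;
      *) echo "expected true or false, got: $flag" >&2; exit 1 ;;
    esac
  done
  # docs_mode is inserted as-is, so it must not contain quotes or backslashes.
  printf '{"docs_mode":"%s","include_env_vars":%s,"mcp_enabled":%s}\n' "$1" "$2" "$3"
)
```

The result can then be passed as `--eval "$(build_eval_json include true false)"`.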

Use a prompt template when Sapient should render the prompt for a specific company. Templates support `{{company_name}}` and `{{company_domain}}`.

sapient api-performance custom-evals create \
  --body '{"prompt":"Build an integration for Example API.","prompt_template":"Build an integration for {{company_name}} using docs from {{company_domain}}.","category_name":"Getting started"}'
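
To preview how a template might read for a given company before creating the eval, the placeholders can be substituted locally. This is only a sketch: it assumes rendering is a plain text substitution of the two placeholders, which may not match how Sapient renders templates, and `render_prompt` and `escape_sed` are hypothetical helper names.

```shell
# Escape characters that are special in a sed replacement: \, &, and the
# | delimiter used below.
escape_sed() { printf '%s' "$1" | sed -e 's/[\\&|]/\\&/g'; }

# Hypothetical local preview of a prompt template.
# $1 = template, $2 = company name, $3 = company domain.
render_prompt() (
  name=$(escape_sed "$2")
  domain=$(escape_sed "$3")
  printf '%s\n' "$1" |
    sed -e "s|{{company_name}}|$name|g" -e "s|{{company_domain}}|$domain|g"
)
```

For example, `render_prompt 'Build an integration for {{company_name}} using docs from {{company_domain}}.' 'Example API' 'example.com'` prints the template with both placeholders filled in.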

Retrieve a custom eval

sapient api-performance custom-evals retrieve --custom-eval-id custom_eval_123

List custom eval history

List evaluated history rows for all custom evals:

sapient api-performance custom-evals history list \
  --limit 50

Filter to one custom eval:

sapient api-performance custom-evals history list \
  --custom-eval-id custom_eval_123 \
  --sort-by time \
  --sort-dir desc \
  --limit 25

History rows include the custom eval prompt, target labels, pass/fail result, score, tool call count, latency, and run ID. Use the returned run ID with `sapient api-performance runs retrieve` to inspect full run detail.

Useful flags:

Flag	Description
`--custom-eval-id`	Filters history to one custom eval.
`--result-filter`	Filters by result. Use `all`, `passed`, or `failed`.
`--target-key`	Filters to a target key from a previous history response.
`--category-id`	Filters to one custom eval category.
`--search`	Searches prompts, categories, target labels, and errors.
`--limit`	Maximum rows to return.
`--offset`	Pagination offset.
`--sort-by`	Sorts by `time`, `use_case`, `target`, `result`, `score`, `tool_calls`, or `latency`.
`--sort-dir`	Sort direction. Use `asc` or `desc`.
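
The `--limit` and `--offset` flags can be combined to page through history. The loop below is a minimal sketch that fetches a fixed number of pages of failed rows, newest first. It assumes `--offset` counts rows rather than pages, and `fetch_failed_history` is a hypothetical helper name.

```shell
# Hypothetical helper: fetches failed history rows page by page.
# $1 = number of pages to fetch, $2 = rows per page.
# Assumes --offset is a row offset, so page i starts at row i * page size.
fetch_failed_history() (
  pages=$1
  page_size=$2
  i=0
  while [ "$i" -lt "$pages" ]; do
    sapient api-performance custom-evals history list \
      --result-filter failed \
      --sort-by time \
      --sort-dir desc \
      --limit "$page_size" \
      --offset "$((i * page_size))"
    i=$((i + 1))
  done
)
```

Calling `fetch_failed_history 3 25` would issue three requests with offsets 0, 25, and 50.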

Update a custom eval

sapient api-performance custom-evals update --custom-eval-id custom_eval_123 \
  --description "Covers SDK install, authentication, and first crawl request."

Update the attached eval definition:

sapient api-performance custom-evals update --custom-eval-id custom_eval_123 \
  --eval '{"expected_behavior":"The agent installs the SDK, authenticates, and prints the crawl result."}'

Updatable fields:

Flag	Description
`--prompt`	Task prompt.
`--prompt-template`	Optional prompt template. Supports `{{company_name}}` and `{{company_domain}}`.
`--category-name`	Category label for organizing custom evals.
`--description`	Expected task outcome.
`--competitor`	Competitor ID for competitor-specific eval variants.
`--prompt-competitor-group`	Base custom eval ID for competitor variants.
`--eval`	Eval definition update object as JSON.

Delete a custom eval

sapient api-performance custom-evals delete --custom-eval-id custom_eval_123

Deleting a custom eval removes it from future API Performance runs.
