Custom evals are higher-level tasks that you want Sapient to test with coding agents. Use them for workflows that span multiple operations, such as installing an SDK, authenticating, and sending a first request.

Commands

| Command | Description |
| --- | --- |
| `sapient api-performance custom-evals list` | List custom evals. |
| `sapient api-performance custom-evals create` | Create a custom eval. |
| `sapient api-performance custom-evals retrieve --custom-eval-id <custom_eval_id>` | Retrieve one custom eval. |
| `sapient api-performance custom-evals update --custom-eval-id <custom_eval_id>` | Update one custom eval. |
| `sapient api-performance custom-evals delete --custom-eval-id <custom_eval_id>` | Delete one custom eval. |
| `sapient api-performance custom-evals history list` | List custom eval history rows. |

List custom evals

sapient api-performance custom-evals list --output-format json
Example response:
{
  "data": [
    {
      "id": "custom_eval_123",
      "prompt": "Install the SDK and crawl https://example.com",
      "prompt_template": null,
      "description": "Measures whether an agent can complete the first SDK request.",
      "competitor": null,
      "prompt_competitor": null,
      "prompt_competitor_group": null,
      "category": {
        "name": "Getting started",
        "slug": "getting-started"
      },
      "eval": {
        "id": "eval_123",
        "eval_type": "integration",
        "enabled": true
      }
    }
  ],
  "meta": {
    "count": 1
  }
}
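The response shape above can be consumed from a script. The following is a minimal Python sketch, assuming the JSON printed by `--output-format json` matches the example response. The helper name `enabled_eval_ids` is hypothetical, and a trimmed copy of the example payload is embedded so the sketch runs without the CLI.

```python
import json

# Trimmed copy of the example response above, embedded so the sketch runs offline.
SAMPLE_RESPONSE = """
{
  "data": [
    {
      "id": "custom_eval_123",
      "prompt": "Install the SDK and crawl https://example.com",
      "category": {"name": "Getting started", "slug": "getting-started"},
      "eval": {"id": "eval_123", "eval_type": "integration", "enabled": true}
    }
  ],
  "meta": {"count": 1}
}
"""


def enabled_eval_ids(response_text: str) -> list[str]:
    """Return the IDs of custom evals whose attached eval is enabled."""
    payload = json.loads(response_text)
    return [
        item["id"]
        for item in payload.get("data", [])
        # Treat a missing or null "eval" object as not enabled.
        if (item.get("eval") or {}).get("enabled")
    ]


if __name__ == "__main__":
    print(enabled_eval_ids(SAMPLE_RESPONSE))
```

In a real script, the text passed to `enabled_eval_ids` would come from the standard output of the `list` command rather than from the embedded sample.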

Create a custom eval

sapient api-performance custom-evals create \
  --prompt "Install the SDK and crawl https://example.com" \
  --category-name "Getting started" \
  --description "Measures whether an agent can complete the first SDK request."
Configure the generated eval at creation time:
sapient api-performance custom-evals create \
  --prompt "Install the SDK and crawl https://example.com" \
  --description "The agent should complete the first working request." \
  --eval '{"docs_mode":"include","include_env_vars":true,"mcp_enabled":false}'
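Hand-written JSON inside shell quotes is easy to get wrong. One way to avoid quoting mistakes is to build the `--eval` value with a JSON serializer and pass the command as an argument list. The sketch below uses only the flags and eval keys shown above. The helper name `build_create_args` is hypothetical, and the command is printed rather than executed.

```python
import json
import shlex


def build_create_args(prompt: str, description: str, eval_options: dict) -> list[str]:
    """Build the argument list for `custom-evals create` with an --eval object."""
    return [
        "sapient", "api-performance", "custom-evals", "create",
        "--prompt", prompt,
        "--description", description,
        # Compact separators reproduce the JSON formatting used in the example above.
        "--eval", json.dumps(eval_options, separators=(",", ":")),
    ]


if __name__ == "__main__":
    args = build_create_args(
        prompt="Install the SDK and crawl https://example.com",
        description="The agent should complete the first working request.",
        eval_options={"docs_mode": "include", "include_env_vars": True, "mcp_enabled": False},
    )
    # shlex.join quotes each argument so the line can be pasted into a POSIX shell.
    print(shlex.join(args))
```

Passing the list to `subprocess.run(args, check=True)` would run the command without going through a shell, so no extra quoting is needed.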
Use a prompt template when Sapient should render the prompt for a specific company. Templates support {{company_name}} and {{company_domain}}.
sapient api-performance custom-evals create \
  --body '{"prompt":"Build an integration for Example API.","prompt_template":"Build an integration for {{company_name}} using docs from {{company_domain}}.","category_name":"Getting started"}'
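To illustrate what the two placeholders stand for, here is a rough Python sketch of template substitution. It is not Sapient's renderer, whose exact behavior (whitespace handling, unknown placeholders) is not described here. The function name `render_prompt_template` and the sample company values are hypothetical.

```python
import re

# Matches {{name}} style placeholders with no inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_prompt_template(template: str, company_name: str, company_domain: str) -> str:
    """Replace {{company_name}} and {{company_domain}}; leave anything else untouched."""
    values = {"company_name": company_name, "company_domain": company_domain}

    def substitute(match):
        key = match.group(1)
        # Unknown placeholders are kept verbatim (an assumption made for this sketch).
        return values.get(key, match.group(0))

    return _PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    template = "Build an integration for {{company_name}} using docs from {{company_domain}}."
    print(render_prompt_template(template, "Example API", "example.com"))
```

With the sample values, the rendered text reads the same as the plain `prompt` in the command above, apart from the added reference to the docs domain.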

Retrieve a custom eval

sapient api-performance custom-evals retrieve --custom-eval-id custom_eval_123

List custom eval history

List evaluated history rows for all custom evals:
sapient api-performance custom-evals history list \
  --limit 50
Filter to one custom eval:
sapient api-performance custom-evals history list \
  --custom-eval-id custom_eval_123 \
  --sort-by time \
  --sort-dir desc \
  --limit 25
History rows include the custom eval prompt, target labels, pass/fail result, score, tool call count, latency, and run ID. Use the returned run ID with `sapient api-performance runs retrieve` to inspect full run detail.
Useful flags:
| Flag | Description |
| --- | --- |
| `--custom-eval-id` | Filters history to one custom eval. |
| `--result-filter` | Filters by result. Use `all`, `passed`, or `failed`. |
| `--target-key` | Filters to a target key from a previous history response. |
| `--category-id` | Filters to one custom eval category. |
| `--search` | Searches prompts, categories, target labels, and errors. |
| `--limit` | Maximum rows to return. |
| `--offset` | Pagination offset. |
| `--sort-by` | Sorts by `time`, `use_case`, `target`, `result`, `score`, `tool_calls`, or `latency`. |
| `--sort-dir` | Sort direction. Use `asc` or `desc`. |
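The paging and sorting flags can be combined to walk through history one page at a time. The Python sketch below builds the argument list for a given page and rejects values outside the sets listed above. It assumes `--offset` counts rows to skip, which is a common convention but is not stated here. The helper name `history_list_args` is hypothetical, and the commands are printed rather than executed.

```python
# Allowed values, copied from the flag descriptions above.
SORT_FIELDS = {"time", "use_case", "target", "result", "score", "tool_calls", "latency"}
SORT_DIRS = {"asc", "desc"}
RESULT_FILTERS = {"all", "passed", "failed"}


def history_list_args(page, page_size=25, *, custom_eval_id=None,
                      result_filter="all", sort_by="time", sort_dir="desc"):
    """Build `history list` arguments for a zero-based page number."""
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size must be >= 1")
    if sort_by not in SORT_FIELDS:
        raise ValueError(f"unsupported --sort-by value: {sort_by}")
    if sort_dir not in SORT_DIRS:
        raise ValueError(f"unsupported --sort-dir value: {sort_dir}")
    if result_filter not in RESULT_FILTERS:
        raise ValueError(f"unsupported --result-filter value: {result_filter}")

    args = [
        "sapient", "api-performance", "custom-evals", "history", "list",
        "--result-filter", result_filter,
        "--sort-by", sort_by,
        "--sort-dir", sort_dir,
        "--limit", str(page_size),
        # Assumption: the offset is a row count, so page N starts at N * page_size.
        "--offset", str(page * page_size),
    ]
    if custom_eval_id is not None:
        args += ["--custom-eval-id", custom_eval_id]
    return args


if __name__ == "__main__":
    for page in range(3):
        print(" ".join(history_list_args(page, custom_eval_id="custom_eval_123",
                                         result_filter="failed")))
```

Validating the values locally gives an early, readable error instead of relying on whatever message the CLI returns for an unsupported value.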

Update a custom eval

sapient api-performance custom-evals update --custom-eval-id custom_eval_123 \
  --description "Covers SDK install, authentication, and first crawl request."
Update the attached eval definition:
sapient api-performance custom-evals update --custom-eval-id custom_eval_123 \
  --eval '{"expected_behavior":"The agent installs the SDK, authenticates, and prints the crawl result."}'
Updateable fields:
| Flag | Description |
| --- | --- |
| `--prompt` | Task prompt. |
| `--prompt-template` | Optional prompt template. Supports `{{company_name}}` and `{{company_domain}}`. |
| `--category-name` | Category label for organizing custom evals. |
| `--description` | Expected task outcome. |
| `--competitor` | Competitor ID for competitor-specific eval variants. |
| `--prompt-competitor-group` | Base custom eval ID for competitor variants. |
| `--eval` | Eval definition update object as JSON. |

Delete a custom eval

sapient api-performance custom-evals delete --custom-eval-id custom_eval_123
Deleting a custom eval removes it from future API Performance runs.