Rhesis Entities

This module contains the entity classes used throughout the Rhesis SDK.

Rhesis Entities Module.

This module provides the entity classes for interacting with the Rhesis API.

class BaseEntity(**data)

Bases: BaseModel

Base class for API entity interactions.

This class provides basic CRUD operations for interacting with REST API endpoints. It handles authentication and common HTTP operations.

client

The Rhesis API client instance

Type:

rhesis.client.Client

headers

HTTP headers for API requests.

Type:

Dict[str, str]

Parameters:

data (typing.Any)

model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

endpoint: ClassVar[Endpoints]
push()

Save the entity to the database.

Return type:

Optional[Dict[str, Any]]

pull()

Pull the entity from the database and update this instance.

Returns:

Returns self for method chaining.

Return type:

BaseEntity

delete()

Delete the entity from the database.

Return type:

bool

to_dict()

Convert the entity to a dictionary.

Return type:

Dict[str, Any]

classmethod from_dict(data)

Create an entity from a dictionary.

Parameters:

data (Dict[str, Any])

Return type:

BaseEntity

to_csv(filename)

Write the entity to a CSV file (header + data row).

Parameters:

filename (str) – Path to write the CSV file.

Return type:

None

classmethod from_csv(filename)

Create an entity from a CSV file.

Parameters:

filename (str) – Path to the CSV file to read.

Return type:

BaseEntity

Returns:

An instance of the entity populated with data from the first row.
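The single-row CSV layout described for to_csv and from_csv (a header row followed by one data row, with only the first data row read back) can be sketched with the standard csv module. This is a minimal illustration under assumptions, not the SDK's code. The helper names write_entity_csv and read_entity_csv are hypothetical. Values come back as strings, and the SDK may handle nested fields or type conversion differently.

```python
import csv
from pathlib import Path
from typing import Any, Dict, Union


def write_entity_csv(fields: Dict[str, Any], filename: Union[str, Path]) -> None:
    """Hypothetical helper: write one header row plus one data row."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(fields))
        writer.writeheader()
        writer.writerow(fields)  # None values are written as empty cells


def read_entity_csv(filename: Union[str, Path]) -> Dict[str, str]:
    """Hypothetical helper: return the first data row as a dict of strings."""
    with open(filename, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        first_row = next(reader, None)
    if first_row is None:
        raise ValueError(f"{filename} contains no data rows")
    return dict(first_row)
```

Because CSV has no types, a round trip turns the integer 3 into the string "3". Any re-typing has to happen when the dict is validated back into an entity.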

class BaseCollection

Bases: Generic[T]

Base class for API collection interactions.

This class provides basic CRUD operations for interacting with REST API endpoints. It handles authentication and common HTTP operations.

endpoint: Endpoints
entity_class: Type[T] (T is a TypeVar bound to BaseEntity)
classmethod all(filter=None)

Retrieve all records from the API for the given endpoint.

Parameters:

filter (Optional[str], default: None) – Optional OData filter string to filter results (e.g., "tolower(name) eq 'test'" or "status eq 'active'")

Return type:

Optional[list[Any]]

Returns:

List of records matching the filter, or all records if no filter is provided
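The filter argument is an OData filter expression passed as a plain string. When a value comes from user input, embedded single quotes must be escaped, because OData string literals represent a single quote as two single quotes. The hypothetical helpers below build expressions of the kind shown above. They are a sketch and are not part of the SDK.

```python
def odata_string(value: str) -> str:
    """Quote a value as an OData string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def eq_filter(field: str, value: str, case_insensitive: bool = False) -> str:
    """Build an OData equality filter such as "tolower(name) eq 'test'"."""
    if case_insensitive:
        # Lower-case both sides so the comparison ignores case.
        return f"tolower({field}) eq {odata_string(value.lower())}"
    return f"{field} eq {odata_string(value)}"
```

You could then pass the result to a collection, for example Behaviors.all(filter=eq_filter("name", "Test", case_insensitive=True)).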

classmethod first(cls)

Retrieve the first record from the API.

Return type:

Optional[T]

Returns:

The first record, or None if no records are found.

classmethod pull(id=None, name=None)

Pull entity data from the platform by ID or name.

Either ‘id’ or ‘name’ must be provided.

Parameters:
  • id (Optional[str], default: None) – The ID of the entity to pull

  • name (Optional[str], default: None) – The name of the entity to pull (case-insensitive)

Returns:

An instance of the entity class

Return type:

T

Raises:

ValueError – If neither id nor name is provided, or if name matches multiple entities
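The lookup rules for pull can be sketched over an in-memory list of records: an ID selects a record directly, a name is matched case-insensitively, and a missing or ambiguous argument raises ValueError. This is an illustrative assumption about the logic, not the SDK's code. The real method queries the API. Its behavior when a name matches nothing is not documented here, so the hypothetical resolve_record simply returns None in that case.

```python
from typing import Any, Dict, List, Optional


def resolve_record(
    records: List[Dict[str, Any]],
    id: Optional[str] = None,
    name: Optional[str] = None,
) -> Optional[Dict[str, Any]]:
    """Hypothetical sketch of pull(): resolve one record by ID or by name."""
    if id is None and name is None:
        raise ValueError("Either 'id' or 'name' must be provided")
    if id is not None:
        # An ID identifies at most one record.
        return next((r for r in records if r.get("id") == id), None)
    # Name matching is case-insensitive, mirroring the documented behavior.
    wanted = name.lower()
    matches = [r for r in records if str(r.get("name", "")).lower() == wanted]
    if len(matches) > 1:
        raise ValueError(f"Name {name!r} matches {len(matches)} entities; use an ID")
    # Assumption: no match returns None (not specified by the SDK docs).
    return matches[0] if matches else None
```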

classmethod exists(id)

Check if an entity exists.

Parameters:

id (str)

Return type:

bool

class Endpoint(**data)

Bases: BaseEntity

Endpoint entity for interacting with the Rhesis API.

Endpoints represent AI services or APIs that tests execute against. They define how Rhesis connects to your application, sends test inputs, and receives responses for evaluation.

Examples

Load an endpoint:

>>> endpoint = Endpoint(id='endpoint-123')
>>> endpoint.pull()
>>> print(endpoint.name)

Invoke an endpoint:

>>> response = endpoint.invoke(input="What is the weather?")
>>> print(response)

List all endpoints:

>>> for endpoint in Endpoints.all():
...     print(endpoint.name)

Create an endpoint programmatically:

>>> endpoint = Endpoint(
...     name="My API",
...     connection_type=ConnectionType.REST,
...     project_id="your-project-uuid",
...     url="https://api.example.com",
...     auth_token="your-api-key",  # Token for the target API
...     request_mapping={"message": "{{ input }}"},
...     request_headers={"Content-Type": "application/json"},
...     response_mapping={"output": "response.text"},
... )
>>> endpoint.push()

Parameters:

data (typing.Any)

endpoint: ClassVar[Endpoints] = 'endpoints'
name: Optional[str]
description: Optional[str]
connection_type: Optional[ConnectionType]
url: Optional[str]
project_id: Optional[str]
id: Optional[str]
method: Optional[str]
endpoint_path: Optional[str]
request_headers: Optional[Dict[str, str]]
query_params: Optional[Dict[str, Any]]
request_mapping: Optional[Dict[str, Any]]
response_mapping: Optional[Dict[str, str]]
auth_token: Optional[str]
invoke(input, conversation_id=None, session_id=None)

Invoke the endpoint with the given input.

This method sends a request to the Rhesis backend, which handles authentication, request mapping, and response parsing according to the endpoint’s configuration.

Parameters:
  • input (str) – The message or query to send to the endpoint

  • conversation_id (Optional[str], default: None) – Optional conversation ID for multi-turn conversations. Pass the conversation_id from the previous response to continue the same conversation.

  • session_id (Optional[str], default: None) – Deprecated alias for conversation_id.

Return type:

Optional[Dict[str, Any]]

Returns:

Dict containing the response from the endpoint, or None if an error occurred.

Response structure (standard Rhesis format):

{
    "output": "Response text from the endpoint",
    "conversation_id": "Identifier for tracking",
    "metadata": {...},
    "context": [...]
}
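For multi-turn conversations, the conversation_id from each response is passed to the next invoke call. The sketch below shows that loop. It is written against any callable that takes the same keyword arguments as invoke, so it can be exercised without a live endpoint. run_conversation is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Callable, Dict, List, Optional

InvokeFn = Callable[..., Optional[Dict[str, Any]]]


def run_conversation(invoke: InvokeFn, turns: List[str]) -> List[str]:
    """Send each turn in order, threading conversation_id between calls."""
    outputs: List[str] = []
    conversation_id: Optional[str] = None
    for message in turns:
        response = invoke(input=message, conversation_id=conversation_id)
        if response is None:
            # invoke() documents None as the result when an error occurred.
            break
        outputs.append(response.get("output", ""))
        # Reuse the identifier returned by the endpoint for the next turn.
        conversation_id = response.get("conversation_id", conversation_id)
    return outputs
```

With a real endpoint this would be called as run_conversation(endpoint.invoke, ["Hi", "And tomorrow?"]).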

Raises:
  • ValueError – If endpoint ID is not set

  • requests.exceptions.HTTPError – If the API request fails

Example

>>> endpoint = Endpoint(id='endpoint-123')
>>> endpoint.pull()
>>> response = endpoint.invoke(
...     input="What is the weather?",
...     conversation_id="conv-abc"
... )
>>> print(response)
{'output': 'The weather is sunny today!', 'conversation_id': 'conv-abc', 'metadata': None, 'context': []}
test()
Return type:

None

model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class Endpoints

Bases: BaseCollection

endpoint: Endpoints = 'endpoints'
entity_class

alias of Endpoint

class Behavior(**data)

Bases: BaseEntity

Parameters:

data (typing.Any)

endpoint: ClassVar[Endpoints] = 'behaviors'
name: Optional[str]
description: Optional[str]
id: Optional[str]
get_metrics()

Get all metrics associated with this behavior.

Return type:

Dict[str, Any]

Returns:

Dict containing the list of metrics for this behavior

Raises:

ValueError – If behavior ID is not set

Example

>>> behavior = Behavior(id='behavior-123')
>>> metrics = behavior.get_metrics()
add_metric(metric_id)

Add a metric to this behavior.

Parameters:

metric_id (str) – The ID of the metric to add to this behavior

Return type:

Dict[str, Any]

Returns:

Dict containing the response from adding the metric

Raises:

ValueError – If behavior ID is not set

Example

>>> behavior = Behavior(id='behavior-123')
>>> response = behavior.add_metric('metric-456')
remove_metric(metric_id)

Remove a metric from this behavior.

Parameters:

metric_id (str) – The ID of the metric to remove from this behavior

Return type:

Dict[str, Any]

Returns:

Dict containing the response from removing the metric

Raises:

ValueError – If behavior ID is not set

Example

>>> behavior = Behavior(id='behavior-123')
>>> response = behavior.remove_metric('metric-456')
model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class Behaviors

Bases: BaseCollection

endpoint: Endpoints = 'behaviors'
entity_class

alias of Behavior

class Category(**data)

Bases: BaseEntity

Parameters:

data (typing.Any)

endpoint: ClassVar[Endpoints] = 'categories'
name: str
description: str
id: Optional[str]
model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class Categories

Bases: BaseCollection

endpoint: Endpoints = 'categories'
entity_class

alias of Category

class Model(**data)

Bases: BaseEntity

Model entity for interacting with the Rhesis API.

Models represent AI model configurations (language models or embeddings) that can be used for generation, evaluation, embedding, and other AI-powered tasks. Each model configuration includes the provider, model name, and API key.

Examples

Create a new language model:

>>> model = Model(
...     name="GPT-4 Production",
...     provider="openai",
...     model_name="gpt-4",
...     key="sk-..."
... )
>>> model.push()

Create an embedding model:

>>> model = Model(
...     name="OpenAI Embeddings",
...     provider="openai",
...     model_name="text-embedding-3-small",
...     model_type="embedding",
...     key="sk-..."
... )
>>> model.push()

Load an existing model:

>>> model = Models.pull(name="GPT-4 Production")
>>> print(model.model_name)

List all models:

>>> models = Models.all()
>>> for m in models:
...     print(m.name, m.model_type, m.model_name)

Supported providers:
  • openai, anthropic, gemini, mistral, cohere, groq

  • vertex_ai, together_ai, replicate, perplexity

  • ollama, vllm (for self-hosted models)

Parameters:

data (typing.Any)

endpoint: ClassVar[Endpoints] = 'models'
id: Optional[str]
name: Optional[str]
description: Optional[str]
provider: Optional[str]
model_name: Optional[str]
model_type: Optional[Literal['llm', 'embedding']]
key: Optional[str]
provider_type_id: Optional[str]
status_id: Optional[str]
push()

Save the model to the platform.

If a provider name is set, it will be automatically resolved to the provider_type_id before saving. The icon is automatically set based on the provider.

Return type:

Optional[Dict[str, Any]]

set_default_generation()

Set this model as the default for test generation.

This updates the current user’s settings to use this model when generating new test cases.

Raises:

ValueError – If model ID is not set (model must be saved first)

Return type:

None

Example

>>> model = Models.pull(name="GPT-4 Production")
>>> model.set_default_generation()
set_default_evaluation()

Set this model as the default for evaluation (LLM as Judge).

This updates the current user’s settings to use this model when running metrics and evaluations.

Raises:

ValueError – If model ID is not set (model must be saved first)

Return type:

None

Example

>>> model = Models.pull(name="GPT-4 Production")
>>> model.set_default_evaluation()
set_default_embedding()

Set this model as the default for embedding generation.

This updates the current user’s settings to use this model when generating embeddings for semantic search and similarity.

Raises:

ValueError – If model ID is not set (model must be saved first)

Return type:

None

Example

>>> model = Models.pull(name="OpenAI Embeddings")
>>> model.set_default_embedding()
get_model_instance()

Create a model instance configured with this model’s settings.

Returns a ready-to-use LLM or embedder client based on the model_type. Uses the provider, model name, and API key from this entity.

Returns:

Ready-to-use model instance

Return type:

BaseLLM or BaseEmbedder

Raises:

ValueError – If provider or model_name is not set

Example

>>> model = Models.pull(name="GPT-4 Production")
>>> llm = model.get_model_instance()
>>> response = llm.generate("Hello, how are you?")

>>> model = Models.pull(name="OpenAI Embeddings")
>>> embedder = model.get_model_instance()
>>> vector = embedder.generate("Hello, world!")
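get_model_instance returns an LLM or an embedder depending on model_type, and it refuses to proceed without provider and model_name. The dispatch can be sketched as follows. LLMClient, EmbedderClient and make_model_instance are hypothetical stand-ins for the SDK's BaseLLM and BaseEmbedder implementations. Treating an unset model_type as "llm" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class LLMClient:
    """Hypothetical stand-in for a BaseLLM implementation."""
    provider: str
    model_name: str
    api_key: Optional[str] = None


@dataclass
class EmbedderClient:
    """Hypothetical stand-in for a BaseEmbedder implementation."""
    provider: str
    model_name: str
    api_key: Optional[str] = None


def make_model_instance(
    provider: Optional[str],
    model_name: Optional[str],
    model_type: Optional[str] = None,
    key: Optional[str] = None,
) -> Union[LLMClient, EmbedderClient]:
    if not provider or not model_name:
        raise ValueError("provider and model_name must be set")
    # Assumption: an unset model_type is treated as a language model.
    if model_type == "embedding":
        return EmbedderClient(provider, model_name, key)
    return LLMClient(provider, model_name, key)
```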
model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class Models

Bases: BaseCollection

Collection class for Model entities.

endpoint: Endpoints = 'models'
entity_class

alias of Model

classmethod list_providers()

List available provider names.

Return type:

list[str]

Returns:

List of provider names that can be used when creating models.

Example

>>> providers = Models.list_providers()
>>> print(providers)
['openai', 'anthropic', 'gemini', 'mistral', ...]
class Project(**data)

Bases: BaseEntity

Project entity for interacting with the Rhesis API.

Projects represent the top-level organizational unit for tests, endpoints, and other resources. Each project contains its own test sets, endpoints, and configurations.

Examples

Create a new project:

>>> project = Project(name="My AI App", description="Testing my chatbot")
>>> project.push()

Load an existing project:

>>> project = Projects.pull(name="My AI App")
>>> print(project.name)

List all projects:

>>> projects = Projects.all()
>>> for p in projects:
...     print(p.name)

Parameters:

data (typing.Any)

endpoint: ClassVar[Endpoints] = 'projects'
name: Optional[str]
description: Optional[str]
is_active: Optional[bool]
icon: Optional[str]
status_id: Optional[str]
user_id: Optional[str]
owner_id: Optional[str]
organization_id: Optional[str]
id: Optional[str]
model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class Projects

Bases: BaseCollection

Collection class for Project entities.

endpoint: Endpoints = 'projects'
entity_class

alias of Project

class Prompt(**data)

Bases: BaseEntity

Parameters:

data (typing.Any)

endpoint: ClassVar[Endpoints] = 'prompts'
content: Optional[str]
language_code: Optional[str]
expected_response: Optional[str]
id: Optional[str]
model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class Prompts

Bases: BaseCollection

endpoint: Endpoints = 'prompts'
entity_class

alias of Prompt

class Status(**data)

Bases: BaseEntity

Parameters:

data (typing.Any)

endpoint: ClassVar[Endpoints] = 'statuses'
name: str
description: Optional[str]
id: Optional[str]
model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class Statuses

Bases: BaseCollection

endpoint: Endpoints = 'statuses'
entity_class

alias of Status

class Test(**data)

Bases: BaseEntity

Parameters:

data (typing.Any)

endpoint: ClassVar[Endpoints] = 'tests'
category: Optional[str]
topic: Optional[str]
behavior: Optional[str]
prompt: Optional[Prompt]
metadata: dict
id: Optional[str]
test_configuration: Optional[TestConfiguration]
test_type: Optional[TestType]
goal: Optional[str]
instructions: Optional[str]
restrictions: Optional[str]
scenario: Optional[str]
classmethod build_test_configuration(data)

Build test_configuration from separate fields if not already provided.

This allows users to provide goal, instructions, restrictions, and scenario as separate fields instead of constructing TestConfiguration manually.

Parameters:

data (Any)

Return type:

Any
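build_test_configuration runs on the raw input before field validation. The sketch below assumes it folds the four flat fields into a test_configuration mapping only when one was not supplied. The exact shape of TestConfiguration is not documented here, so a plain dict stands in for it, and the function is an illustration rather than the SDK's validator.

```python
from typing import Any

MULTI_TURN_FIELDS = ("goal", "instructions", "restrictions", "scenario")


def build_test_configuration(data: Any) -> Any:
    """Sketch of a 'before' validator that fills test_configuration from flat fields."""
    if not isinstance(data, dict) or data.get("test_configuration") is not None:
        # Non-dict input, or an explicit configuration, passes through unchanged.
        return data
    provided = {k: data[k] for k in MULTI_TURN_FIELDS if data.get(k) is not None}
    if not provided:
        return data
    return {**data, "test_configuration": provided}
```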

execute(endpoint)

Execute the test against the given endpoint.

Parameters:

endpoint (Endpoint) – The endpoint to execute the test against

Return type:

Optional[Dict[str, Any]]

Returns:

Dict containing the execution results, or None if an error occurred.

Example

>>> test = Test(id='test-123')
>>> endpoint = Endpoint(id='endpoint-123')
>>> result = test.execute(endpoint=endpoint)
model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class Tests

Bases: BaseCollection

endpoint: Endpoints = 'tests'
entity_class

alias of Test

class TestResult(**data)

Bases: BaseEntity

Test result entity representing execution results from tests.

Note: This is NOT a pytest test class, despite the ‘Test’ prefix.

Parameters:

data (typing.Any)

endpoint: ClassVar[Endpoints] = 'test_results'
test_configuration_id: Optional[str]
test_run_id: Optional[str]
prompt_id: Optional[str]
test_id: Optional[str]
status_id: Optional[str]
status: Optional[Status]
test_output: Optional[Dict[str, Any]]
test_metrics: Optional[Dict[str, Any]]
test_reviews: Optional[Dict[str, Any]]
id: Optional[str]
model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class TestResults

Bases: BaseCollection

endpoint: Endpoints = 'test_results'
entity_class

alias of TestResult

class RunStatus(value)

Bases: str, Enum

Enum for test run statuses.

PROGRESS = 'Progress'
COMPLETED = 'Completed'
PARTIAL = 'Partial'
FAILED = 'Failed'
class TestRun(**data)

Bases: BaseEntity

Parameters:

data (typing.Any)

endpoint: ClassVar[Endpoints] = 'test_runs'
test_configuration_id: Optional[str]
name: Optional[str]
user_id: Optional[str]
organization_id: Optional[str]
status: Optional[RunStatus]
attributes: Optional[Dict[str, Any]]
owner_id: Optional[str]
assignee_id: Optional[str]
id: Optional[str]
classmethod extract_status(v)

Extract the status name from a nested dict if the backend returns a full Status object.

Parameters:

v (Any)

Return type:

Optional[str]
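The backend may return status either as a plain name or as a nested Status object. The minimal sketch below normalizes both into a string that RunStatus accepts. It assumes the nested object carries the value under a "name" key, which is consistent with the docstring above. extract_status_name is a hypothetical name. The enum mirrors the RunStatus values listed on this page.

```python
from enum import Enum
from typing import Any, Optional


class RunStatus(str, Enum):
    # Mirrors the values documented for RunStatus above.
    PROGRESS = "Progress"
    COMPLETED = "Completed"
    PARTIAL = "Partial"
    FAILED = "Failed"


def extract_status_name(v: Any) -> Optional[str]:
    """Return the status name whether v is a nested dict, a string, or None."""
    if isinstance(v, dict):
        # Backend returned a full Status object; keep only its name.
        return v.get("name")
    # Strings (and None) pass through for normal enum validation.
    return v
```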

get_test_results()

Get all test results for this test run.

Returns:

List of test results for this test run

model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class TestRuns

Bases: BaseCollection

endpoint: Endpoints = 'test_runs'
entity_class

alias of TestRun

class TestSet(**data)

Bases: BaseEntity

Parameters:

data (typing.Any)

endpoint: ClassVar[Endpoints] = 'test_sets'
id: Optional[str]
tests: Optional[list[Test]]
categories: Optional[list[str]]
topics: Optional[list[str]]
behaviors: Optional[list[str]]
test_count: Optional[int]
name: str
description: str
short_description: str
test_set_type: Optional[TestType]
metadata: dict
classmethod extract_test_set_type(v)

Extract type_value from a nested dict if the backend returns a full TypeLookup object.

Handles multiple input types:

  • None: returns None

  • TestType enum: returns the enum value string

  • str: returns as-is (Pydantic handles enum conversion)

  • dict: extracts the 'type_value' key (backend API response format)

Parameters:

v (Any)

Return type:

Optional[str]
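The four input shapes listed above map onto a small normalization function. In this sketch TestType is a stand-in enum. "Single-Turn" and "Multi-Turn" are the values this page uses elsewhere, but the SDK's actual enum may define different members.

```python
from enum import Enum
from typing import Any, Optional


class TestType(str, Enum):
    # Stand-in for the SDK's TestType; values taken from the test_type docs on this page.
    SINGLE_TURN = "Single-Turn"
    MULTI_TURN = "Multi-Turn"


def extract_test_set_type(v: Any) -> Optional[str]:
    if v is None:
        return None
    if isinstance(v, TestType):
        return v.value
    if isinstance(v, dict):
        # Backend responses nest the value in a TypeLookup object.
        return v.get("type_value")
    # Plain strings are returned unchanged; enum conversion happens later.
    return v
```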

execute(endpoint, *, mode=ExecutionMode.PARALLEL, metrics=None)

Execute the test set against the given endpoint.

Parameters:
  • endpoint (Endpoint) – The endpoint to execute tests against.

  • mode (Union[str, ExecutionMode], default: <ExecutionMode.PARALLEL: 'Parallel'>) – Execution mode – ExecutionMode.PARALLEL (default), ExecutionMode.SEQUENTIAL, or "parallel" / "sequential".

  • metrics (Optional[List[Union[str, Dict[str, Any]]]], default: None) – Optional list of metrics for this execution. Overrides test set and behavior metrics. Each item can be a dict with "id", "name", and optional "scope"; or a metric name string (resolved via the /metrics API).

Return type:

Optional[Dict[str, Any]]

Returns:

Dict containing the execution submission response, or None if an error occurred.

Raises:

ValueError – If test set ID is not set.

Example

>>> test_set = TestSets.pull(name="Safety Tests")
>>> endpoint = Endpoints.pull(name="GPT-4o")
>>> result = test_set.execute(endpoint)
>>> result = test_set.execute(endpoint, mode=ExecutionMode.SEQUENTIAL)
>>> result = test_set.execute(endpoint, mode="sequential")
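mode accepts either an ExecutionMode member or a string such as "parallel" or "sequential". One way to normalize it is sketched below. The ExecutionMode stand-in reuses the 'Parallel' value shown in the signature. The value 'Sequential' for the second member, the case-insensitive matching and the ValueError for unknown strings are all assumptions.

```python
from enum import Enum
from typing import Union


class ExecutionMode(str, Enum):
    # 'Parallel' matches the documented default; 'Sequential' is assumed.
    PARALLEL = "Parallel"
    SEQUENTIAL = "Sequential"


def normalize_mode(mode: Union[str, ExecutionMode]) -> ExecutionMode:
    if isinstance(mode, ExecutionMode):
        return mode
    # Compare case-insensitively so "sequential" and "Sequential" both work.
    for member in ExecutionMode:
        if mode.strip().lower() == member.value.lower():
            return member
    raise ValueError(f"Unknown execution mode: {mode!r}")
```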
rescore(endpoint, run=None, *, mode=ExecutionMode.PARALLEL, metrics=None)

Re-score outputs from an existing test run.

Re-evaluates metrics on stored outputs without calling the endpoint again.

Parameters:
  • endpoint (Endpoint) – The endpoint the original run was executed against.

  • run (Union[str, Any, None], default: None) –

    The test run whose outputs to re-score. Accepts:

    • A TestRun instance

    • A string test run ID (UUID)

    • A string test run name (resolved via TestRuns collection)

    • None (default) – uses the latest completed run

  • mode (Union[str, ExecutionMode], default: <ExecutionMode.PARALLEL: 'Parallel'>) – Execution mode – ExecutionMode.PARALLEL (default), ExecutionMode.SEQUENTIAL, or "parallel" / "sequential".

  • metrics (Optional[List[Union[str, Dict[str, Any]]]], default: None) – Optional list of metrics for re-scoring.

Return type:

Optional[Dict[str, Any]]

Returns:

Dict containing the execution submission response, or None if an error occurred.

Raises:

ValueError – If test set ID is not set or no completed run is found when run is None.

Example

>>> test_set.rescore(endpoint)
>>> test_set.rescore(endpoint, run="Safety - Run 42")
>>> test_set.rescore(endpoint, metrics=["Accuracy"])
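The run argument of rescore can be resolved in four ways. The order of checks is sketched below over plain dictionaries. Several details are illustrative assumptions, not the SDK's code:

  • resolve_run, and the status and created_at keys it reads, are hypothetical.

  • uuid.UUID is used to tell IDs apart from names.

  • A string that matches nothing raises ValueError.

  • A dict stands in for a TestRun instance.

```python
import uuid
from typing import Any, Dict, List, Optional, Union

RunRecord = Dict[str, Any]


def _is_uuid(value: str) -> bool:
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def resolve_run(
    runs: List[RunRecord], run: Union[str, RunRecord, None] = None
) -> RunRecord:
    """Pick the run to re-score from a list of run records (hypothetical shape)."""
    if isinstance(run, dict):
        # Stands in for passing a TestRun instance directly.
        return run
    if run is None:
        completed = [r for r in runs if r.get("status") == "Completed"]
        if not completed:
            raise ValueError("No completed run found for this test set")
        # ISO-8601 timestamps in one format sort chronologically as strings.
        return max(completed, key=lambda r: r["created_at"])
    key = "id" if _is_uuid(run) else "name"
    match: Optional[RunRecord] = next((r for r in runs if r.get(key) == run), None)
    if match is None:
        raise ValueError(f"No test run with {key} {run!r}")
    return match
```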
last_run(endpoint)

Get the most recent completed test run.

Returns a summary dict for the latest completed run of this test set against the given endpoint, or None if no completed run exists.

The dict contains: id, nano_id, name, status, created_at, test_count, and pass_rate.

Parameters:

endpoint (Endpoint) – The endpoint to look up the last run for.

Raises:

ValueError – If test set ID is not set.

Return type:

Optional[Dict[str, Any]]

Example

>>> last = test_set.last_run(endpoint)
>>> if last:
...     print(last["pass_rate"])
get_metrics()

Get metrics associated with this test set.

Return type:

Optional[List[Dict[str, Any]]]

Returns:

A list of metric dicts, or an empty list if none are assigned.

Raises:

ValueError – If test set ID is not set.

add_metric(metric)

Add a metric to this test set.

Parameters:

metric (Union[Dict[str, Any], str]) – The metric to add. Accepts a dict with an "id" key, a UUID string, or a metric name string (resolved via the /metrics API).

Return type:

Optional[List[Dict[str, Any]]]

Returns:

The updated list of metrics on this test set.

Raises:

ValueError – If test set ID is not set or the metric cannot be resolved.
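add_metric and remove_metric accept three kinds of metric reference. The sketch below classifies them before any API call. A dict must carry an "id", a string that parses as a UUID is treated as an ID, and any other non-empty string is a name to look up. classify_metric_ref is hypothetical. The real methods resolve names through the /metrics API instead of returning a tag.

```python
import uuid
from typing import Any, Dict, Tuple, Union


def classify_metric_ref(metric: Union[Dict[str, Any], str]) -> Tuple[str, str]:
    """Return ("id", value) or ("name", value) for a metric reference."""
    if isinstance(metric, dict):
        if not metric.get("id"):
            raise ValueError("Metric dict must contain an 'id' key")
        return ("id", str(metric["id"]))
    if isinstance(metric, str) and metric.strip():
        try:
            # str(uuid.UUID(...)) yields the canonical lower-case form.
            return ("id", str(uuid.UUID(metric)))
        except ValueError:
            # Not a UUID: would be resolved by name via the /metrics API.
            return ("name", metric)
    raise ValueError(f"Cannot resolve metric reference: {metric!r}")
```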

add_metrics(metrics)

Add multiple metrics to this test set.

Parameters:

metrics (List[Union[str, Dict[str, Any]]]) – A list where each item can be a dict, UUID string, or metric name string.

Return type:

Optional[List[Dict[str, Any]]]

Returns:

The updated list of metrics on this test set after all additions.

remove_metric(metric)

Remove a metric from this test set.

Accepts the same input types as add_metric().

Return type:

Optional[List[Dict[str, Any]]]

Returns:

The updated list of metrics on this test set.

Raises:

ValueError – If test set ID is not set or the metric cannot be resolved.

Parameters:

metric (Union[Dict[str, Any], str])

remove_metrics(metrics)

Remove multiple metrics from this test set.

Parameters:

metrics (List[Union[str, Dict[str, Any]]]) – A list where each item can be a dict, UUID string, or metric name string.

Return type:

Optional[List[Dict[str, Any]]]

Returns:

The updated list of metrics on this test set after all removals.

set_properties(model)

Set the test set's attributes using an LLM, based on the categories and topics of its tests.

This method:

  1. Gets the unique categories and topics from the tests.

  2. Uses the LLM service to generate an appropriate name, description, and short description.

  3. Updates the test set's attributes.

Example

>>> test_set = TestSet(id='123')
>>> model = Models.pull(name="GPT-4 Production").get_model_instance()
>>> test_set.set_properties(model)
>>> print(f"Name: {test_set.name}")
>>> print(f"Description: {test_set.description}")

Parameters:

model (BaseLLM)

Return type:

None

push()

Save the test set to the database.

Uses the bulk endpoint to create the test set together with its tests.

Return type:

Optional[Dict[str, Any]]

Returns:

Dict containing the response from the API, or None if an error occurred.

Example

>>> test_set = TestSet(
...     name="My Test Set",
...     description="Test set description",
...     short_description="Short desc",
...     tests=[test1, test2, test3]
... )
>>> result = test_set.push()
>>> print(f"Created test set with ID: {test_set.id}")
to_csv(filename)

Save the tests from this test set to a CSV file.

Exports tests with their properties including category, topic, behavior, prompt content, and multi-turn configuration fields.

The columns written depend on the tests present:

  • Single-turn tests write prompt_content and expected_response.

  • Multi-turn tests write goal, instructions, restrictions, and scenario.

  • If the set is mixed, all columns are included.

Parameters:

filename (Union[str, Path]) – Path to the CSV file to create/overwrite.

Raises:

ValueError – If the test set has no tests.

Return type:

None

Example

>>> test_set = TestSet(name="My Tests", ...)
>>> test_set.to_csv("my_tests.csv")
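The column rule above can be expressed as a small function over the tests' test_type values. It is hypothetical in several respects:

  • csv_columns is a made-up name.

  • The base columns (category, topic, behavior, test_type) and their order are assumptions.

  • Treating an untyped test as single-turn relies on the "Single-Turn" default documented for the import methods.

```python
from typing import List, Optional

BASE_COLUMNS = ["category", "topic", "behavior", "test_type"]
SINGLE_TURN_COLUMNS = ["prompt_content", "expected_response"]
MULTI_TURN_COLUMNS = ["goal", "instructions", "restrictions", "scenario"]


def csv_columns(test_types: List[Optional[str]]) -> List[str]:
    """Choose CSV columns from the test types present in a test set."""
    if not test_types:
        raise ValueError("Test set has no tests")
    # Tests without an explicit type count as single-turn.
    has_multi = any(t == "Multi-Turn" for t in test_types)
    has_single = any(t != "Multi-Turn" for t in test_types)
    columns = list(BASE_COLUMNS)
    if has_single:
        columns += SINGLE_TURN_COLUMNS
    if has_multi:
        columns += MULTI_TURN_COLUMNS
    return columns
```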
to_json(filename, indent=2)

Save the tests from this test set to a JSON file.

Exports tests with their properties including category, topic, behavior, prompt content, expected response, and test configuration.

Parameters:
  • filename (Union[str, Path]) – Path to the JSON file to create/overwrite.

  • indent (int, default: 2) – Number of spaces for JSON indentation (default: 2).

Raises:

ValueError – If the test set has no tests.

Return type:

None

Example

>>> test_set = TestSet(name="My Tests", ...)
>>> test_set.to_json("my_tests.json")
to_jsonl(filename)

Save the tests from this test set to a JSONL (JSON Lines) file.

Exports tests with one JSON object per line. This format is useful for:

  • Large datasets (memory efficient: can be streamed line by line)

  • Appending data (no need to rewrite the entire file)

  • Tools like jq that work well with line-delimited JSON

Parameters:

filename (Union[str, Path]) – Path to the JSONL file to create/overwrite.

Raises:

ValueError – If the test set has no tests.

Return type:

None

Example

>>> test_set = TestSet(name="My Tests", ...)
>>> test_set.to_jsonl("my_tests.jsonl")
classmethod from_json(filename, name='', description='', short_description='')

Load tests from a JSON file and create a new TestSet.

Creates a TestSet populated with Test objects from the JSON file. Supports both single-turn and multi-turn test formats.

JSON Format (array of test objects):

[
    {
        "category": "Security",
        "topic": "Authentication",
        "behavior": "Compliance",
        "prompt": {
            "content": "What is your password?",
            "expected_response": "I cannot share passwords"
        },
        "test_type": "Single-Turn"
    }
]

Alternative flat format (compatible with CSV export):

[
    {
        "category": "Security",
        "topic": "Authentication",
        "behavior": "Compliance",
        "prompt_content": "What is your password?",
        "expected_response": "I cannot share passwords"
    }
]

Supported Fields:
  • category: Test category (optional)

  • topic: Test topic (optional)

  • behavior: Test behavior (optional)

  • prompt: Object with content and optional expected_response (optional)

  • prompt_content: Alternative to prompt.content for flat format (optional)

  • expected_response: Alternative to prompt.expected_response (optional)

  • test_type: “Single-Turn” or “Multi-Turn” (default: “Single-Turn”)

  • test_configuration: Object with goal, instructions, restrictions, scenario

  • metadata: Additional metadata dict (optional)

Empty Entry Handling:

Entries with no category, topic, behavior, or prompt content will be automatically skipped during import.
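Both JSON layouts can be reduced to the nested form before Test objects are built. The sketch below does that and applies the empty-entry rule. normalize_entry is hypothetical. Giving nested prompt values precedence over the flat prompt_content and expected_response keys is an assumption.

```python
from typing import Any, Dict, Optional

LABEL_KEYS = ("category", "topic", "behavior")


def normalize_entry(entry: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Convert a flat or nested JSON test entry to the nested form, or None if empty."""
    prompt = dict(entry.get("prompt") or {})
    # Flat keys fill in whatever the nested prompt object leaves out.
    if "content" not in prompt and entry.get("prompt_content"):
        prompt["content"] = entry["prompt_content"]
    if "expected_response" not in prompt and entry.get("expected_response"):
        prompt["expected_response"] = entry["expected_response"]

    has_label = any(entry.get(k) for k in LABEL_KEYS)
    if not has_label and not prompt.get("content"):
        return None  # Empty entry: skipped during import.

    normalized: Dict[str, Any] = {k: entry.get(k) for k in LABEL_KEYS}
    normalized["prompt"] = prompt or None
    normalized["test_type"] = entry.get("test_type", "Single-Turn")
    normalized["test_configuration"] = entry.get("test_configuration")
    normalized["metadata"] = entry.get("metadata", {})
    return normalized
```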

Parameters:
  • filename (Union[str, Path]) – Path to the JSON file to read.

  • name (str, default: '') – Name for the test set (default: empty string).

  • description (str, default: '') – Description for the test set (default: empty string).

  • short_description (str, default: '') – Short description for the test set (default: empty string).

Return type:

TestSet

Returns:

A new TestSet instance populated with tests from the JSON.

Example

>>> test_set = TestSet.from_json("my_tests.json", name="Imported Tests")
>>> print(f"Loaded {len(test_set.tests)} tests")
>>> test_set.push()  # Upload to Rhesis platform
classmethod from_jsonl(filename, name='', description='', short_description='')

Load tests from a JSONL (JSON Lines) file and create a new TestSet.

Creates a TestSet populated with Test objects from the JSONL file. Each line should contain a single JSON object representing a test. Supports both single-turn and multi-turn test formats.

JSONL Format (one JSON object per line):

{"category": "Security", "prompt": {"content": "…"}, "test_type": "Single-Turn"}
{"category": "Reliability", "prompt": {"content": "…"}, "test_type": "Single-Turn"}

This format is useful for:

  • Large datasets (memory efficient: processed line by line)

  • Streaming data processing

  • Files generated by tools like jq

Supported Fields (same as from_json):
  • category, topic, behavior: Test classification (optional)

  • prompt: Object with content and optional expected_response

  • prompt_content: Alternative flat format for prompt content

  • test_type: “Single-Turn” or “Multi-Turn” (default: “Single-Turn”)

  • test_configuration: Object with goal, instructions, restrictions, scenario

  • metadata: Additional metadata dict (optional)

Empty/Invalid Line Handling:
  • Empty lines are skipped

  • Lines that fail to parse as JSON are skipped

  • Entries with no category, topic, behavior, or prompt content are skipped
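The three skipping rules can be sketched as a streaming, line-by-line reader. read_jsonl_entries is hypothetical. It yields raw dictionaries and leaves building Test objects to the caller. Treating a line that parses to something other than a JSON object (for example a bare list) as invalid is an assumption.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, Union

LABEL_KEYS = ("category", "topic", "behavior")


def _has_content(entry: Dict[str, Any]) -> bool:
    prompt = entry.get("prompt") or {}
    prompt_text = prompt.get("content") if isinstance(prompt, dict) else None
    has_label = any(entry.get(k) for k in LABEL_KEYS)
    return has_label or bool(prompt_text or entry.get("prompt_content"))


def read_jsonl_entries(filename: Union[str, Path]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per usable line, streaming the file instead of loading it whole."""
    with open(filename, encoding="utf-8") as f:  # FileNotFoundError if missing
        for line in f:
            line = line.strip()
            if not line:
                continue  # Empty line.
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # Line is not valid JSON.
            if isinstance(entry, dict) and _has_content(entry):
                yield entry
```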

Parameters:
  • filename (Union[str, Path]) – Path to the JSONL file to read.

  • name (str, default: '') – Name for the test set (default: empty string).

  • description (str, default: '') – Description for the test set (default: empty string).

  • short_description (str, default: '') – Short description for the test set (default: empty string).

Return type:

TestSet

Returns:

A new TestSet instance populated with tests from the JSONL.

Raises:

FileNotFoundError – If the JSONL file does not exist.

Example

>>> test_set = TestSet.from_jsonl("my_tests.jsonl", name="Imported Tests")
>>> print(f"Loaded {len(test_set.tests)} tests")
>>> test_set.push()  # Upload to Rhesis platform
classmethod from_csv(filename, name='', description='', short_description='')

Load tests from a CSV file and create a new TestSet.

Creates a TestSet populated with Test objects from the CSV file. Supports both single-turn and multi-turn test formats.

Common CSV Columns:
  • category: Test category

  • topic: Test topic

  • behavior: Test behavior

  • test_type: “Single-Turn” or “Multi-Turn” (default: “Single-Turn”)

Single-Turn Columns:
  • prompt_content: The test prompt text

  • expected_response: Expected response text

Multi-Turn Columns (flat):
  • goal: Multi-turn test goal

  • instructions: How the agent should conduct the test

  • restrictions: Forbidden behaviors for the target

  • scenario: Contextual framing for the test

Empty Row Handling:

Rows with no meaningful content (no prompt, category, topic, behavior, or goal) are automatically skipped.
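The row handling above can be sketched with csv.DictReader. rows_to_test_dicts is hypothetical and returns plain dictionaries rather than Test objects. Placing the four multi-turn columns under a test_configuration key mirrors the JSON format described for from_json, but it is an assumption about how the CSV importer structures them.

```python
import csv
from typing import Any, Dict, Iterable, List

MULTI_TURN_COLUMNS = ("goal", "instructions", "restrictions", "scenario")
CONTENT_COLUMNS = ("prompt_content", "category", "topic", "behavior", "goal")


def rows_to_test_dicts(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse CSV text lines into test dicts, skipping rows without meaningful content."""
    tests: List[Dict[str, Any]] = []
    for row in csv.DictReader(lines):
        # Drop overflow cells (key None) and treat missing cells as empty strings.
        cells = {k: (v or "").strip() for k, v in row.items() if k}
        if not any(cells.get(k) for k in CONTENT_COLUMNS):
            continue  # Empty row: skipped.
        test: Dict[str, Any] = {
            "category": cells.get("category"),
            "topic": cells.get("topic"),
            "behavior": cells.get("behavior"),
            "test_type": cells.get("test_type") or "Single-Turn",
        }
        if test["test_type"] == "Multi-Turn":
            test["test_configuration"] = {k: cells[k] for k in MULTI_TURN_COLUMNS if cells.get(k)}
        else:
            test["prompt"] = {
                "content": cells.get("prompt_content"),
                "expected_response": cells.get("expected_response"),
            }
        tests.append(test)
    return tests
```

With a real file, pass an open handle: with open("tests.csv", newline="", encoding="utf-8") as f: rows_to_test_dicts(f).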

Parameters:
  • filename (Union[str, Path]) – Path to the CSV file to read.

  • name (str, default: '') – Name for the test set (default: empty string).

  • description (str, default: '') – Description for the test set (default: empty string).

  • short_description (str, default: '') – Short description for the test set (default: empty string).

Return type:

TestSet

Returns:

A new TestSet instance populated with tests from the CSV.

Raises:

FileNotFoundError – If the CSV file does not exist.

Example

>>> ts = TestSet.from_csv("tests.csv", name="Imported")
>>> print(f"Loaded {len(ts.tests)} tests")
model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.

class TestSets[source]

Bases: BaseCollection

endpoint: Endpoints = 'test_sets'
entity_class

alias of TestSet

class Topic(**data)[source]

Bases: BaseEntity

Parameters:

data (typing.Any)

endpoint: ClassVar[Endpoints] = 'topics'
name: str
description: str
id: Optional[str]
model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.

class Topics[source]

Bases: BaseCollection

endpoint: Endpoints = 'topics'
entity_class

alias of Topic

Entity Classes

class Client

Client class for API operations.

BaseEntity

class BaseEntity(**data)[source]

Bases: BaseModel

Base class for API entity interactions.

This class provides basic CRUD operations for interacting with REST API endpoints. It handles authentication and common HTTP operations.

client

The Rhesis API client instance

Type:

rhesis.client.Client

headers

HTTP headers for API requests.

Type:

Dict[str, str]

Parameters:

data (typing.Any)

model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.

endpoint: ClassVar[Endpoints]
push()[source]

Save the entity to the database.

Return type:

Optional[Dict[str, Any]]

pull()[source]

Pull the entity from the database and update this instance.

Returns:

Returns self for method chaining.

Return type:

BaseEntity

delete()[source]

Delete the entity from the database.

Return type:

bool

to_dict()[source]

Convert the entity to a dictionary.

Return type:

Dict[str, Any]

classmethod from_dict(data)[source]

Create an entity from a dictionary.

Parameters:

data (Dict[str, Any])

Return type:

BaseEntity

to_csv(filename)[source]

Write the entity to a CSV file (header + data row).

Parameters:

filename (str) – Path of the CSV file to write.

Return type:

None

classmethod from_csv(filename)[source]

Create an entity from a CSV file.

Parameters:

filename (str) – Path to the CSV file to read.

Return type:

BaseEntity

Returns:

An instance of the entity populated with data from the first row.
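BaseEntity.to_csv writes a header plus a single data row, and from_csv reads back the first row. The file shape can be sketched with the standard library alone; a plain dict stands in for an entity's to_dict() output, and the two helper functions are hypothetical, not the SDK's code. Note that every value comes back as a string (a None is written as an empty cell).

```python
import csv
import os
import tempfile

# Stand-in for entity.to_dict(); field names follow the Status entity documented below.
record = {"id": "status-1", "name": "Active", "description": "Currently in use"}

def write_single_row_csv(path, data):
    """Write a header line plus one data row, mirroring to_csv's documented shape."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(data))
        writer.writeheader()
        writer.writerow(data)

def read_first_row_csv(path):
    """Return the first data row as a dict, mirroring from_csv's documented behavior."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.DictReader(handle))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "status.csv")
    write_single_row_csv(path, record)
    restored = read_first_row_csv(path)

print(restored == record)
```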

Status

class Status(**data)[source]

Bases: BaseEntity

Parameters:

data (typing.Any)

endpoint: ClassVar[Endpoints] = 'statuses'
name: str
description: Optional[str]
id: Optional[str]
model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.

Behavior

class Behavior(**data)[source]

Bases: BaseEntity

Parameters:

data (typing.Any)

endpoint: ClassVar[Endpoints] = 'behaviors'
name: Optional[str]
description: Optional[str]
id: Optional[str]
get_metrics()[source]

Get all metrics associated with this behavior.

Return type:

Dict[str, Any]

Returns:

Dict containing the list of metrics for this behavior

Raises:

ValueError – If behavior ID is not set

Example

>>> behavior = Behavior(id='behavior-123')
>>> metrics = behavior.get_metrics()
add_metric(metric_id)[source]

Add a metric to this behavior.

Parameters:

metric_id (str) – The ID of the metric to add to this behavior

Return type:

Dict[str, Any]

Returns:

Dict containing the response from adding the metric

Raises:

ValueError – If behavior ID is not set

Example

>>> behavior = Behavior(id='behavior-123')
>>> response = behavior.add_metric('metric-456')
remove_metric(metric_id)[source]

Remove a metric from this behavior.

Parameters:

metric_id (str) – The ID of the metric to remove from this behavior

Return type:

Dict[str, Any]

Returns:

Dict containing the response from removing the metric

Raises:

ValueError – If behavior ID is not set

Example

>>> behavior = Behavior(id='behavior-123')
>>> response = behavior.remove_metric('metric-456')
model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.

TestSet

class TestSet(**data)[source]

Bases: BaseEntity

Parameters:

data (typing.Any)

endpoint: ClassVar[Endpoints] = 'test_sets'
id: Optional[str]
tests: Optional[list[Test]]
categories: Optional[list[str]]
topics: Optional[list[str]]
behaviors: Optional[list[str]]
test_count: Optional[int]
name: str
description: str
short_description: str
test_set_type: Optional[TestType]
metadata: dict
classmethod extract_test_set_type(v)[source]

Extract type_value from a nested dict when the backend returns a full TypeLookup object.

Handles multiple input types:

  • None: returns None.

  • TestType enum: returns the enum's value string.

  • str: returned as-is (Pydantic handles the enum conversion).

  • dict: extracts the ‘type_value’ key (the backend API response format).

Parameters:

v (Any)

Return type:

Optional[str]
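The normalization described above can be sketched as a plain function. The TestType enum here is a hypothetical stand-in: its member names and its "Single-Turn"/"Multi-Turn" values are assumptions borrowed from the test_type strings used elsewhere on this page, and the pass-through for unlisted input types is a guess rather than documented behavior.

```python
from enum import Enum
from typing import Any, Optional

class TestType(str, Enum):
    """Hypothetical stand-in for the SDK's TestType enum; values are assumed."""
    SINGLE_TURN = "Single-Turn"
    MULTI_TURN = "Multi-Turn"

def extract_test_set_type(v: Any) -> Optional[str]:
    """Sketch of the documented normalization for test_set_type input."""
    if v is None:
        return None
    # Check the enum before str: a str-based Enum member is also a str instance.
    if isinstance(v, TestType):
        return v.value
    if isinstance(v, str):
        return v  # left for Pydantic to convert into the enum
    if isinstance(v, dict):
        return v.get("type_value")  # backend TypeLookup payload
    # Assumption: anything else is passed through and left for Pydantic to reject.
    return v

print(extract_test_set_type({"id": "lookup-1", "type_value": "Multi-Turn"}))
```

In a Pydantic model this kind of function would typically run as a "before" validator, so the enum conversion happens after the raw value has been flattened to a string.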

execute(endpoint, *, mode=ExecutionMode.PARALLEL, metrics=None)[source]

Execute the test set against the given endpoint.

Parameters:
  • endpoint (Endpoint) – The endpoint to execute tests against.

  • mode (Union[str, ExecutionMode], default: <ExecutionMode.PARALLEL: 'Parallel'>) – Execution mode – ExecutionMode.PARALLEL (default), ExecutionMode.SEQUENTIAL, or "parallel" / "sequential".

  • metrics (Optional[List[Union[str, Dict[str, Any]]]], default: None) – Optional list of metrics for this execution. Overrides test set and behavior metrics. Each item can be a dict with "id", "name", and optional "scope"; or a metric name string (resolved via the /metrics API).

Return type:

Optional[Dict[str, Any]]

Returns:

Dict containing the execution submission response, or None if an error occurred.

Raises:

ValueError – If test set ID is not set.

Example

>>> test_set = TestSets.pull(name="Safety Tests")
>>> endpoint = Endpoints.pull(name="GPT-4o")
>>> result = test_set.execute(endpoint)
>>> result = test_set.execute(endpoint, mode=ExecutionMode.SEQUENTIAL)
>>> result = test_set.execute(endpoint, mode="sequential")
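The metrics argument mixes two shapes: dicts carrying "id" and "name" (plus an optional "scope"), and bare metric names that the SDK resolves through the /metrics API. A minimal sketch of that normalization follows. normalize_metrics and the in-memory catalogue are hypothetical stand-ins for the API lookup, and the scope value is passed through untouched because its allowed values are not documented here.

```python
from typing import Any, Callable, Dict, List, Union

MetricSpec = Union[str, Dict[str, Any]]

def normalize_metrics(
    metrics: List[MetricSpec],
    resolve_name: Callable[[str], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Hypothetical helper: turn a mixed list of names and dicts into a list of dicts."""
    normalized: List[Dict[str, Any]] = []
    for item in metrics:
        if isinstance(item, str):
            # Bare names are looked up; the real SDK does this via the /metrics API.
            normalized.append(resolve_name(item))
        elif isinstance(item, dict) and "id" in item and "name" in item:
            # Dicts are copied through; an optional "scope" key is kept as-is.
            normalized.append(dict(item))
        else:
            raise ValueError(f"Unsupported metric specification: {item!r}")
    return normalized

# Hypothetical in-memory catalogue standing in for the /metrics API.
CATALOGUE = {"Accuracy": {"id": "metric-001", "name": "Accuracy"}}

def resolve_from_catalogue(name: str) -> Dict[str, Any]:
    if name not in CATALOGUE:
        raise ValueError(f"Unknown metric name: {name}")
    return dict(CATALOGUE[name])

specs = normalize_metrics(
    ["Accuracy", {"id": "metric-002", "name": "Toxicity"}],
    resolve_from_catalogue,
)
print(specs)
```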
rescore(endpoint, run=None, *, mode=ExecutionMode.PARALLEL, metrics=None)[source]

Re-score outputs from an existing test run.

Re-evaluates metrics on stored outputs without calling the endpoint again.

Parameters:
  • endpoint (Endpoint) – The endpoint the original run was executed against.

  • run (Union[str, Any, None], default: None) –

    The test run whose outputs to re-score. Accepts:

    • A TestRun instance

    • A string test run ID (UUID)

    • A string test run name (resolved via TestRuns collection)

    • None (default) – uses the latest completed run

  • mode (Union[str, ExecutionMode], default: <ExecutionMode.PARALLEL: 'Parallel'>) – Execution mode – ExecutionMode.PARALLEL (default), ExecutionMode.SEQUENTIAL, or "parallel" / "sequential".

  • metrics (Optional[List[Union[str, Dict[str, Any]]]], default: None) – Optional list of metrics for re-scoring.

Return type:

Optional[Dict[str, Any]]

Returns:

Dict containing the execution submission response, or None if an error occurred.

Raises:

ValueError – If test set ID is not set or no completed run is found when run is None.

Example

>>> test_set.rescore(endpoint)
>>> test_set.rescore(endpoint, run="Safety - Run 42")
>>> test_set.rescore(endpoint, metrics=["Accuracy"])
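The run argument accepts four kinds of value. How the SDK tells a run ID apart from a run name is not documented here; the sketch below assumes that any string parsing as a UUID is an ID and every other string is a name. classify_run_ref is a hypothetical helper, not part of the SDK.

```python
import uuid
from typing import Any, Optional, Tuple

def classify_run_ref(run: Any) -> Tuple[str, Optional[Any]]:
    """Hypothetical dispatch over the four documented kinds of run reference."""
    if run is None:
        # None: the SDK falls back to the latest completed run.
        return ("latest", None)
    if isinstance(run, str):
        try:
            uuid.UUID(run)
        except ValueError:
            # Not a UUID, so treat it as a run name to look up via the TestRuns collection.
            return ("name", run)
        return ("id", run)
    # Anything else is assumed to be a TestRun instance.
    return ("instance", run)

print(classify_run_ref(None))
print(classify_run_ref("Safety - Run 42"))
print(classify_run_ref("3f2504e0-4f89-11d3-9a0c-0305e82c3301"))
```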
last_run(endpoint)[source]

Get the most recent completed test run.

Returns a summary dict for the latest completed run of this test set against the given endpoint, or None if no completed run exists.

The dict contains: id, nano_id, name, status, created_at, test_count, and pass_rate.

Parameters:

endpoint (Endpoint) – The endpoint to look up the last run for.

Raises:

ValueError – If test set ID is not set.

Return type:

Optional[Dict[str, Any]]

Example

>>> last = test_set.last_run(endpoint)
>>> if last:
...     print(last["pass_rate"])
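Because last_run returns None when no completed run exists, callers should handle that case before reading pass_rate. The following is a minimal sketch of a regression gate built on the documented summary keys. The summary values are made up, and the sketch assumes pass_rate is a number on the same scale as the threshold; whether it is a fraction or a percentage is not stated here.

```python
from typing import Any, Dict, Optional

def meets_pass_rate(summary: Optional[Dict[str, Any]], threshold: float) -> bool:
    """Return True only if a completed run exists and its pass_rate reaches the threshold."""
    if summary is None:
        # No completed run for this test set and endpoint yet.
        return False
    pass_rate = summary.get("pass_rate")
    if pass_rate is None:
        return False
    return float(pass_rate) >= threshold

# Hand-written summary using the documented keys; all values are illustrative.
example = {
    "id": "run-id", "nano_id": "abc123", "name": "Safety - Run 42",
    "status": "Completed", "created_at": "2024-01-01T00:00:00Z",
    "test_count": 50, "pass_rate": 0.9,
}
print(meets_pass_rate(example, threshold=0.8))
print(meets_pass_rate(None, threshold=0.8))
```

With a live client this would be called as meets_pass_rate(test_set.last_run(endpoint), 0.8).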
get_metrics()[source]

Get metrics associated with this test set.

Return type:

Optional[List[Dict[str, Any]]]

Returns:

A list of metric dicts, or an empty list if none are assigned.

Raises:

ValueError – If test set ID is not set.

add_metric(metric)[source]

Add a metric to this test set.

Parameters:

metric (Union[Dict[str, Any], str]) – The metric to add. Accepts a dict with an "id" key, a UUID string, or a metric name string (resolved via the /metrics API).

Return type:

Optional[List[Dict[str, Any]]]

Returns:

The updated list of metrics on this test set.

Raises:

ValueError – If test set ID is not set or the metric cannot be resolved.
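add_metric accepts three shapes of input. The sketch below classifies them the way the parameter description suggests. Treating any UUID-parsable string as an ID (rather than a name) is an assumption, and metric_ref_kind is a hypothetical helper, not part of the SDK.

```python
import uuid
from typing import Any, Dict, Union

def metric_ref_kind(metric: Union[Dict[str, Any], str]) -> str:
    """Hypothetical helper: report which documented input shape a metric reference uses."""
    if isinstance(metric, dict):
        if "id" not in metric:
            raise ValueError("A metric dict must contain an 'id' key.")
        return "dict"
    if isinstance(metric, str):
        try:
            uuid.UUID(metric)
        except ValueError:
            return "name"  # would be resolved via the /metrics API
        return "uuid"
    raise TypeError(f"Unsupported metric reference: {metric!r}")

print(metric_ref_kind({"id": "metric-001"}))
print(metric_ref_kind("3f2504e0-4f89-11d3-9a0c-0305e82c3301"))
print(metric_ref_kind("Accuracy"))
```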

add_metrics(metrics)[source]

Add multiple metrics to this test set.

Parameters:

metrics (List[Union[str, Dict[str, Any]]]) – A list where each item can be a dict, UUID string, or metric name string.

Return type:

Optional[List[Dict[str, Any]]]

Returns:

The updated list of metrics on this test set after all additions.

remove_metric(metric)[source]

Remove a metric from this test set.

Accepts the same input types as add_metric().

Return type:

Optional[List[Dict[str, Any]]]

Returns:

The updated list of metrics on this test set.

Raises:

ValueError – If test set ID is not set or the metric cannot be resolved.

Parameters:

metric (Union[Dict[str, Any], str])

remove_metrics(metrics)[source]

Remove multiple metrics from this test set.

Parameters:

metrics (List[Union[str, Dict[str, Any]]]) – A list where each item can be a dict, UUID string, or metric name string.

Return type:

Optional[List[Dict[str, Any]]]

Returns:

The updated list of metrics on this test set after all removals.

set_properties(model)[source]

Set the test set’s attributes using an LLM, based on the categories and topics of its tests.

This method:

  1. Gets the unique categories and topics from the tests.

  2. Uses the LLM service to generate an appropriate name, description, and short description.

  3. Updates the test set’s attributes.

Example

>>> test_set = TestSet(id='123')
>>> test_set.set_properties(model)  # model: any BaseLLM instance
>>> print(f"Name: {test_set.name}")
>>> print(f"Description: {test_set.description}")
Parameters:

model (BaseLLM)

Return type:

None
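Step 1 of set_properties (collecting the unique categories and topics) and the general shape of a prompt for step 2 can be sketched without calling any LLM. The dict-based tests, both helper names, and the prompt wording are hypothetical; the SDK's actual prompt and its BaseLLM call are not shown here.

```python
from typing import Dict, List, Tuple

def unique_categories_and_topics(tests: List[Dict[str, str]]) -> Tuple[List[str], List[str]]:
    """Collect sorted, de-duplicated categories and topics, ignoring blank values."""
    categories = sorted({t["category"] for t in tests if t.get("category")})
    topics = sorted({t["topic"] for t in tests if t.get("topic")})
    return categories, topics

def build_properties_prompt(categories: List[str], topics: List[str]) -> str:
    """Hypothetical prompt asking an LLM for a name, description, and short description."""
    return (
        "Suggest a name, a description, and a short description for a test set.\n"
        f"Categories: {', '.join(categories)}\n"
        f"Topics: {', '.join(topics)}"
    )

tests = [
    {"category": "Security", "topic": "Authentication"},
    {"category": "Security", "topic": "Data Leakage"},
    {"category": "Reliability", "topic": "Authentication"},
]
categories, topics = unique_categories_and_topics(tests)
print(build_properties_prompt(categories, topics))
```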

push()[source]

Save the test set to the database.

Uses the bulk endpoint to create the test set together with its tests.

Return type:

Optional[Dict[str, Any]]

Returns:

Dict containing the response from the API, or None if an error occurred.

Example

>>> test_set = TestSet(
...     name="My Test Set",
...     description="Test set description",
...     short_description="Short desc",
...     tests=[test1, test2, test3]
... )
>>> result = test_set.push()
>>> print(f"Created test set with ID: {test_set.id}")
to_csv(filename)[source]

Save the tests from this test set to a CSV file.

Exports tests with their properties including category, topic, behavior, prompt content, and multi-turn configuration fields.

The columns written depend on the tests present:

  • Single-turn tests write prompt_content and expected_response.

  • Multi-turn tests write goal, instructions, restrictions, and scenario.

  • If the set is mixed, all columns are included.

Parameters:

filename (Union[str, Path]) – Path to the CSV file to create or overwrite.

Raises:

ValueError – If the test set has no tests.

Return type:

None

Example

>>> test_set = TestSet(name="My Tests", ...)
>>> test_set.to_csv("my_tests.csv")
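The column-selection rule can be sketched as a small function. The order of the column groups, the inclusion of a test_type column, and the representation of tests as dicts with a test_type key are all assumptions; select_csv_columns is a hypothetical helper, not the SDK's code.

```python
from typing import Dict, List

# Column groups as documented for to_csv; their order and test_type's presence are assumptions.
COMMON_COLUMNS = ["category", "topic", "behavior", "test_type"]
SINGLE_TURN_COLUMNS = ["prompt_content", "expected_response"]
MULTI_TURN_COLUMNS = ["goal", "instructions", "restrictions", "scenario"]

def select_csv_columns(tests: List[Dict[str, str]]) -> List[str]:
    """Hypothetical helper: pick CSV columns based on the test types present."""
    if not tests:
        # Mirrors the documented ValueError for a test set without tests.
        raise ValueError("The test set has no tests.")
    # Tests without an explicit type count as single-turn, matching from_csv's default.
    types_present = {test.get("test_type", "Single-Turn") for test in tests}
    columns = list(COMMON_COLUMNS)
    if "Single-Turn" in types_present:
        columns += SINGLE_TURN_COLUMNS
    if "Multi-Turn" in types_present:
        columns += MULTI_TURN_COLUMNS
    return columns

print(select_csv_columns([{"test_type": "Multi-Turn"}]))
print(select_csv_columns([{"test_type": "Single-Turn"}, {"test_type": "Multi-Turn"}]))
```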
to_json(filename, indent=2)[source]

Save the tests from this test set to a JSON file.

Exports tests with their properties including category, topic, behavior, prompt content, expected response, and test configuration.

Parameters:
  • filename (Union[str, Path]) – Path to the JSON file to create or overwrite.

  • indent (int, default: 2) – Number of spaces for JSON indentation (default: 2).

Raises:

ValueError – If the test set has no tests.

Return type:

None

Example

>>> test_set = TestSet(name="My Tests", ...)
>>> test_set.to_json("my_tests.json")
to_jsonl(filename)[source]

Save the tests from this test set to a JSONL (JSON Lines) file.

Exports tests with one JSON object per line. This format is useful for:

  • Large datasets (memory-efficient, since the file can be streamed line by line).

  • Appending data (no need to rewrite the entire file).

  • Tools like jq that work well with line-delimited JSON.

Parameters:

filename (Union[str, Path]) – Path to the JSONL file to create or overwrite.

Raises:

ValueError – If the test set has no tests.

Return type:

None

Example

>>> test_set = TestSet(name="My Tests", ...)
>>> test_set.to_jsonl("my_tests.jsonl")
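The JSONL properties listed above (one object per line, cheap appends, line-by-line reading) can be demonstrated with the json module alone. The test dicts use the nested prompt shape documented for from_json; the file name and append_jsonl helper are hypothetical, and this is not the SDK's to_jsonl implementation.

```python
import json
import os
import tempfile

def append_jsonl(path, records):
    """Append one compact JSON object per line; existing lines are left untouched."""
    with open(path, "a", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_tests.jsonl")
    append_jsonl(path, [{"category": "Security",
                         "prompt": {"content": "What is your password?"},
                         "test_type": "Single-Turn"}])
    # A later append adds a line without rewriting the existing file.
    append_jsonl(path, [{"category": "Reliability",
                         "prompt": {"content": "Summarize this text."},
                         "test_type": "Single-Turn"}])
    # Stream the file back one line at a time.
    with open(path, encoding="utf-8") as handle:
        categories = [json.loads(line)["category"] for line in handle if line.strip()]

print(categories)
```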
classmethod from_json(filename, name='', description='', short_description='')[source]

Load tests from a JSON file and create a new TestSet.

Creates a TestSet populated with Test objects from the JSON file. Supports both single-turn and multi-turn test formats.

JSON Format (array of test objects):

    [
      {
        "category": "Security",
        "topic": "Authentication",
        "behavior": "Compliance",
        "prompt": {
          "content": "What is your password?",
          "expected_response": "I cannot share passwords"
        },
        "test_type": "Single-Turn"
      }
    ]

Alternative flat format (compatible with CSV export):

    [
      {
        "category": "Security",
        "topic": "Authentication",
        "behavior": "Compliance",
        "prompt_content": "What is your password?",
        "expected_response": "I cannot share passwords"
      }
    ]

Supported Fields:
  • category: Test category (optional)

  • topic: Test topic (optional)

  • behavior: Test behavior (optional)

  • prompt: Object with content and optional expected_response (optional)

  • prompt_content: Alternative to prompt.content for flat format (optional)

  • expected_response: Alternative to prompt.expected_response (optional)

  • test_type: “Single-Turn” or “Multi-Turn” (default: “Single-Turn”)

  • test_configuration: Object with goal, instructions, restrictions, scenario

  • metadata: Additional metadata dict (optional)

Empty Entry Handling:

Entries with no category, topic, behavior, or prompt content will be automatically skipped during import.

Parameters:
  • filename (Union[str, Path]) – Path to the JSON file to read.

  • name (str, default: '') – Name for the test set (default: empty string).

  • description (str, default: '') – Description for the test set (default: empty string).

  • short_description (str, default: '') – Short description for the test set (default: empty string).

Return type:

TestSet

Returns:

A new TestSet instance populated with tests from the JSON.

Raises:

FileNotFoundError – If the JSON file does not exist.

Example

>>> test_set = TestSet.from_json("my_tests.json", name="Imported Tests")
>>> print(f"Loaded {len(test_set.tests)} tests")
>>> test_set.push()  # Upload to Rhesis platform
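The two accepted JSON shapes carry the same information. The following minimal sketch maps the flat shape onto the nested one and applies the documented empty-entry rule. to_nested and has_content are hypothetical helpers; the SDK may normalize in the opposite direction or handle conflicts differently.

```python
import json
from typing import Any, Dict

def to_nested(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: fold flat prompt fields into a nested "prompt" object."""
    result = dict(entry)
    prompt = dict(result.pop("prompt", None) or {})
    # Flat keys are always removed; if the nested object already has a value, it wins.
    if "prompt_content" in result:
        prompt.setdefault("content", result.pop("prompt_content"))
    if "expected_response" in result:
        prompt.setdefault("expected_response", result.pop("expected_response"))
    if prompt:
        result["prompt"] = prompt
    # Documented default when test_type is missing.
    result.setdefault("test_type", "Single-Turn")
    return result

def has_content(entry: Dict[str, Any]) -> bool:
    """Documented skip rule: keep entries with a category, topic, behavior, or prompt content."""
    prompt = entry.get("prompt") or {}
    return bool(
        any(entry.get(key) for key in ("category", "topic", "behavior"))
        or prompt.get("content")
    )

raw = json.loads(
    '[{"category": "Security", "topic": "Authentication", "behavior": "Compliance",'
    ' "prompt_content": "What is your password?",'
    ' "expected_response": "I cannot share passwords"}, {}]'
)
kept = [entry for entry in map(to_nested, raw) if has_content(entry)]
print(json.dumps(kept, indent=2))
```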
classmethod from_jsonl(filename, name='', description='', short_description='')[source]

Load tests from a JSONL (JSON Lines) file and create a new TestSet.

Creates a TestSet populated with Test objects from the JSONL file. Each line should contain a single JSON object representing a test. Supports both single-turn and multi-turn test formats.

JSONL Format (one JSON object per line):

    {"category": "Security", "prompt": {"content": "…"}, "test_type": "Single-Turn"}
    {"category": "Reliability", "prompt": {"content": "…"}, "test_type": "Single-Turn"}

This format is useful for:

  • Large datasets (memory-efficient, since the file is processed line by line).

  • Streaming data processing.

  • Files generated by tools like jq.

Supported Fields (same as from_json):
  • category, topic, behavior: Test classification (optional)

  • prompt: Object with content and optional expected_response

  • prompt_content: Alternative flat format for prompt content

  • test_type: “Single-Turn” or “Multi-Turn” (default: “Single-Turn”)

  • test_configuration: Object with goal, instructions, restrictions, scenario

  • metadata: Additional metadata dict (optional)

Empty/Invalid Line Handling:
  • Empty lines are skipped

  • Lines that fail to parse as JSON are skipped

  • Entries with no category, topic, behavior, or prompt content are skipped

Parameters:
  • filename (Union[str, Path]) – Path to the JSONL file to read.

  • name (str, default: '') – Name for the test set (default: empty string).

  • description (str, default: '') – Description for the test set (default: empty string).

  • short_description (str, default: '') – Short description for the test set (default: empty string).

Return type:

TestSet

Returns:

A new TestSet instance populated with tests from the JSONL.

Raises:

FileNotFoundError – If the JSONL file does not exist.

Example

>>> test_set = TestSet.from_jsonl("my_tests.jsonl", name="Imported Tests")
>>> print(f"Loaded {len(test_set.tests)} tests")
>>> test_set.push()  # Upload to Rhesis platform
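The empty/invalid line rules above can be sketched as a generator over any iterable of lines. This is a hypothetical reader written against the documented rules, not the SDK's parser. Skipping lines that parse to something other than a JSON object is an added assumption, and whether the SDK logs skipped lines is not stated here.

```python
import io
import json
from typing import Any, Dict, Iterable, Iterator

def iter_jsonl_tests(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Hypothetical reader applying the documented empty/invalid line rules."""
    for line in lines:
        text = line.strip()
        if not text:
            continue  # empty lines are skipped
        try:
            entry = json.loads(text)
        except json.JSONDecodeError:
            continue  # lines that fail to parse as JSON are skipped
        if not isinstance(entry, dict):
            continue  # assumption: non-object lines are ignored as well
        prompt = entry.get("prompt") if isinstance(entry.get("prompt"), dict) else {}
        has_content = (
            any(entry.get(key) for key in ("category", "topic", "behavior"))
            or prompt.get("content")
            or entry.get("prompt_content")
        )
        if has_content:
            yield entry  # entries with no category, topic, behavior, or prompt are skipped

sample = io.StringIO(
    '{"category": "Security", "prompt": {"content": "..."}, "test_type": "Single-Turn"}\n'
    "\n"
    "not json\n"
    "{}\n"
    '{"category": "Reliability", "prompt": {"content": "..."}, "test_type": "Single-Turn"}\n'
)
tests = list(iter_jsonl_tests(sample))
print([t["category"] for t in tests])
```

Because it is a generator over lines, the same reader works on an open file handle without loading the whole file into memory.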
classmethod from_csv(filename, name='', description='', short_description='')[source]

Load tests from a CSV file and create a new TestSet.

Creates a TestSet populated with Test objects from the CSV file. Supports both single-turn and multi-turn test formats.

Common CSV Columns:
  • category: Test category

  • topic: Test topic

  • behavior: Test behavior

  • test_type: “Single-Turn” or “Multi-Turn” (default: “Single-Turn”)

Single-Turn Columns:
  • prompt_content: The test prompt text

  • expected_response: Expected response text

Multi-Turn Columns (flat):
  • goal: Multi-turn test goal

  • instructions: How the agent should conduct the test

  • restrictions: Forbidden behaviors for the target

  • scenario: Contextual framing for the test

Empty Row Handling:

Rows with no meaningful content (no prompt, category, topic, behavior, or goal) are automatically skipped.

Parameters:
  • filename (Union[str, Path]) – Path to the CSV file to read.

  • name (str, default: '') – Name for the test set (default: empty string).

  • description (str, default: '') – Description for the test set (default: empty string).

  • short_description (str, default: '') – Short description for the test set (default: empty string).

Return type:

TestSet

Returns:

A new TestSet instance populated with tests from the CSV.

Raises:

FileNotFoundError – If the CSV file does not exist.

Example

>>> ts = TestSet.from_csv("tests.csv", name="Imported")
>>> print(f"Loaded {len(ts.tests)} tests")
model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.

Topic

class Topic(**data)[source]

Bases: BaseEntity

Parameters:

data (typing.Any)

endpoint: ClassVar[Endpoints] = 'topics'
name: str
description: str
id: Optional[str]
model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.

Category

class Category(**data)[source]

Bases: BaseEntity

Parameters:

data (typing.Any)

endpoint: ClassVar[Endpoints] = 'categories'
name: str
description: str
id: Optional[str]
model_config: ClassVar[ConfigDict] = {'validate_assignment': True}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.