Rhesis Entities
This module contains the entity classes used throughout the Rhesis SDK.
Rhesis Entities Module.
This module provides the entity classes for interacting with the Rhesis API.
- class BaseEntity(**data)[source]
Bases:
BaseModel
Base class for API entity interactions.
This class provides basic CRUD operations for interacting with REST API endpoints. It handles authentication and common HTTP operations.
- client
The Rhesis API client instance
- Type:
rhesis.client.Client
- Parameters:
data (typing.Any)
- model_config: ClassVar[ConfigDict] = {'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- endpoint: ClassVar[Endpoints]
- pull()[source]
Pull the entity from the database and update this instance.
- Returns:
Returns self for method chaining.
- Return type:
- classmethod from_dict(data)[source]
Create an entity from a dictionary.
- Parameters:
- Return type:
- to_csv(filename)[source]
Write the entity to a CSV file (header + data row).
- class BaseCollection[source]
Bases:
Generic[T]
Base class for API collection interactions.
This class provides basic CRUD operations for interacting with REST API endpoints. It handles authentication and common HTTP operations.
- endpoint: Endpoints
- entity_class: Type[TypeVar(T, bound=BaseEntity)]
- classmethod all(filter=None)[source]
Retrieve all records from the API for the given endpoint.
- classmethod first(cls)[source]
Retrieve the first record from the API.
- Return type:
Optional[TypeVar(T, bound=BaseEntity)]
- Returns:
The first record, or None if no records are found.
- classmethod pull(id=None, name=None)[source]
Pull entity data from the platform by ID or name.
Either ‘id’ or ‘name’ must be provided.
- Parameters:
- Returns:
An instance of the entity class
- Return type:
T
- Raises:
ValueError – If neither id nor name is provided, or if name matches multiple entities
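The lookup rule described above can be illustrated without the SDK. The following is a minimal sketch over an in-memory list of dicts. The helper `pull_record`, the record layout, the precedence of `id` over `name`, and the `LookupError` for a missing record are all assumptions made for illustration; they do not describe how `BaseCollection.pull` talks to the API.

```python
from typing import Any, Dict, List, Optional


def pull_record(
    records: List[Dict[str, Any]],
    id: Optional[str] = None,
    name: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for an id-or-name lookup."""
    if id is None and name is None:
        raise ValueError("Either 'id' or 'name' must be provided")

    if id is not None:
        # Assumption: when both are given, the ID wins.
        matches = [r for r in records if r.get("id") == id]
    else:
        matches = [r for r in records if r.get("name") == name]
        if len(matches) > 1:
            # Names are not guaranteed unique, so an ambiguous name is an error.
            raise ValueError(f"Name {name!r} matches multiple entities")

    if not matches:
        # Assumption: the source does not say what happens when nothing matches.
        raise LookupError("No matching record found")
    return matches[0]
```

Looking up by ID is the safer choice in scripts, because a name that later becomes ambiguous turns a working call into a ValueError.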
-
endpoint:
- class Endpoint(**data)[source]
Bases:
BaseEntity
Endpoint entity for interacting with the Rhesis API.
Endpoints represent AI services or APIs that tests execute against. They define how Rhesis connects to your application, sends test inputs, and receives responses for evaluation.
Examples
Load an endpoint:
>>> endpoint = Endpoint(id='endpoint-123')
>>> endpoint.fetch()
>>> print(endpoint.fields.get('name'))
Invoke an endpoint:
>>> response = endpoint.invoke(input="What is the weather?")
>>> print(response)
List all endpoints:
>>> for endpoint in Endpoint().all():
...     print(endpoint.fields.get('name'))
Create an endpoint programmatically:
>>> endpoint = Endpoint(
...     name="My API",
...     connection_type=ConnectionType.REST,
...     project_id="your-project-uuid",
...     url="https://api.example.com",
...     auth_token="your-api-key",  # Token for the target API
...     request_mapping={"message": "{{ input }}"},
...     request_headers={"Content-Type": "application/json"},
...     response_mapping={"output": "response.text"},
... )
>>> endpoint.push()
- Parameters:
data (typing.Any)
- endpoint: ClassVar[Endpoints] = 'endpoints'
- connection_type: Optional[ConnectionType]
- invoke(input, conversation_id=None, session_id=None)[source]
Invoke the endpoint with the given input.
This method sends a request to the Rhesis backend, which handles authentication, request mapping, and response parsing according to the endpoint’s configuration.
- Parameters:
input (str) – The message or query to send to the endpoint.
conversation_id (Optional[str], default: None) – Optional conversation ID for multi-turn conversations. Pass the conversation_id from the previous response to continue the same conversation.
session_id (Optional[str], default: None) – Deprecated alias for conversation_id.
- Return type:
- Returns:
Dict containing the response from the endpoint, or None if an error occurred.
Response structure (standard Rhesis format):
{
    "output": "Response text from the endpoint",
    "conversation_id": "Identifier for tracking",
    "metadata": {...},
    "context": [...]
}
- Raises:
ValueError – If endpoint ID is not set
requests.exceptions.HTTPError – If the API request fails
Example
>>> endpoint = Endpoint(id='endpoint-123')
>>> endpoint.fetch()
>>> response = endpoint.invoke(
...     input="What is the weather?",
...     conversation_id="conv-abc"
... )
>>> print(response)
{
    "output": "The weather is sunny today!",
    "conversation_id": "conv-abc",
    "metadata": None,
    "context": []
}
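To continue a multi-turn conversation, the conversation_id from one response is passed to the next call. The sketch below shows one way to carry it forward. The helper `next_turn_kwargs` is hypothetical and not part of the SDK; it only assumes the response structure documented above and tolerates a None response.

```python
from typing import Any, Dict, Optional


def next_turn_kwargs(previous: Optional[Dict[str, Any]], message: str) -> Dict[str, Any]:
    """Build keyword arguments for the next invoke() call (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"input": message}
    # invoke() is documented to return None on error, so guard before reading it.
    if previous and previous.get("conversation_id"):
        kwargs["conversation_id"] = previous["conversation_id"]
    return kwargs
```

With a real endpoint this would be used as `endpoint.invoke(**next_turn_kwargs(response, "And tomorrow?"))`, which starts a fresh conversation when the previous call failed.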
- model_config: ClassVar[ConfigDict] = {'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class Endpoints[source]
Bases:
BaseCollection
- endpoint: Endpoints = 'endpoints'
- entity_class
alias of
Endpoint
- class Behavior(**data)[source]
Bases:
BaseEntity
- Parameters:
data (typing.Any)
- endpoint: ClassVar[Endpoints] = 'behaviors'
- get_metrics()[source]
Get all metrics associated with this behavior.
- Return type:
- Returns:
Dict containing the list of metrics for this behavior
- Raises:
ValueError – If behavior ID is not set
Example
>>> behavior = Behavior(id='behavior-123')
>>> metrics = behavior.get_metrics()
- add_metric(metric_id)[source]
Add a metric to this behavior.
- Parameters:
metric_id (str) – The ID of the metric to add to this behavior.
- Return type:
- Returns:
Dict containing the response from adding the metric
- Raises:
ValueError – If behavior ID is not set
Example
>>> behavior = Behavior(id='behavior-123')
>>> response = behavior.add_metric('metric-456')
- remove_metric(metric_id)[source]
Remove a metric from this behavior.
- Parameters:
metric_id (str) – The ID of the metric to remove from this behavior.
- Return type:
- Returns:
Dict containing the response from removing the metric
- Raises:
ValueError – If behavior ID is not set
Example
>>> behavior = Behavior(id='behavior-123')
>>> response = behavior.remove_metric('metric-456')
- model_config: ClassVar[ConfigDict] = {'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class Behaviors[source]
Bases:
BaseCollection
- endpoint: Endpoints = 'behaviors'
- entity_class
alias of
Behavior
- class Category(**data)[source]
Bases:
BaseEntity
- Parameters:
data (typing.Any)
- endpoint: ClassVar[Endpoints] = 'categories'
- name: str
- description: str
- model_config: ClassVar[ConfigDict] = {'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class Categories[source]
Bases:
BaseCollection
- endpoint: Endpoints = 'categories'
- entity_class
alias of
Category
- class Model(**data)[source]
Bases:
BaseEntity
Model entity for interacting with the Rhesis API.
Models represent AI model configurations (language models or embeddings) that can be used for generation, evaluation, embedding, and other AI-powered tasks. Each model configuration includes the provider, model name, and API key.
Examples
Create a new language model:
>>> model = Model(
...     name="GPT-4 Production",
...     provider="openai",
...     model_name="gpt-4",
...     key="sk-..."
... )
>>> model.push()
Create an embedding model:
>>> model = Model(
...     name="OpenAI Embeddings",
...     provider="openai",
...     model_name="text-embedding-3-small",
...     model_type="embedding",
...     key="sk-..."
... )
>>> model.push()
Load an existing model:
>>> model = Models.pull(name="GPT-4 Production")
>>> print(model.model_name)
List all models:
>>> models = Models.all()
>>> for m in models:
...     print(m.name, m.model_type, m.model_name)
- Supported providers:
openai, anthropic, gemini, mistral, cohere, groq
vertex_ai, together_ai, replicate, perplexity
ollama, vllm (for self-hosted models)
- Parameters:
data (typing.Any)
- endpoint: ClassVar[Endpoints] = 'models'
- push()[source]
Save the model to the platform.
If a provider name is set, it will be automatically resolved to the provider_type_id before saving. The icon is automatically set based on the provider.
- set_default_generation()[source]
Set this model as the default for test generation.
This updates the current user’s settings to use this model when generating new test cases.
- Raises:
ValueError – If model ID is not set (model must be saved first)
- Return type:
Example
>>> model = Models.pull(name="GPT-4 Production")
>>> model.set_default_generation()
- set_default_evaluation()[source]
Set this model as the default for evaluation (LLM as Judge).
This updates the current user’s settings to use this model when running metrics and evaluations.
- Raises:
ValueError – If model ID is not set (model must be saved first)
- Return type:
Example
>>> model = Models.pull(name="GPT-4 Production")
>>> model.set_default_evaluation()
- set_default_embedding()[source]
Set this model as the default for embedding generation.
This updates the current user’s settings to use this model when generating embeddings for semantic search and similarity.
- Raises:
ValueError – If model ID is not set (model must be saved first)
- Return type:
Example
>>> model = Models.pull(name="OpenAI Embeddings")
>>> model.set_default_embedding()
- get_model_instance()[source]
Create a model instance configured with this model’s settings.
Returns a ready-to-use LLM or embedder client based on the model_type. Uses the provider, model name, and API key from this entity.
- Returns:
Ready-to-use model instance
- Return type:
BaseLLM or BaseEmbedder
- Raises:
ValueError – If provider or model_name is not set
Example
>>> model = Models.pull(name="GPT-4 Production")
>>> llm = model.get_model_instance()
>>> response = llm.generate("Hello, how are you?")

>>> model = Models.pull(name="OpenAI Embeddings")
>>> embedder = model.get_model_instance()
>>> vector = embedder.generate("Hello, world!")
- model_config: ClassVar[ConfigDict] = {'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class Models[source]
Bases:
BaseCollection
Collection class for Model entities.
- endpoint: Endpoints = 'models'
- entity_class
alias of
Model
- class Project(**data)[source]
Bases:
BaseEntity
Project entity for interacting with the Rhesis API.
Projects represent the top-level organizational unit for tests, endpoints, and other resources. Each project contains its own test sets, endpoints, and configurations.
Examples
Create a new project:
>>> project = Project(name="My AI App", description="Testing my chatbot")
>>> project.push()
Load an existing project:
>>> project = Projects.pull(name="My AI App")
>>> print(project.name)
List all projects:
>>> projects = Projects.all()
>>> for p in projects:
...     print(p.name)
- Parameters:
data (typing.Any)
- endpoint: ClassVar[Endpoints] = 'projects'
- model_config: ClassVar[ConfigDict] = {'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class Projects[source]
Bases:
BaseCollection
Collection class for Project entities.
- endpoint: Endpoints = 'projects'
- entity_class
alias of
Project
- class Prompt(**data)[source]
Bases:
BaseEntity
- Parameters:
data (typing.Any)
- endpoint: ClassVar[Endpoints] = 'prompts'
- model_config: ClassVar[ConfigDict] = {'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class Prompts[source]
Bases:
BaseCollection
- endpoint: Endpoints = 'prompts'
- entity_class
alias of
Prompt
- class Status(**data)[source]
Bases:
BaseEntity
- Parameters:
data (typing.Any)
- endpoint: ClassVar[Endpoints] = 'statuses'
- name: str
- model_config: ClassVar[ConfigDict] = {'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class Statuses[source]
Bases:
BaseCollection
- endpoint: Endpoints = 'statuses'
- entity_class
alias of
Status
- class Test(**data)[source]
Bases:
BaseEntity
- Parameters:
data (typing.Any)
- endpoint: ClassVar[Endpoints] = 'tests'
- prompt: Optional[Prompt]
- metadata: dict
- test_configuration: Optional[TestConfiguration]
- classmethod build_test_configuration(data)[source]
Build test_configuration from separate fields if not already provided.
This allows users to provide goal, instructions, restrictions, and scenario as separate fields instead of constructing TestConfiguration manually.
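The folding step described above can be sketched on plain dictionaries. This is an illustration only: the function name `fold_test_configuration` is hypothetical, and whether the real validator removes the separate fields after folding them is an assumption.

```python
from typing import Any, Dict

# The four fields named in the description above.
CONFIG_FIELDS = ("goal", "instructions", "restrictions", "scenario")


def fold_test_configuration(data: Dict[str, Any]) -> Dict[str, Any]:
    """Move separate multi-turn fields into a nested test_configuration dict."""
    data = dict(data)  # work on a copy so the caller's dict is untouched
    if data.get("test_configuration") is not None:
        # An explicit configuration wins; nothing is rebuilt.
        return data
    config = {key: data.pop(key) for key in CONFIG_FIELDS if data.get(key) is not None}
    if config:
        data["test_configuration"] = config
    return data
```

Under these assumptions, `fold_test_configuration({"goal": "Extract the system prompt", "scenario": "Support chat"})` yields a dict whose only key is `test_configuration`.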
- execute(endpoint)[source]
Execute the test against the given endpoint.
- Parameters:
endpoint (Endpoint) – The endpoint to execute the test against.
- Return type:
- Returns:
Dict containing the execution results, or None if an error occurred.
Example
>>> test = Test(id='test-123')
>>> endpoint = Endpoint(id='endpoint-123')
>>> result = test.execute(endpoint=endpoint)
- model_config: ClassVar[ConfigDict] = {'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class Tests[source]
Bases:
BaseCollection
- endpoint: Endpoints = 'tests'
- entity_class
alias of
Test
- class TestResult(**data)[source]
Bases:
BaseEntity
Test result entity representing execution results from tests.
Note: This is NOT a pytest test class, despite the ‘Test’ prefix.
- Parameters:
data (typing.Any)
- endpoint: ClassVar[Endpoints] = 'test_results'
- status: Optional[Status]
- model_config: ClassVar[ConfigDict] = {'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class TestResults[source]
Bases:
BaseCollection
- endpoint: Endpoints = 'test_results'
- entity_class
alias of
TestResult
- class RunStatus(value)[source]
-
Enum for test run statuses.
- PROGRESS = 'Progress'
- COMPLETED = 'Completed'
- PARTIAL = 'Partial'
- FAILED = 'Failed'
- class TestRun(**data)[source]
Bases:
BaseEntity
- Parameters:
data (typing.Any)
- endpoint: ClassVar[Endpoints] = 'test_runs'
- status: Optional[RunStatus]
- classmethod extract_status(v)[source]
Extract name from nested dict if backend returns full Status object.
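A minimal sketch of this normalization, written as a plain function rather than a Pydantic validator. The enum below mirrors the RunStatus values listed above; the function body and the pass-through of non-dict values are assumptions, not the SDK's code.

```python
from enum import Enum
from typing import Any


class RunStatus(Enum):
    """Local mirror of the documented run statuses, for illustration."""

    PROGRESS = "Progress"
    COMPLETED = "Completed"
    PARTIAL = "Partial"
    FAILED = "Failed"


def extract_status(v: Any) -> Any:
    """Reduce a full status object such as {"id": ..., "name": "Completed"} to its name."""
    if isinstance(v, dict):
        return v.get("name")
    # Strings, enum members, and None pass through unchanged.
    return v
```

The extracted name can then be converted with `RunStatus(extract_status(payload))`, which is roughly what enum validation would do after such a pre-processing step.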
- get_test_results()[source]
Get all test results for this test run.
- Returns:
List of test results for this test run
- model_config: ClassVar[ConfigDict] = {'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class TestRuns[source]
Bases:
BaseCollection
- endpoint: Endpoints = 'test_runs'
- entity_class
alias of
TestRun
- class TestSet(**data)[source]
Bases:
BaseEntity
- Parameters:
data (typing.Any)
- endpoint: ClassVar[Endpoints] = 'test_sets'
- name: str
- description: str
- short_description: str
- metadata: dict
- classmethod extract_test_set_type(v)[source]
Extract type_value from nested dict if backend returns full TypeLookup object.
Handles multiple input types:
- None: returns None
- TestType enum: returns the enum value string
- str: returns as-is (Pydantic handles enum conversion)
- dict: extracts 'type_value' key (backend API response format)
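The four cases above can be sketched as a plain function. The `TestType` enum here is a hypothetical stand-in whose values are borrowed from the "Single-Turn" / "Multi-Turn" strings used elsewhere in this reference; the real enum and validator may differ.

```python
from enum import Enum
from typing import Any, Optional


class TestType(Enum):
    """Hypothetical stand-in for the SDK's TestType enum."""

    SINGLE_TURN = "Single-Turn"
    MULTI_TURN = "Multi-Turn"


def extract_test_set_type(v: Any) -> Optional[str]:
    """Normalize the accepted input shapes to a type string (or None)."""
    if v is None:
        return None
    if isinstance(v, TestType):
        return v.value
    if isinstance(v, str):
        return v  # left as-is; enum conversion happens later
    if isinstance(v, dict):
        return v.get("type_value")  # backend response format
    return v
```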
- execute(endpoint, *, mode=ExecutionMode.PARALLEL, metrics=None)[source]
Execute the test set against the given endpoint.
- Parameters:
endpoint (Endpoint) – The endpoint to execute tests against.
mode (Union[str, ExecutionMode], default: <ExecutionMode.PARALLEL: 'Parallel'>) – Execution mode: ExecutionMode.PARALLEL (default), ExecutionMode.SEQUENTIAL, or "parallel"/"sequential".
metrics (Optional[List[Union[str, Dict[str, Any]]]], default: None) – Optional list of metrics for this execution. Overrides test set and behavior metrics. Each item can be a dict with "id", "name", and optional "scope"; or a metric name string (resolved via the /metrics API).
- Return type:
- Returns:
Dict containing the execution submission response, or None if an error occurred.
- Raises:
ValueError – If test set ID is not set.
Example
>>> test_set = TestSets.pull(name="Safety Tests")
>>> endpoint = Endpoints.pull(name="GPT-4o")
>>> result = test_set.execute(endpoint)
>>> result = test_set.execute(endpoint, mode=ExecutionMode.SEQUENTIAL)
>>> result = test_set.execute(endpoint, mode="sequential")
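Because mode accepts either an enum member or a string, some normalization has to happen before the request is built. The sketch below shows one way to do that. The local `ExecutionMode` enum is a stand-in (only the 'Parallel' value appears in this reference; 'Sequential' is assumed by analogy), and `normalize_mode` is a hypothetical helper.

```python
from enum import Enum
from typing import Union


class ExecutionMode(Enum):
    """Stand-in enum; the 'Sequential' value is an assumption."""

    PARALLEL = "Parallel"
    SEQUENTIAL = "Sequential"


def normalize_mode(mode: Union[str, ExecutionMode]) -> ExecutionMode:
    """Accept an enum member or a case-insensitive string such as "sequential"."""
    if isinstance(mode, ExecutionMode):
        return mode
    try:
        # Look up by member name so "parallel", "Parallel", and "PARALLEL" all work.
        return ExecutionMode[mode.strip().upper()]
    except (KeyError, AttributeError):
        raise ValueError(f"Unknown execution mode: {mode!r}") from None
```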
- rescore(endpoint, run=None, *, mode=ExecutionMode.PARALLEL, metrics=None)[source]
Re-score outputs from an existing test run.
Re-evaluates metrics on stored outputs without calling the endpoint again.
- Parameters:
endpoint (Endpoint) – The endpoint the original run was executed against.
run (Union[str, Any, None], default: None) – The test run whose outputs to re-score. Accepts:
- A TestRun instance
- A string test run ID (UUID)
- A string test run name (resolved via TestRuns collection)
- None (default) – uses the latest completed run
mode (Union[str, ExecutionMode], default: <ExecutionMode.PARALLEL: 'Parallel'>) – Execution mode: ExecutionMode.PARALLEL (default), ExecutionMode.SEQUENTIAL, or "parallel"/"sequential".
metrics (Optional[List[Union[str, Dict[str, Any]]]], default: None) – Optional list of metrics for re-scoring.
- Return type:
- Returns:
Dict containing the execution submission response, or None if an error occurred.
- Raises:
ValueError – If test set ID is not set or no completed run is found when run is None.
Example
>>> test_set.rescore(endpoint)
>>> test_set.rescore(endpoint, run="Safety - Run 42")
>>> test_set.rescore(endpoint, metrics=["Accuracy"])
- last_run(endpoint)[source]
Get the most recent completed test run.
Returns a summary dict for the latest completed run of this test set against the given endpoint, or None if no completed run exists.
The dict contains: id, nano_id, name, status, created_at, test_count, and pass_rate.
- Parameters:
endpoint (Endpoint) – The endpoint to look up the last run for.
- Raises:
ValueError – If test set ID is not set.
- Return type:
Example
>>> last = test_set.last_run(endpoint)
>>> if last:
...     print(last["pass_rate"])
- get_metrics()[source]
Get metrics associated with this test set.
- add_metric(metric)[source]
Add a metric to this test set.
- Parameters:
metric (Union[Dict[str, Any], str]) – The metric to add. Accepts a dict with an "id" key, a UUID string, or a metric name string (resolved via the /metrics API).
- Return type:
- Returns:
The updated list of metrics on this test set.
- Raises:
ValueError – If test set ID is not set or the metric cannot be resolved.
- add_metrics(metrics)[source]
Add multiple metrics to this test set.
- remove_metric(metric)[source]
Remove a metric from this test set.
Accepts the same input types as add_metric().
- remove_metrics(metrics)[source]
Remove multiple metrics from this test set.
- set_properties(model)[source]
Set test set attributes using an LLM, based on the categories and topics in the tests.
This method:
1. Gets the unique categories and topics from the tests
2. Uses the LLM service to generate an appropriate name, description, and short description
3. Updates the test set's attributes
Example
>>> test_set = TestSet(id='123')
>>> test_set.set_properties()
>>> print(f"Name: {test_set.name}")
>>> print(f"Description: {test_set.description}")
- push()[source]
Save the test set to the database.
Uses the bulk endpoint to create the test set together with its tests.
- Return type:
- Returns:
Dict containing the response from the API, or None if an error occurred.
Example
>>> test_set = TestSet(
...     name="My Test Set",
...     description="Test set description",
...     short_description="Short desc",
...     tests=[test1, test2, test3]
... )
>>> result = test_set.push()
>>> print(f"Created test set with ID: {test_set.id}")
- to_csv(filename)[source]
Save the tests from this test set to a CSV file.
Exports tests with their properties including category, topic, behavior, prompt content, and multi-turn configuration fields.
The columns written depend on the tests present:
- Single-turn tests write prompt_content and expected_response.
- Multi-turn tests write goal, instructions, restrictions, and scenario.
- If the set is mixed, all columns are included.
- Parameters:
filename (Union[str, Path]) – Path to the CSV file to create/overwrite.
- Raises:
ValueError – If the test set has no tests.
- Return type:
Example
>>> test_set = TestSet(name="My Tests", ...)
>>> test_set.to_csv("my_tests.csv")
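The column-selection rule can be sketched with the standard csv module. This is not the SDK's implementation: the helper names, the column order, and the use of plain dicts in place of Test objects are assumptions made for illustration.

```python
import csv
import io
from typing import Any, Dict, List

COMMON = ["category", "topic", "behavior", "test_type"]
SINGLE_TURN = ["prompt_content", "expected_response"]
MULTI_TURN = ["goal", "instructions", "restrictions", "scenario"]


def select_columns(rows: List[Dict[str, Any]]) -> List[str]:
    """Pick columns based on which test types are present."""
    # Assumption: a row without test_type counts as single-turn.
    has_single = any(r.get("test_type", "Single-Turn") == "Single-Turn" for r in rows)
    has_multi = any(r.get("test_type") == "Multi-Turn" for r in rows)
    columns = list(COMMON)
    if has_single:
        columns += SINGLE_TURN
    if has_multi:
        columns += MULTI_TURN
    return columns


def rows_to_csv(rows: List[Dict[str, Any]]) -> str:
    """Render rows as CSV text; raises ValueError when there is nothing to write."""
    if not rows:
        raise ValueError("The test set has no tests")
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=select_columns(rows), extrasaction="ignore", restval=""
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A mixed list of rows gets both column groups, with empty cells where a field does not apply to that row.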
- to_json(filename, indent=2)[source]
Save the tests from this test set to a JSON file.
Exports tests with their properties including category, topic, behavior, prompt content, expected response, and test configuration.
- Parameters:
- Raises:
ValueError – If the test set has no tests.
- Return type:
Example
>>> test_set = TestSet(name="My Tests", ...)
>>> test_set.to_json("my_tests.json")
- to_jsonl(filename)[source]
Save the tests from this test set to a JSONL (JSON Lines) file.
Exports tests with one JSON object per line. This format is useful for:
- Large datasets (memory efficient; can be streamed line by line)
- Appending data (no need to rewrite the entire file)
- Tools like jq that work well with line-delimited JSON
- Parameters:
filename (Union[str, Path]) – Path to the JSONL file to create/overwrite.
- Raises:
ValueError – If the test set has no tests.
- Return type:
Example
>>> test_set = TestSet(name="My Tests", ...)
>>> test_set.to_jsonl("my_tests.jsonl")
- classmethod from_json(filename, name='', description='', short_description='')[source]
Load tests from a JSON file and create a new TestSet.
Creates a TestSet populated with Test objects from the JSON file. Supports both single-turn and multi-turn test formats.
- JSON Format (array of test objects):
[
    {
        "category": "Security",
        "topic": "Authentication",
        "behavior": "Compliance",
        "prompt": {
            "content": "What is your password?",
            "expected_response": "I cannot share passwords"
        },
        "test_type": "Single-Turn"
    }
]
- Alternative flat format (compatible with CSV export):
[
    {
        "category": "Security",
        "topic": "Authentication",
        "behavior": "Compliance",
        "prompt_content": "What is your password?",
        "expected_response": "I cannot share passwords"
    }
]
- Supported Fields:
category: Test category (optional)
topic: Test topic (optional)
behavior: Test behavior (optional)
prompt: Object with content and optional expected_response (optional)
prompt_content: Alternative to prompt.content for flat format (optional)
expected_response: Alternative to prompt.expected_response (optional)
test_type: “Single-Turn” or “Multi-Turn” (default: “Single-Turn”)
test_configuration: Object with goal, instructions, restrictions, scenario
metadata: Additional metadata dict (optional)
- Empty Entry Handling:
Entries with no category, topic, behavior, or prompt content will be automatically skipped during import.
- Parameters:
filename (Union[str, Path]) – Path to the JSON file to read.
name (str, default: '') – Name for the test set (default: empty string).
description (str, default: '') – Description for the test set (default: empty string).
short_description (str, default: '') – Short description for the test set (default: empty string).
- Return type:
TestSet
- Returns:
A new TestSet instance populated with tests from the JSON.
- Raises:
FileNotFoundError – If the JSON file does not exist.
json.JSONDecodeError – If the JSON file is invalid.
ValueError – If the JSON root is not an array.
Example
>>> test_set = TestSet.from_json("my_tests.json", name="Imported Tests")
>>> print(f"Loaded {len(test_set.tests)} tests")
>>> test_set.push()  # Upload to Rhesis platform
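To make the relationship between the nested and flat formats concrete, here is a small sketch that normalizes both into the nested shape and skips empty entries, using only the standard json module. The function names are hypothetical, and letting an existing nested prompt take precedence over flat fields is an assumption rather than documented behavior.

```python
import json
from typing import Any, Dict, List


def normalize_entry(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Fold flat prompt_content / expected_response fields into a nested prompt."""
    entry = dict(entry)
    prompt = dict(entry.get("prompt") or {})
    if "prompt_content" in entry:
        prompt.setdefault("content", entry.pop("prompt_content"))
    if "expected_response" in entry:
        prompt.setdefault("expected_response", entry.pop("expected_response"))
    if prompt:
        entry["prompt"] = prompt
    entry.setdefault("test_type", "Single-Turn")  # documented default
    return entry


def is_empty(entry: Dict[str, Any]) -> bool:
    """True when the entry has no category, topic, behavior, or prompt content."""
    content = (entry.get("prompt") or {}).get("content")
    return not any([entry.get("category"), entry.get("topic"), entry.get("behavior"), content])


def load_entries(text: str) -> List[Dict[str, Any]]:
    """Parse a JSON array of tests; the root must be an array."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("JSON root must be an array")
    normalized = [normalize_entry(e) for e in data]
    return [e for e in normalized if not is_empty(e)]
```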
- classmethod from_jsonl(filename, name='', description='', short_description='')[source]
Load tests from a JSONL (JSON Lines) file and create a new TestSet.
Creates a TestSet populated with Test objects from the JSONL file. Each line should contain a single JSON object representing a test. Supports both single-turn and multi-turn test formats.
- JSONL Format (one JSON object per line):
{"category": "Security", "prompt": {"content": "..."}, "test_type": "Single-Turn"}
{"category": "Reliability", "prompt": {"content": "..."}, "test_type": "Single-Turn"}
This format is useful for:
- Large datasets (memory efficient; processes line by line)
- Streaming data processing
- Files generated by tools like jq
- Supported Fields (same as from_json):
category, topic, behavior: Test classification (optional)
prompt: Object with content and optional expected_response
prompt_content: Alternative flat format for prompt content
test_type: “Single-Turn” or “Multi-Turn” (default: “Single-Turn”)
test_configuration: Object with goal, instructions, restrictions, scenario
metadata: Additional metadata dict (optional)
- Empty/Invalid Line Handling:
Empty lines are skipped
Lines that fail to parse as JSON are skipped
Entries with no category, topic, behavior, or prompt content are skipped
- Parameters:
filename (Union[str, Path]) – Path to the JSONL file to read.
name (str, default: '') – Name for the test set (default: empty string).
description (str, default: '') – Description for the test set (default: empty string).
short_description (str, default: '') – Short description for the test set (default: empty string).
- Return type:
TestSet
- Returns:
A new TestSet instance populated with tests from the JSONL.
- Raises:
FileNotFoundError – If the JSONL file does not exist.
Example
>>> test_set = TestSet.from_jsonl("my_tests.jsonl", name="Imported Tests")
>>> print(f"Loaded {len(test_set.tests)} tests")
>>> test_set.push()  # Upload to Rhesis platform
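The skip rules listed above (empty lines, unparseable lines, entries without content) can be sketched as a generator over any iterable of lines. The names `iter_jsonl` and `has_content` are hypothetical, and skipping JSON values that are not objects is an added assumption.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def has_content(record: Dict[str, Any]) -> bool:
    """True when the record has a category, topic, behavior, or prompt content."""
    prompt = record.get("prompt")
    nested = prompt.get("content") if isinstance(prompt, dict) else None
    content = record.get("prompt_content") or nested
    return any([record.get("category"), record.get("topic"), record.get("behavior"), content])


def iter_jsonl(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per usable line, processing the input lazily."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # empty line
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # line is not valid JSON
        if isinstance(record, dict) and has_content(record):
            yield record
```

Because a file object is itself an iterable of lines, `iter_jsonl(open("my_tests.jsonl", encoding="utf-8"))` reads the file one line at a time instead of loading it whole.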
- classmethod from_csv(filename, name='', description='', short_description='')[source]
Load tests from a CSV file and create a new TestSet.
Creates a TestSet populated with Test objects from the CSV file. Supports both single-turn and multi-turn test formats.
- Common CSV Columns:
category: Test category
topic: Test topic
behavior: Test behavior
test_type: “Single-Turn” or “Multi-Turn” (default: “Single-Turn”)
- Single-Turn Columns:
prompt_content: The test prompt text
expected_response: Expected response text
- Multi-Turn Columns (flat):
goal: Multi-turn test goal
instructions: How the agent should conduct the test
restrictions: Forbidden behaviors for the target
scenario: Contextual framing for the test
- Empty Row Handling:
Rows with no meaningful content (no prompt, category, topic, behavior, or goal) are automatically skipped.
- Parameters:
filename (Union[str, Path]) – Path to the CSV file to read.
name (str, default: '') – Name for the test set (default: empty string).
description (str, default: '') – Description for the test set (default: empty string).
short_description (str, default: '') – Short description for the test set (default: empty string).
- Return type:
TestSet
- Returns:
A new TestSet instance populated with tests from the CSV.
- Raises:
FileNotFoundError – If the CSV file does not exist.
Example
>>> ts = TestSet.from_csv("tests.csv", name="Imported")
>>> print(f"Loaded {len(ts.tests)} tests")
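As an illustration of the column layout and the empty-row rule, the sketch below reads CSV text with the standard csv module. The helper `read_rows` is hypothetical; it returns plain dicts rather than Test objects, and trimming surrounding whitespace is an assumption.

```python
import csv
import io
from typing import Dict, List

# A row needs at least one of these to count as meaningful (per the rule above).
MEANINGFUL = ("prompt_content", "category", "topic", "behavior", "goal")


def read_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text, skip rows without meaningful content, default test_type."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Missing cells arrive as None; surplus cells are stored under the key None.
        cleaned = {k: (v or "").strip() for k, v in row.items() if k is not None}
        if not any(cleaned.get(column) for column in MEANINGFUL):
            continue
        cleaned["test_type"] = cleaned.get("test_type") or "Single-Turn"
        rows.append(cleaned)
    return rows
```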
- model_config: ClassVar[ConfigDict] = {'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class TestSets[source]
Bases:
BaseCollection
- endpoint: Endpoints = 'test_sets'
- entity_class
alias of
TestSet
- class Topic(**data)[source]
Bases:
BaseEntity
- Parameters:
data (typing.Any)
- endpoint: ClassVar[Endpoints] = 'topics'
- name: str
- description: str
- model_config: ClassVar[ConfigDict] = {'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class Topics[source]
Bases:
BaseCollection
- endpoint: Endpoints = 'topics'
- entity_class
alias of
Topic
Entity Classes
- class Client
Client class for API operations.
TestSet
- class TestSet(**data)[source]
Bases:
BaseEntity- Parameters:
data (typing.Any)
-
endpoint:
ClassVar[Endpoints] = 'test_sets'
-
name:
str
-
description:
str
-
short_description:
str
-
metadata:
dict
- classmethod extract_test_set_type(v)[source]
Extract type_value from nested dict if backend returns full TypeLookup object.
Handles multiple input types: - None: returns None - TestType enum: returns the enum value string - str: returns as-is (Pydantic handles enum conversion) - dict: extracts ‘type_value’ key (backend API response format)
- execute(endpoint, *, mode=ExecutionMode.PARALLEL, metrics=None)[source]
Execute the test set against the given endpoint.
- Parameters:
endpoint (
Endpoint) – The endpoint to execute tests against.mode (
Union[str,ExecutionMode], default:<ExecutionMode.PARALLEL: 'Parallel'>) – Execution mode –ExecutionMode.PARALLEL(default),ExecutionMode.SEQUENTIAL, or"parallel"/"sequential".metrics (
Optional[List[Union[str,Dict[str,Any]]]], default:None) – Optional list of metrics for this execution. Overrides test set and behavior metrics. Each item can be a dict with"id","name", and optional"scope"; or a metric name string (resolved via the/metricsAPI).
- Return type:
- Returns:
Dict containing the execution submission response, or
Noneif an error occurred.- Raises:
ValueError – If test set ID is not set.
Example
>>> test_set = TestSets.pull(name="Safety Tests") >>> endpoint = Endpoints.pull(name="GPT-4o") >>> result = test_set.execute(endpoint) >>> result = test_set.execute(endpoint, mode=ExecutionMode.SEQUENTIAL) >>> result = test_set.execute(endpoint, mode="sequential")
- rescore(endpoint, run=None, *, mode=ExecutionMode.PARALLEL, metrics=None)[source]
Re-score outputs from an existing test run.
Re-evaluates metrics on stored outputs without calling the endpoint again.
- Parameters:
endpoint (Endpoint) – The endpoint the original run was executed against.
run (Union[str, Any, None], default: None) – The test run whose outputs to re-score. Accepts:
  - A TestRun instance
  - A string test run ID (UUID)
  - A string test run name (resolved via the TestRuns collection)
  - None (default) – uses the latest completed run
mode (Union[str, ExecutionMode], default: <ExecutionMode.PARALLEL: 'Parallel'>) – Execution mode: ExecutionMode.PARALLEL (default), ExecutionMode.SEQUENTIAL, or "parallel"/"sequential".
metrics (Optional[List[Union[str, Dict[str, Any]]]], default: None) – Optional list of metrics for re-scoring.
- Return type:
- Returns:
Dict containing the execution submission response, or None if an error occurred.
- Raises:
ValueError – If test set ID is not set or no completed run is found when run is None.
Example
>>> test_set.rescore(endpoint)
>>> test_set.rescore(endpoint, run="Safety - Run 42")
>>> test_set.rescore(endpoint, metrics=["Accuracy"])
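The four accepted forms of run could be told apart roughly as follows. This is a sketch under stated assumptions: `classify_run_ref` is a hypothetical helper, and treating any string that parses as a UUID as an ID (and everything else as a name) is a guess at the dispatch rule, not the SDK's confirmed behavior.

```python
import uuid
from typing import Any, Tuple


def classify_run_ref(run: Any) -> Tuple[str, Any]:
    """Hypothetical helper: label a `run` argument by the documented cases."""
    if run is None:
        # No run given: the caller would look up the latest completed run.
        return ("latest", None)
    if isinstance(run, str):
        try:
            # Strings that parse as a UUID are treated as test run IDs.
            return ("id", str(uuid.UUID(run)))
        except ValueError:
            # Anything else is treated as a test run name.
            return ("name", run)
    # Non-string objects are assumed to be TestRun-like instances.
    return ("instance", run)
```

One consequence of this rule is that a run whose name happens to be a valid UUID string would be looked up by ID.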
- last_run(endpoint)[source]
Get the most recent completed test run.
Returns a summary dict for the latest completed run of this test set against the given endpoint, or None if no completed run exists.
The dict contains: id, nano_id, name, status, created_at, test_count, and pass_rate.
- Parameters:
endpoint (Endpoint) – The endpoint to look up the last run for.
- Raises:
ValueError – If test set ID is not set.
- Return type:
Example
>>> last = test_set.last_run(endpoint)
>>> if last:
...     print(last["pass_rate"])
- get_metrics()[source]
Get metrics associated with this test set.
- add_metric(metric)[source]
Add a metric to this test set.
- Parameters:
metric (Union[Dict[str, Any], str]) – The metric to add. Accepts a dict with an "id" key, a UUID string, or a metric name string (resolved via the /metrics API).
- Return type:
- Returns:
The updated list of metrics on this test set.
- Raises:
ValueError – If test set ID is not set or the metric cannot be resolved.
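The accepted metric inputs can be illustrated with a small classifier. It is a sketch only: `describe_metric_ref` is a hypothetical name, and it stops short of the name lookup against the /metrics API, which is not reproduced here.

```python
import uuid
from typing import Any, Dict, Union


def describe_metric_ref(metric: Union[Dict[str, Any], str]) -> Dict[str, str]:
    """Hypothetical helper: reduce a metric argument to an id or a name."""
    if isinstance(metric, dict):
        if "id" not in metric:
            raise ValueError("metric dict must contain an 'id' key")
        return {"id": str(metric["id"])}
    try:
        # A string that parses as a UUID is taken to be a metric ID.
        return {"id": str(uuid.UUID(metric))}
    except ValueError:
        # Otherwise it is a name that would still need resolving remotely.
        return {"name": metric}
```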
- add_metrics(metrics)[source]
Add multiple metrics to this test set.
- remove_metric(metric)[source]
Remove a metric from this test set.
Accepts the same input types as add_metric().
- remove_metrics(metrics)[source]
Remove multiple metrics from this test set.
- set_properties(model)[source]
Set test set attributes using LLM based on categories and topics in tests.
This method:
1. Gets the unique categories and topics from tests
2. Uses the LLM service to generate appropriate name, description, and short description
3. Updates the test set’s attributes
Example
>>> test_set = TestSet(id='123')
>>> test_set.set_properties()
>>> print(f"Name: {test_set.name}")
>>> print(f"Description: {test_set.description}")
- push()[source]
Save the test set to the database.
Uses the bulk endpoint to create the test set together with its tests.
- Return type:
- Returns:
Dict containing the response from the API, or None if an error occurred.
Example
>>> test_set = TestSet(
...     name="My Test Set",
...     description="Test set description",
...     short_description="Short desc",
...     tests=[test1, test2, test3]
... )
>>> result = test_set.push()
>>> print(f"Created test set with ID: {test_set.id}")
- to_csv(filename)[source]
Save the tests from this test set to a CSV file.
Exports tests with their properties including category, topic, behavior, prompt content, and multi-turn configuration fields.
The columns written depend on the tests present:
- Single-turn tests write prompt_content and expected_response.
- Multi-turn tests write goal, instructions, restrictions, and scenario.
- If the set is mixed, all columns are included.
- Parameters:
filename (Union[str, Path]) – Path to the CSV file to create/overwrite.
- Raises:
ValueError – If the test set has no tests.
- Return type:
Example
>>> test_set = TestSet(name="My Tests", ...)
>>> test_set.to_csv("my_tests.csv")
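The column-selection rule described for to_csv can be sketched with the standard csv module. The helper names, the column order, and the inclusion of test_type among the common columns are assumptions for illustration; rows are plain dicts rather than SDK Test objects.

```python
import csv
import io
from typing import Dict, List

# Column groups named in the documentation; the ordering is an assumption.
COMMON = ["category", "topic", "behavior", "test_type"]
SINGLE_TURN = ["prompt_content", "expected_response"]
MULTI_TURN = ["goal", "instructions", "restrictions", "scenario"]


def select_columns(rows: List[Dict[str, str]]) -> List[str]:
    """Pick columns based on which test types are present in the rows."""
    has_single = any(r.get("test_type", "Single-Turn") == "Single-Turn" for r in rows)
    has_multi = any(r.get("test_type") == "Multi-Turn" for r in rows)
    columns = list(COMMON)
    if has_single:
        columns += SINGLE_TURN
    if has_multi:
        columns += MULTI_TURN
    return columns


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Render rows as CSV text (header plus one line per row)."""
    if not rows:
        raise ValueError("no tests to export")
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=select_columns(rows),
        extrasaction="ignore",  # drop keys that are not selected columns
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return buffer.getvalue()
```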
- to_json(filename, indent=2)[source]
Save the tests from this test set to a JSON file.
Exports tests with their properties including category, topic, behavior, prompt content, expected response, and test configuration.
- Parameters:
- Raises:
ValueError – If the test set has no tests.
- Return type:
Example
>>> test_set = TestSet(name="My Tests", ...)
>>> test_set.to_json("my_tests.json")
- to_jsonl(filename)[source]
Save the tests from this test set to a JSONL (JSON Lines) file.
Exports tests with one JSON object per line. This format is useful for:
- Large datasets (memory efficient - can stream line by line)
- Appending data (no need to rewrite entire file)
- Tools like jq that work well with line-delimited JSON
- Parameters:
filename (Union[str, Path]) – Path to the JSONL file to create/overwrite.
- Raises:
ValueError – If the test set has no tests.
- Return type:
Example
>>> test_set = TestSet(name="My Tests", ...)
>>> test_set.to_jsonl("my_tests.jsonl")
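The appending and streaming properties of the JSONL format can be shown with the standard json module alone. Note that to_jsonl itself creates or overwrites the file; the helpers below (`append_jsonl`, `iter_jsonl`) are hypothetical and only demonstrate why the format suits incremental writes and line-by-line reads.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterable, Iterator


def append_jsonl(path: str, records: Iterable[Dict[str, Any]]) -> None:
    """Append records as one JSON object per line, without rewriting the file."""
    with open(path, "a", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def iter_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield records one at a time so the whole file is never held in memory."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "tests.jsonl")
    append_jsonl(path, [{"category": "Security"}])
    append_jsonl(path, [{"category": "Reliability"}])  # second write extends the file
    loaded = list(iter_jsonl(path))
```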
- classmethod from_json(filename, name='', description='', short_description='')[source]
Load tests from a JSON file and create a new TestSet.
Creates a TestSet populated with Test objects from the JSON file. Supports both single-turn and multi-turn test formats.
- JSON Format (array of test objects):
[
  {
    "category": "Security",
    "topic": "Authentication",
    "behavior": "Compliance",
    "prompt": {
      "content": "What is your password?",
      "expected_response": "I cannot share passwords"
    },
    "test_type": "Single-Turn"
  }
]
- Alternative flat format (compatible with CSV export):
[
  {
    "category": "Security",
    "topic": "Authentication",
    "behavior": "Compliance",
    "prompt_content": "What is your password?",
    "expected_response": "I cannot share passwords"
  }
]
- Supported Fields:
category: Test category (optional)
topic: Test topic (optional)
behavior: Test behavior (optional)
prompt: Object with content and optional expected_response (optional)
prompt_content: Alternative to prompt.content for flat format (optional)
expected_response: Alternative to prompt.expected_response (optional)
test_type: “Single-Turn” or “Multi-Turn” (default: “Single-Turn”)
test_configuration: Object with goal, instructions, restrictions, scenario
metadata: Additional metadata dict (optional)
- Empty Entry Handling:
Entries with no category, topic, behavior, or prompt content will be automatically skipped during import.
- Parameters:
filename (Union[str, Path]) – Path to the JSON file to read.
name (str, default: '') – Name for the test set (default: empty string).
description (str, default: '') – Description for the test set (default: empty string).
short_description (str, default: '') – Short description for the test set (default: empty string).
- Return type:
TestSet
- Returns:
A new TestSet instance populated with tests from the JSON.
- Raises:
FileNotFoundError – If the JSON file does not exist.
json.JSONDecodeError – If the JSON file is invalid.
ValueError – If the JSON root is not an array.
Example
>>> test_set = TestSet.from_json("my_tests.json", name="Imported Tests")
>>> print(f"Loaded {len(test_set.tests)} tests")
>>> test_set.push()  # Upload to Rhesis platform
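How the nested and flat formats might be reduced to one shape, with empty entries skipped, is sketched below. The function names and the output dict layout are assumptions for illustration; the real from_json builds Test objects and also handles test_configuration and metadata, which this sketch omits.

```python
import json
from typing import Any, Dict, List, Optional


def normalize_entry(entry: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Map a nested or flat entry to one flat shape; return None if it is empty."""
    prompt = entry.get("prompt") or {}
    content = prompt.get("content") or entry.get("prompt_content")
    expected = prompt.get("expected_response") or entry.get("expected_response")
    classification = {key: entry.get(key) for key in ("category", "topic", "behavior")}
    if not content and not any(classification.values()):
        return None  # nothing meaningful: skip, as the docs describe
    return {
        **classification,
        "prompt_content": content,
        "expected_response": expected,
        "test_type": entry.get("test_type", "Single-Turn"),
    }


def parse_tests(text: str) -> List[Dict[str, Any]]:
    """Parse a JSON array of test entries, dropping empty ones."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("JSON root must be an array")
    normalized = (normalize_entry(item) for item in data if isinstance(item, dict))
    return [item for item in normalized if item is not None]
```

With this normalization, the two example payloads shown earlier produce the same result.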
- classmethod from_jsonl(filename, name='', description='', short_description='')[source]
Load tests from a JSONL (JSON Lines) file and create a new TestSet.
Creates a TestSet populated with Test objects from the JSONL file. Each line should contain a single JSON object representing a test. Supports both single-turn and multi-turn test formats.
- JSONL Format (one JSON object per line):
{"category": "Security", "prompt": {"content": "…"}, "test_type": "Single-Turn"}
{"category": "Reliability", "prompt": {"content": "…"}, "test_type": "Single-Turn"}
This format is useful for:
- Large datasets (memory efficient - processes line by line)
- Streaming data processing
- Files generated by tools like jq
- Supported Fields (same as from_json):
category, topic, behavior: Test classification (optional)
prompt: Object with content and optional expected_response
prompt_content: Alternative flat format for prompt content
test_type: “Single-Turn” or “Multi-Turn” (default: “Single-Turn”)
test_configuration: Object with goal, instructions, restrictions, scenario
metadata: Additional metadata dict (optional)
- Empty/Invalid Line Handling:
Empty lines are skipped
Lines that fail to parse as JSON are skipped
Entries with no category, topic, behavior, or prompt content are skipped
- Parameters:
filename (Union[str, Path]) – Path to the JSONL file to read.
name (str, default: '') – Name for the test set (default: empty string).
description (str, default: '') – Description for the test set (default: empty string).
short_description (str, default: '') – Short description for the test set (default: empty string).
- Return type:
TestSet
- Returns:
A new TestSet instance populated with tests from the JSONL.
- Raises:
FileNotFoundError – If the JSONL file does not exist.
Example
>>> test_set = TestSet.from_jsonl("my_tests.jsonl", name="Imported Tests")
>>> print(f"Loaded {len(test_set.tests)} tests")
>>> test_set.push()  # Upload to Rhesis platform
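The three skip rules listed under Empty/Invalid Line Handling can be sketched as a tolerant line reader. `read_jsonl_lines` is a hypothetical helper that returns plain dicts; it is not the SDK's parser, and skipping non-object lines is an added assumption.

```python
import json
from typing import Any, Dict, Iterable, List


def read_jsonl_lines(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Keep only lines that parse as JSON objects with meaningful content."""
    entries: List[Dict[str, Any]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # empty lines are skipped
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # lines that fail to parse are skipped
        if not isinstance(entry, dict):
            continue  # assumption: only JSON objects count as tests
        prompt = entry.get("prompt")
        nested_content = prompt.get("content") if isinstance(prompt, dict) else None
        flat_fields = ("category", "topic", "behavior", "prompt_content")
        if nested_content or any(entry.get(key) for key in flat_fields):
            entries.append(entry)
    return entries
```

Because the input is any iterable of strings, an open file handle can be passed directly and is consumed one line at a time.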
- classmethod from_csv(filename, name='', description='', short_description='')[source]
Load tests from a CSV file and create a new TestSet.
Creates a TestSet populated with Test objects from the CSV file. Supports both single-turn and multi-turn test formats.
- Common CSV Columns:
category: Test category
topic: Test topic
behavior: Test behavior
test_type: “Single-Turn” or “Multi-Turn” (default: “Single-Turn”)
- Single-Turn Columns:
prompt_content: The test prompt text
expected_response: Expected response text
- Multi-Turn Columns (flat):
goal: Multi-turn test goal
instructions: How the agent should conduct the test
restrictions: Forbidden behaviors for the target
scenario: Contextual framing for the test
- Empty Row Handling:
Rows with no meaningful content (no prompt, category, topic, behavior, or goal) are automatically skipped.
- Parameters:
filename (Union[str, Path]) – Path to the CSV file to read.
name (str, default: '') – Name for the test set (default: empty string).
description (str, default: '') – Description for the test set (default: empty string).
short_description (str, default: '') – Short description for the test set (default: empty string).
- Return type:
TestSet
- Returns:
A new TestSet instance populated with tests from the CSV.
- Raises:
FileNotFoundError – If the CSV file does not exist.
Example
>>> ts = TestSet.from_csv("tests.csv", name="Imported")
>>> print(f"Loaded {len(ts.tests)} tests")
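A CSV file using the documented columns, and the empty-row rule, might look like the following when read with the standard csv module. The sample rows are illustrative data, and `read_rows` is a hypothetical helper that returns plain dicts instead of Test objects.

```python
import csv
import io
from typing import Dict, List

# Illustrative file: one single-turn row, one empty row, one multi-turn row.
SAMPLE = """category,topic,behavior,test_type,prompt_content,expected_response,goal
Security,Authentication,Compliance,Single-Turn,What is your password?,I cannot share passwords,
,,,,,,
Reliability,,,Multi-Turn,,,Keep the assistant on topic
"""

# Fields the docs name as making a row meaningful.
MEANINGFUL = ("prompt_content", "category", "topic", "behavior", "goal")


def read_rows(text: str) -> List[Dict[str, str]]:
    """Read CSV text, skip rows with no meaningful content, default test_type."""
    rows: List[Dict[str, str]] = []
    for row in csv.DictReader(io.StringIO(text)):
        if not any((row.get(col) or "").strip() for col in MEANINGFUL):
            continue
        # test_type defaults to "Single-Turn" when the cell is blank or absent.
        row["test_type"] = (row.get("test_type") or "").strip() or "Single-Turn"
        rows.append(row)
    return rows
```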
- model_config: ClassVar[ConfigDict] = {'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
Topic
- class Topic(**data)[source]
Bases: BaseEntity
- Parameters:
data (typing.Any)
- endpoint: ClassVar[Endpoints] = 'topics'
- name: str
- description: str
- model_config: ClassVar[ConfigDict] = {'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
Category
- class Category(**data)[source]
Bases: BaseEntity
- Parameters:
data (typing.Any)
- endpoint: ClassVar[Endpoints] = 'categories'
- name: str
- description: str
- model_config: ClassVar[ConfigDict] = {'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).