Compare AI agent versions with experiments

Last updated: May 11, 2026 20:17

Agent Analytics experiments let you measure the impact of changes to your AI agent. You can compare performance between different agent configurations, models, or tools by analyzing conversation metrics side by side.

Use cases

Use experiments to measure the impact of a change to your agent. For example, you can compare:

Different AI models.
Specific tool or subagent usage.
Conversations with files uploaded versus no files uploaded.
Conversations with suggested prompts versus no suggested prompts.
Performance across different segments.

Create an experiment

Any user with access to the agent can create an experiment. You can create up to five groups in a single experiment and run multiple concurrent experiments for each agent.

In your agent, go to the Experiments tab.
Select + Create experiment.
Under What to compare, add a goal to describe what you're testing. Use a clear, measurable outcome, such as “Reduce issue rate compared to the previous agent version” or “Increase positive reactions with a new tool configuration.”
Define your control group. Use the baseline you want to compare against, such as your current agent configuration or a previous version of the agent.
Select + Add condition below the groups to add one or more conditions (like tool, model, segment, or files uploaded) to narrow the group scope.
Define one or more experiment groups using the same process, changing only the variables you want to test (for example, a new model, updated tools, or different suggested prompts). Match the segment and date range from your control group so you’re comparing versions over the same traffic and time period.
Select + Add group below the groups to add more variants to test against.
To use a different group as the baseline, select Set as control on that group.
Select a group name (for example, Group A or Group B) to rename it with something more descriptive, like “Model A” and “Model B.”
Under What to track, select the metrics you want to compare, plus up to five tracked use cases or issues.
Select Start experiment. A new card appears in the Experiments list. Open it to review the results.
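
Experiments are created in the product UI, but it can help to draft the definition first and check it against the limits described above. The following is a minimal sketch only: the Group and Experiment classes, their field names, and the condition keys are hypothetical and do not correspond to any documented API. It also assumes the five-group limit counts the control group.

```python
from dataclasses import dataclass, field

MAX_GROUPS = 5          # "up to five groups in a single experiment" (assumed to include the control)
MAX_TRACKED_ITEMS = 5   # "up to five tracked use cases or issues"


@dataclass
class Group:
    name: str
    conditions: dict            # hypothetical keys, e.g. {"model": "model-a"}
    is_control: bool = False


@dataclass
class Experiment:
    goal: str
    groups: list
    metrics: list
    tracked_items: list = field(default_factory=list)

    def validate(self) -> list:
        """Return a list of problems; an empty list means the draft fits the limits."""
        problems = []
        if len(self.groups) < 2:
            problems.append("need a control group and at least one experiment group")
        if len(self.groups) > MAX_GROUPS:
            problems.append(f"more than {MAX_GROUPS} groups")
        if sum(g.is_control for g in self.groups) != 1:
            problems.append("exactly one group must be the control")
        if len(self.tracked_items) > MAX_TRACKED_ITEMS:
            problems.append(f"more than {MAX_TRACKED_ITEMS} tracked use cases or issues")
        return problems


experiment = Experiment(
    goal="Reduce issue rate compared to the previous agent version",
    groups=[
        # Same segment in both groups; only the model differs.
        Group("Model A", {"model": "model-a", "segment": "all"}, is_control=True),
        Group("Model B", {"model": "model-b", "segment": "all"}),
    ],
    metrics=["issue_rate"],
)
print(experiment.validate())  # prints []
```

Keeping every condition except the tested variable identical across groups, as in the draft above, mirrors the advice to match the segment and date range from the control group.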

Interpret an experiment

Experiment results display in a side-by-side layout with separate columns for your control and experiment groups. Each group shows its conditions (such as date range and suggested prompts status) so you can confirm you’re comparing similar traffic before you interpret the results.

For each selected metric, the results show the raw value for each group and the percentage difference between each experiment group and the control. The control group is labeled, and each comparison shows whether the experiment group increased or decreased relative to that baseline. If your experiment includes more than one experiment group, each group is compared against the same control so you can evaluate multiple variations at once.
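
To reproduce a comparison by hand, you can compute the change of each experiment group relative to the control. The sketch below assumes the conventional relative-change formula, (experiment - control) / control × 100. The article does not state the exact formula the product uses, and the function name and sample values are illustrative.

```python
from typing import Optional


def percent_difference(control: float, experiment: float) -> Optional[float]:
    """Relative change of an experiment group versus the control, in percent."""
    if control == 0:
        return None  # undefined when the baseline is zero
    return (experiment - control) / control * 100


# Hypothetical raw values for one metric; every group is compared to the same control.
control_value = 0.20
group_values = {"Model B": 0.15, "Model C": 0.25}

for name, value in group_values.items():
    diff = percent_difference(control_value, value)
    print(f"{name}: {diff:+.1f}% vs control")
```

A negative result means the experiment group is below the baseline, and a positive result means it is above.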

Generally, higher values are better for engagement metrics (like positive reactions), while lower values are better for issue-related metrics (like rage prompt rate or unsupported request rate). For formal metric definitions and guidance on interpreting them, see Analyze interactions with AI agents.
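
The direction rule above can be written down as a small lookup, which is useful if you export values and want to label each change consistently. This is a sketch under assumptions: the metric keys are illustrative names, not product identifiers, and the mapping covers only the metrics mentioned in this article.

```python
# Assumed direction per metric; keys are illustrative, not product identifiers.
HIGHER_IS_BETTER = {
    "positive_reactions": True,          # engagement metric
    "rage_prompt_rate": False,           # issue-related metric
    "unsupported_request_rate": False,   # issue-related metric
}


def assess(metric: str, control: float, experiment: float) -> str:
    """Label an experiment group's value as improved, regressed, or unchanged."""
    if experiment == control:
        return "unchanged"
    increased = experiment > control
    # An increase is good only when higher values are better, and vice versa.
    return "improved" if increased == HIGHER_IS_BETTER[metric] else "regressed"


print(assess("positive_reactions", 120, 150))   # improved
print(assess("rage_prompt_rate", 0.04, 0.06))   # regressed
```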

Manage an experiment

The Experiments tab lists all your saved experiments. Each preview card shows the status, group names, goal, number of groups and conditions, creator, and creation date.

From this list, you can:

Search for experiments by name.
Filter by creator or status.
Select an experiment to view the results.
Hover over an experiment and select the delete (trash can) icon to remove an experiment you no longer need.

Experiments automatically complete when the end dates for all groups pass. You can also manually mark an experiment as complete before its scheduled end date: open the experiment, select the In progress status, and choose Completed. Once an experiment is completed, an AI-generated summary of its results appears in the Experiment overview section at the top of the experiment.
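
The automatic completion rule can be illustrated with a short check. This is a sketch only: the function name is hypothetical, and it assumes an end date has "passed" once the current date is strictly later than it, which the article does not specify.

```python
from datetime import date


def is_auto_complete(group_end_dates: list, today: date) -> bool:
    """True once every group's end date is strictly before today (assumed rule)."""
    return bool(group_end_dates) and all(end < today for end in group_end_dates)


# Hypothetical end dates for a two-group experiment.
end_dates = [date(2026, 5, 1), date(2026, 5, 8)]
print(is_auto_complete(end_dates, date(2026, 5, 5)))   # False: one group is still running
print(is_auto_complete(end_dates, date(2026, 5, 9)))   # True: all end dates have passed
```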
