# Steps Reference
Pipeline steps fall into two categories: LLM steps that call models, and transform steps that manipulate responses without LLM calls.
## LLM Steps
### Propose

Generate initial responses in parallel from multiple models.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `agents` | `list[str]` | required | Model names to query |
| `temp` | `float` | `0.7` | Sampling temperature |
| `max_tokens` | `int` | `2048` | Maximum response length |

**Behavior:** Each agent receives the raw user query with no system prompt. All calls run in parallel.

```python
# Diverse models
Propose(["gpt-5-nano-2025-08-07", "claude-sonnet-4-5", "llama-3.3-70b"])

# Self-MoA: same model, multiple samples
Propose(["gpt-5-nano-2025-08-07"] * 6, temp=0.7)
```
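The parallel fan-out can be pictured as a minimal sketch; `call_model` is a hypothetical async stand-in for a single LLM call, not the library's actual API:

```python
import asyncio

# Sketch of Propose's fan-out: one call per agent, all in flight at once.
# call_model(agent, query) is a hypothetical stand-in, not the library's API.
async def propose(agents, query, call_model):
    return await asyncio.gather(*(call_model(a, query) for a in agents))
```

`asyncio.gather` preserves input order, so the returned responses line up with the `agents` list.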
### Synthesize

Each agent synthesizes all previous responses into a new response.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `agents` | `list[str]` | required | Model names to query |
| `prompt` | `str` | synthesis prompt | System prompt for synthesis |
| `temp` | `float` | `0.7` | Sampling temperature |
| `max_tokens` | `int` | `2048` | Maximum response length |

**Behavior:** Each agent sees all previous responses and the original query. Produces N new responses (one per agent).
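The N-in, N-out shape can be sketched as follows; `call_model` and the exact context layout are assumptions, not the library's internals:

```python
# Sketch of Synthesize: every agent sees all prior responses plus the query,
# and each agent contributes one new response.
# call_model(agent, message) is a hypothetical stand-in, not the library's API.
def synthesize(agents, responses, query, call_model):
    context = "\n\n".join(f"Response {i + 1}: {r}" for i, r in enumerate(responses))
    user_message = f"{context}\n\nOriginal query: {query}"
    return [call_model(agent, user_message) for agent in agents]
```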
### Aggregate

A single model combines all responses into one final output.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `agent` | `str` | required | Model name |
| `prompt` | `str` | synthesis prompt | System prompt for aggregation |
| `temp` | `float` | `0.7` | Sampling temperature |
| `max_tokens` | `int` | `2048` | Maximum response length |

**Behavior:** Reduces N responses to 1. Typically the final step.

```python
Aggregate("gpt-5-nano-2025-08-07")

# Custom prompt
Aggregate("gpt-5-nano-2025-08-07", prompt="Select the best response and return it verbatim.")
```
### Refine

Improve each response individually.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `agents` | `list[str]` | required | Model names (cycled if fewer than responses) |
| `prompt` | `str` | refinement prompt | Template with `{text}` and `{query}` placeholders |
| `temp` | `float` | `0.7` | Sampling temperature |
| `max_tokens` | `int` | `2048` | Maximum response length |

**Behavior:** Each response is refined independently. Agents are cycled if there are more responses than agents.

```python
# Use GPT-5 Nano to refine all responses
Refine(["gpt-5-nano-2025-08-07"])

# Different refiners for each response
Refine(["gpt-5-nano-2025-08-07", "claude-sonnet-4-5"])
```
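The cycling rule amounts to round-robin assignment; a minimal sketch (hypothetical helper, not the library's internals):

```python
# Sketch of agent cycling in Refine: with fewer agents than responses,
# response i is assigned to agent i modulo len(agents).
def assign_refiners(agents, responses):
    return [(agents[i % len(agents)], r) for i, r in enumerate(responses)]
```

With two agents and three responses, the third response wraps around to the first agent.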
### Rank

Select the top N responses by quality.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `agent` | `str` | required | Model to perform ranking |
| `n` | `int` | `3` | Number of responses to keep |
| `prompt` | `str` | ranking prompt | Template with `{query}`, `{responses}`, `{n}` placeholders |
| `temp` | `float` | `0.7` | Sampling temperature |
| `max_tokens` | `int` | `2048` | Maximum response length |

**Behavior:** The LLM returns comma-separated indices of the best responses. Falls back to the first N if parsing fails.
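The parse-with-fallback logic can be sketched as below; whether indices are 0- or 1-based is an assumption here, and this helper is illustrative, not the library's implementation:

```python
import re

# Sketch of Rank's selection: pull comma-separated indices out of the ranker's
# reply (assumed 0-based), keep the top n, and fall back to the first n
# responses when nothing parseable comes back.
def pick_ranked(reply, responses, n):
    indices = [int(tok) for tok in re.findall(r"\d+", reply)]
    chosen = [responses[i] for i in indices if 0 <= i < len(responses)]
    if not chosen:
        return responses[:n]  # fallback: first n
    return chosen[:n]
```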
### Vote

Identify consensus or select the most accurate answer.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `agent` | `str` | required | Model to perform voting |
| `prompt` | `str` | voting prompt | System prompt for consensus finding |
| `temp` | `float` | `0.7` | Sampling temperature |
| `max_tokens` | `int` | `2048` | Maximum response length |

**Behavior:** Reduces N responses to 1 by finding consensus or selecting the best.
## Transform Steps
These steps manipulate responses without making LLM calls.
### Shuffle

Randomize response order to prevent positional bias.

**Behavior:** Randomly reorders responses. Some models exhibit position bias (favoring first or last responses).
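A one-function sketch of the semantics (illustrative, not the library's internals): the contents are unchanged, only the order moves.

```python
import random

# Sketch of Shuffle: random reorder without mutating the caller's list.
def shuffle_step(responses, rng=random):
    out = list(responses)
    rng.shuffle(out)
    return out
```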
### Dropout

Randomly drop responses with a given probability.

| Parameter | Type | Description |
|---|---|---|
| `rate` | `float` | Probability of dropping each response (0.0–1.0) |

**Behavior:** Each response is independently dropped with probability `rate`. If all responses would be dropped, one is kept randomly.
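The keep-at-least-one guarantee can be sketched like this (hypothetical helper, not the library's implementation):

```python
import random

# Sketch of Dropout: drop each response independently with probability `rate`;
# if everything would be dropped, keep one response chosen at random.
def dropout(responses, rate, rng=random):
    kept = [r for r in responses if rng.random() >= rate]
    if not kept and responses:
        kept = [rng.choice(responses)]
    return kept
```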
### Sample

Take a random subset of responses.

| Parameter | Type | Description |
|---|---|---|
| `n` | `int` | Number of responses to sample |

**Behavior:** Randomly selects `n` responses. If fewer than `n` exist, returns all.
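In plain Python this is a one-liner around `random.sample`; a sketch, not the library's code:

```python
import random

# Sketch of Sample: a random subset of size n, or everything if fewer exist.
def sample_step(responses, n):
    return random.sample(responses, min(n, len(responses)))
```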
### Take

Keep the first N responses.

| Parameter | Type | Description |
|---|---|---|
| `n` | `int` | Number of responses to keep |

**Behavior:** Deterministic: always keeps the first `n` responses.
### Filter

Keep responses matching a predicate.

| Parameter | Type | Description |
|---|---|---|
| `fn` | `Callable[[str], bool]` | Function that returns True for responses to keep |

```python
# Keep only responses mentioning "quantum"
Filter(lambda r: "quantum" in r.lower())

# Keep responses over 100 chars
Filter(lambda r: len(r) > 100)
```
### Map

Transform each response.

| Parameter | Type | Description |
|---|---|---|
| `fn` | `Callable[[str], str]` | Function to apply to each response |
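The semantics are an ordinary element-wise map: order and count are preserved. A sketch (illustrative, not the library's implementation):

```python
# Sketch of Map: apply fn to every response, preserving order and count.
def map_step(fn, responses):
    return [fn(r) for r in responses]
```

A typical use is normalization before aggregation, e.g. stripping whitespace with `fn=str.strip`.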
## Default Prompts

The library uses these default prompts:

Synthesis prompt (`P_SYNTH`):

```
You have been provided with responses from various models to a query.
Synthesize into a single, high-quality response.
Critically evaluate—some may be biased or incorrect.
Do not simply replicate; offer a refined, accurate reply.
```

Refinement prompt (`P_REFINE`):

Voting prompt (`P_VOTE`):

```
These responses answer the same question.
Identify the consensus view shared by the majority.
If no clear consensus, select the most accurate answer.
Return only that answer, restated clearly.
```

Ranking prompt (`P_RANK`):