Steps Reference

Pipeline steps fall into two categories: LLM steps that call models, and transform steps that manipulate responses without LLM calls.
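
As a rough illustration of how these steps combine, the snippet below chains a fan-out, a synthesis layer, and a final reduction. The Pipeline wrapper is hypothetical and is not part of this reference; this page documents only the individual steps.

# Hypothetical composition; `Pipeline` is assumed here for illustration only.
Pipeline([
    Propose(["gpt-5-nano-2025-08-07", "claude-sonnet-4-5", "llama-3.3-70b"]),
    Synthesize(["gpt-5-nano-2025-08-07", "claude-sonnet-4-5", "llama-3.3-70b"]),
    Aggregate("gpt-5-nano-2025-08-07"),
])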

LLM Steps

Propose

Generate initial responses in parallel from multiple models.

Propose(agents, temp=0.7, max_tokens=2048)

Parameter   Type       Default   Description
agents      list[str]  required  Model names to query
temp        float      0.7       Sampling temperature
max_tokens  int        2048      Maximum response length

Behavior: Each agent receives the raw user query with no system prompt. All calls run in parallel.

# Diverse models
Propose(["gpt-5-nano-2025-08-07", "claude-sonnet-4-5", "llama-3.3-70b"])

# Self-MoA: same model, multiple samples
Propose(["gpt-5-nano-2025-08-07"] * 6, temp=0.7)

Synthesize

Each agent synthesizes all previous responses into a new response.

Synthesize(agents, prompt=P_SYNTH, temp=0.7, max_tokens=2048)

Parameter   Type       Default           Description
agents      list[str]  required          Model names to query
prompt      str        synthesis prompt  System prompt for synthesis
temp        float      0.7               Sampling temperature
max_tokens  int        2048              Maximum response length

Behavior: Each agent sees all previous responses and the original query. Produces N new responses (one per agent).

# Standard MoA layer
Synthesize(["gpt-5-nano-2025-08-07", "claude-sonnet-4-5", "llama-3.3-70b"])

Aggregate

Single model combines all responses into one final output.

Aggregate(agent, prompt=P_SYNTH, temp=0.7, max_tokens=2048)

Parameter   Type   Default           Description
agent       str    required          Model name
prompt      str    synthesis prompt  System prompt for aggregation
temp        float  0.7               Sampling temperature
max_tokens  int    2048              Maximum response length

Behavior: Reduces N responses to 1. Typically the final step.

Aggregate("gpt-5-nano-2025-08-07")

# Custom prompt
Aggregate("gpt-5-nano-2025-08-07", prompt="Select the best response and return it verbatim.")

Refine

Improve each response individually.

Refine(agents, prompt=P_REFINE, temp=0.7, max_tokens=2048)

Parameter   Type       Default            Description
agents      list[str]  required           Model names (cycled if fewer than responses)
prompt      str        refinement prompt  Template with {text} and {query} placeholders
temp        float      0.7                Sampling temperature
max_tokens  int        2048               Maximum response length

Behavior: Each response is refined independently. Agents are cycled if there are more responses than agents.

# Use GPT-5 Nano to refine all responses
Refine(["gpt-5-nano-2025-08-07"])

# Different refiners for each response
Refine(["gpt-5-nano-2025-08-07", "claude-sonnet-4-5"])

Rank

Select the top N responses by quality.

Rank(agent, n=3, prompt=P_RANK, temp=0.7, max_tokens=2048)

Parameter   Type   Default         Description
agent       str    required        Model to perform ranking
n           int    3               Number of responses to keep
prompt      str    ranking prompt  Template with {query}, {responses}, {n} placeholders
temp        float  0.7             Sampling temperature
max_tokens  int    2048            Maximum response length

Behavior: The ranking model returns comma-separated indices of the best responses. If parsing fails, the step falls back to keeping the first n responses.

Rank("gpt-5-nano-2025-08-07", n=3)

Vote

Identify consensus or select the most accurate answer.

Vote(agent, prompt=P_VOTE, temp=0.7, max_tokens=2048)

Parameter   Type   Default        Description
agent       str    required       Model to perform voting
prompt      str    voting prompt  System prompt for consensus finding
temp        float  0.7            Sampling temperature
max_tokens  int    2048           Maximum response length

Behavior: Reduces N responses to 1 by finding consensus or selecting the best.

Vote("gpt-5-nano-2025-08-07")

Transform Steps

These steps manipulate responses without making LLM calls.

Shuffle

Randomize response order to prevent positional bias.

Shuffle()

Behavior: Randomly reorders responses. Some models exhibit position bias (favoring first or last responses).
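
Shuffle is typically placed just before an order-sensitive step such as Aggregate. Reusing the hypothetical Pipeline wrapper from the top of this page:

# Hypothetical composition; `Pipeline` is assumed for illustration only.
Pipeline([
    Propose(["gpt-5-nano-2025-08-07"] * 4),
    Shuffle(),  # randomize order so the aggregator's position bias cannot favor one proposer
    Aggregate("claude-sonnet-4-5"),
])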


Dropout

Randomly drop responses with a given probability.

Dropout(rate)

Parameter  Type   Description
rate       float  Probability of dropping each response (0.0–1.0)

Behavior: Each response is independently dropped with probability rate. If all responses would be dropped, one is kept randomly.

Dropout(0.2)  # 20% chance to drop each response
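
Illustrative semantics in plain Python (not the library's actual code):

import random

def dropout(responses: list[str], rate: float) -> list[str]:
    # Drop each response independently with probability `rate`;
    # never return an empty list.
    kept = [r for r in responses if random.random() >= rate]
    return kept if kept else [random.choice(responses)]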

Sample

Take a random subset of responses.

Sample(n)

Parameter  Type  Description
n          int   Number of responses to sample

Behavior: Randomly selects n responses. If fewer than n exist, returns all.

Sample(3)  # Keep 3 random responses
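
Equivalent semantics in plain Python:

import random

def sample(responses: list[str], n: int) -> list[str]:
    # A random subset of size n, or everything if fewer than n exist.
    return random.sample(responses, min(n, len(responses)))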

Take

Keep the first N responses.

Take(n)

Parameter  Type  Description
n          int   Number of responses to keep

Behavior: Deterministic; always keeps the first n responses.

Take(3)  # Keep first 3 responses
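
In plain Python terms, Take is just a slice:

def take(responses: list[str], n: int) -> list[str]:
    # Deterministic and order-preserving, unlike Sample.
    return responses[:n]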

Filter

Keep responses matching a predicate.

Filter(fn)

Parameter  Type                   Description
fn         Callable[[str], bool]  Function that returns True for responses to keep

# Keep only responses mentioning "quantum"
Filter(lambda r: "quantum" in r.lower())

# Keep responses over 100 chars
Filter(lambda r: len(r) > 100)

Map

Transform each response.

Map(fn)

Parameter  Type                  Description
fn         Callable[[str], str]  Function to apply to each response

# Truncate responses
Map(lambda r: r[:500])

# Strip whitespace
Map(str.strip)

Default Prompts

The library uses these default prompts:

Synthesis prompt (P_SYNTH):

You have been provided with responses from various models to a query.
Synthesize into a single, high-quality response.
Critically evaluate—some may be biased or incorrect.
Do not simply replicate; offer a refined, accurate reply.

Refinement prompt (P_REFINE):

Improve this response:

{text}

Original query: {query}
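
The {text} and {query} placeholders suggest standard str.format substitution; a minimal sketch (the exact mechanism is an assumption):

P_REFINE = "Improve this response:\n\n{text}\n\nOriginal query: {query}"

prompt = P_REFINE.format(
    text="The capital of France is Paris.",
    query="What is the capital of France?",
)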

Voting prompt (P_VOTE):

These responses answer the same question.
Identify the consensus view shared by the majority.
If no clear consensus, select the most accurate answer.
Return only that answer, restated clearly.

Ranking prompt (P_RANK):

Rank these responses by quality for the query: '{query}'

{responses}

Return the top {n} as comma-separated numbers (e.g., '3, 1, 5').
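
Filling the ranking template might look like this (the numbering format for {responses} is an assumption):

P_RANK = (
    "Rank these responses by quality for the query: '{query}'\n\n"
    "{responses}\n\n"
    "Return the top {n} as comma-separated numbers (e.g., '3, 1, 5')."
)

candidates = ["Paris.", "It is Paris, France.", "Lyon."]
numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(candidates))
print(P_RANK.format(query="Capital of France?", responses=numbered, n=2))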