This page details the logic used to transform raw user input into an augmented prompt for the LLM, and the subsequent orchestration of the streaming execution pipeline. This process involves memory retrieval, session history management, and the coordination of tool execution.

The prepare_model_provider_input Pipeline

Before a request is sent to a Model Provider, the daemon executes a multi-stage augmentation pipeline. This is primarily handled by the prepare_model_provider_input function, which coordinates memory ingestion, session history compaction, and context retrieval.

Pipeline Stages

  1. Memory Ingest: Incoming user messages are ingested into the JournalStore as memory items for future retrieval crates/palyra-daemon/src/application/provider_input.rs#168-171.
  2. Session Compaction: If the session history (tape) grows too large, the pipeline triggers a compaction strategy. It builds a plan to summarize older parts of the conversation to stay within model context limits crates/palyra-daemon/src/application/provider_input.rs#14-17.
  3. Attachment Recall: If the user provides specific artifact IDs or queries for previous attachments, the pipeline recalls relevant media chunks and metadata crates/palyra-daemon/src/application/provider_input.rs#99-106.
  4. Auto-Inject (RAG): The system performs a vector search against the memory store based on the current input text. Highly relevant snippets (based on MEMORY_AUTO_INJECT_MIN_SCORE) are automatically injected into the prompt crates/palyra-daemon/src/application/provider_input.rs#190-200.
  5. Explicit Recall: Processes specific search queries or item IDs requested via the parameter_delta JSON to pull targeted information into the context crates/palyra-daemon/src/application/provider_input.rs#67-87.
  6. Vision Input Preparation: If attachments include images, they are validated against MediaRuntimeConfig and converted into ProviderImageInput structures crates/palyra-daemon/src/application/provider_input.rs#108-112.
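The stage ordering above can be sketched as a single function. This is an illustrative sketch only: the `ProviderInput` shape, the function signature, and the value of `MEMORY_AUTO_INJECT_MIN_SCORE` are assumptions, not the real API in `provider_input.rs`.

```rust
// Illustrative sketch of the prepare_model_provider_input stage ordering.
// Types, signature, and threshold value are assumptions for illustration.

#[derive(Debug)]
struct ProviderInput {
    injected_snippets: Vec<String>, // auto-injected memory (stage 4)
    history: Vec<String>,           // possibly compacted tape (stage 2)
    user_text: String,
}

const MEMORY_AUTO_INJECT_MIN_SCORE: f32 = 0.75; // assumed threshold

fn prepare_model_provider_input(
    user_text: &str,
    mut history: Vec<String>,
    memory_hits: Vec<(String, f32)>, // (snippet, vector-search score)
    max_history: usize,
) -> ProviderInput {
    // Stage 2: compact older turns into a single summary entry
    // when the session tape exceeds the context budget.
    if history.len() > max_history {
        let dropped = history.len() - max_history;
        let recent = history.split_off(dropped);
        history = vec![format!("[summary of {dropped} earlier turns]")];
        history.extend(recent);
    }
    // Stage 4: auto-inject only snippets above the relevance threshold.
    let injected_snippets = memory_hits
        .into_iter()
        .filter(|(_, score)| *score >= MEMORY_AUTO_INJECT_MIN_SCORE)
        .map(|(text, _)| text)
        .collect();
    ProviderInput {
        injected_snippets,
        history,
        user_text: user_text.to_string(),
    }
}

fn main() {
    let input = prepare_model_provider_input(
        "what changed in the build?",
        (1..=6).map(|i| format!("turn {i}")).collect(),
        vec![("uses cargo".into(), 0.9), ("likes tea".into(), 0.3)],
        4,
    );
    // Two old turns become one summary entry; only the 0.9 hit is injected.
    println!(
        "{} history entries, {} snippets",
        input.history.len(),
        input.injected_snippets.len()
    );
}
```

Note that compaction (stage 2) runs before retrieval (stage 4) so that auto-injected snippets are budgeted against the already-compacted tape.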

Context Reference Resolution

The pipeline also resolves “Context References” (e.g., @file, @url, @memory). These are parsed from the input text and fetched in real-time.
| Reference Kind | Resolution Logic | Limit |
| --- | --- | --- |
| File / Folder | Reads from authorized workspace roots crates/palyra-daemon/src/application/context_references.rs#188-193 | 8,000 chars/file |
| Url | Fetches via http_fetch tool with content-type validation crates/palyra-daemon/src/application/context_references.rs#200-203 | 8,000 chars |
| Memory | Performs a targeted search in the Journal crates/palyra-daemon/src/application/context_references.rs#204-206 | 4 items |
| Git / Diff | Executes local git commands to retrieve changes crates/palyra-daemon/src/application/context_references.rs#194-199 | 10,000 chars |
Sources: crates/palyra-daemon/src/application/provider_input.rs#1-210, crates/palyra-daemon/src/application/context_references.rs#27-61, crates/palyra-daemon/src/application/context_references.rs#178-206
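The extraction of `@`-prefixed references from input text can be sketched as a small tokenizer. The `@kind:argument` syntax, the `ContextReference` enum, and the function name below are assumptions for illustration; the real parser lives in `context_references.rs`.

```rust
// Hypothetical sketch of context-reference extraction.
// The @kind:argument syntax and these type names are assumptions.

#[derive(Debug, PartialEq)]
enum ContextReference {
    File(String),
    Url(String),
    Memory(String),
}

fn parse_context_references(input: &str) -> Vec<ContextReference> {
    input
        .split_whitespace()
        .filter_map(|tok| {
            // Only tokens shaped like "@kind:argument" are references.
            let rest = tok.strip_prefix('@')?;
            let (kind, arg) = rest.split_once(':')?;
            match kind {
                "file" => Some(ContextReference::File(arg.to_string())),
                "url" => Some(ContextReference::Url(arg.to_string())),
                "memory" => Some(ContextReference::Memory(arg.to_string())),
                _ => None, // unknown kinds are left in the text untouched
            }
        })
        .collect()
}

fn main() {
    let refs = parse_context_references(
        "summarize @file:src/main.rs and @url:https://example.com please",
    );
    println!("{refs:?}");
}
```

After parsing, each kind would be dispatched to its resolver and truncated to the per-kind limit from the table above.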

Run Stream Orchestration

The run_stream pipeline manages the lifecycle of a single “Run”—from the initial user request to the final streaming response. It is implemented as a state machine (RunStateMachine) that coordinates between the user, the model, and the tool execution environment.

Run Lifecycle Flow

The orchestration logic in crates/palyra-daemon/src/application/run_stream/orchestration.rs follows this sequence:
  1. Initialization: The run is registered in the journal and transitions to Accepted crates/palyra-daemon/src/application/run_stream/orchestration.rs#183-189.
  2. Input Preparation: Calls prepare_model_provider_input to generate the augmented prompt crates/palyra-daemon/src/application/run_stream/orchestration.rs#15-17.
  3. Smart Routing: Evaluates the SmartRoutingRuntimeConfig to select the optimal model based on prompt complexity and provider health crates/palyra-daemon/src/usage_governance.rs#50-54, crates/palyra-daemon/src/application/run_stream/orchestration.rs#30-31.
  4. Provider Execution: Sends the request to the ModelProvider. This is a long-running future that is polled while checking for cancellation crates/palyra-daemon/src/application/run_stream/orchestration.rs#152-160.
  5. Event Processing: As the provider streams chunks, process_run_stream_provider_events handles text generation and detects ToolCall requests crates/palyra-daemon/src/application/provider_events.rs#12-14.
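The lifecycle above can be modeled as explicit state transitions. In this sketch only `Accepted` and `PendingApproval` come from the documentation; the other state names and the `advance` API are illustrative assumptions about what a `RunStateMachine` might look like.

```rust
// Hedged sketch of RunStateMachine happy-path transitions.
// Only Accepted and PendingApproval are documented state names;
// the rest are assumptions for illustration.

#[derive(Debug, Clone, Copy, PartialEq)]
enum RunState {
    Accepted,
    PreparingInput,
    Routing,
    Streaming,
    PendingApproval,
    Completed,
    Cancelled,
}

struct RunStateMachine {
    state: RunState,
}

impl RunStateMachine {
    fn new() -> Self {
        // Step 1: the run is registered and starts as Accepted.
        Self { state: RunState::Accepted }
    }

    /// Advance to `next`; returns Err on an illegal transition.
    fn advance(&mut self, next: RunState) -> Result<(), String> {
        use RunState::*;
        let ok = matches!(
            (self.state, next),
            (Accepted, PreparingInput)         // step 2: build augmented prompt
                | (PreparingInput, Routing)    // step 3: smart routing
                | (Routing, Streaming)         // step 4: provider execution
                | (Streaming, PendingApproval) // model requested a gated tool
                | (PendingApproval, Streaming) // operator approved
                | (Streaming, Completed)
                | (_, Cancelled)               // cancellation from any state
        );
        if ok {
            self.state = next;
            Ok(())
        } else {
            Err(format!("illegal transition {:?} -> {:?}", self.state, next))
        }
    }
}

fn main() {
    let mut sm = RunStateMachine::new();
    for next in [
        RunState::PreparingInput,
        RunState::Routing,
        RunState::Streaming,
        RunState::Completed,
    ] {
        sm.advance(next).unwrap();
    }
    println!("final state: {:?}", sm.state);
}
```

Encoding legality as a transition table makes illegal jumps (e.g. straight from `Accepted` to `Streaming`) fail loudly rather than silently corrupting the run.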

Data Flow: Natural Language to Code Entity Space

This diagram maps how user-facing concepts are represented by specific code structures during a Run. (Diagram: Run Pipeline Entity Mapping.)
Sources: crates/palyra-daemon/src/application/run_stream/orchestration.rs#15-31, crates/palyra-daemon/src/orchestrator.rs#28-29, crates/palyra-daemon/src/application/provider_input.rs#41-64

Tool Flow and Execution

When the LLM emits a tool_call, the pipeline enters the tool_flow.
  1. Tool Discovery: The system identifies the requested tool from the ToolInventory.
  2. Approval Check: Depending on policy, the run may pause for manual operator approval. The RunStateMachine transitions to PendingApproval crates/palyra-daemon/src/orchestrator.rs#28-29.
  3. Execution: The tool is executed in its designated sandbox (Tier A, B, or C).
  4. Feedback Loop: The tool’s output is appended to the OrchestratorTapeRecord and sent back to the LLM for a follow-up response crates/palyra-daemon/src/application/run_stream/tool_flow.rs.
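The discovery-execute-feedback loop above can be sketched as follows. The `ToolInventory` is mocked as a plain map, the provider is a closure, and the approval pause is omitted; only the loop shape (call model, run tool, append output to the tape, call model again) reflects the documented flow.

```rust
// Sketch of the tool feedback loop. Inventory, events, and the tape
// are simplified stand-ins for ToolInventory, provider events, and
// OrchestratorTapeRecord.

use std::collections::HashMap;

enum ModelEvent {
    Text(String),
    ToolCall { name: String, args: String },
}

fn run_tool_flow(
    inventory: &HashMap<String, fn(&str) -> String>,
    mut call_model: impl FnMut(&[String]) -> ModelEvent,
) -> Result<String, String> {
    let mut tape: Vec<String> = Vec::new(); // stand-in for the tape record
    loop {
        match call_model(&tape) {
            // A plain text event ends the loop with a final answer.
            ModelEvent::Text(answer) => return Ok(answer),
            ModelEvent::ToolCall { name, args } => {
                // 1. Tool discovery
                let tool = inventory
                    .get(name.as_str())
                    .ok_or_else(|| format!("unknown tool: {name}"))?;
                // 3. Execution (approval check omitted in this sketch)
                let output = tool(&args);
                // 4. Feedback loop: record output for the follow-up call
                tape.push(format!("tool {name} -> {output}"));
            }
        }
    }
}

fn main() {
    let mut inventory: HashMap<String, fn(&str) -> String> = HashMap::new();
    inventory.insert("echo".into(), |args| args.to_uppercase());

    // Mock provider: first asks for a tool, then answers using the tape.
    let mut turn = 0;
    let answer = run_tool_flow(&inventory, |tape| {
        turn += 1;
        if turn == 1 {
            ModelEvent::ToolCall { name: "echo".into(), args: "hi".into() }
        } else {
            ModelEvent::Text(format!("done after: {}", tape.join("; ")))
        }
    })
    .unwrap();
    println!("{answer}");
}
```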

Cancellation Logic

Cancellation can be triggered by the user or the system. The orchestration loop polls is_orchestrator_cancel_requested at 100ms intervals crates/palyra-daemon/src/application/run_stream/orchestration.rs#153-162. If a cancellation is detected, the run is wound down along the shutdown path shown in the diagram. (Diagram: Run Stream Logic Flow.)
Sources: crates/palyra-daemon/src/application/run_stream/orchestration.rs#144-180, crates/palyra-daemon/src/transport/http/handlers/console/chat.rs#139-144, crates/palyra-daemon/src/application/run_stream/cancellation.rs#33-34
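The 100ms cancellation poll can be modeled with a shared flag. This is a simplified thread-based stand-in (the real loop polls a long-running provider future, not a worker thread), and the function name is illustrative.

```rust
// Minimal model of the 100ms cancellation poll: a worker checks a
// shared flag between units of work, mirroring the role of
// is_orchestrator_cancel_requested in the real orchestration loop.

use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

fn run_until_cancelled(cancel: Arc<AtomicBool>, max_steps: u32) -> u32 {
    let mut steps = 0;
    while steps < max_steps {
        if cancel.load(Ordering::Relaxed) {
            break; // cancellation detected at the poll point
        }
        // ... one unit of streaming work would happen here ...
        steps += 1;
        thread::sleep(Duration::from_millis(100));
    }
    steps
}

fn main() {
    let cancel = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&cancel);
    // Request cancellation from "the user" after roughly 250ms.
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(250));
        flag.store(true, Ordering::Relaxed);
    });
    let steps = run_until_cancelled(cancel, 100);
    println!("stopped after {steps} steps"); // well short of max_steps
}
```

Polling at a fixed interval bounds the worst-case latency between a cancel request and the run actually stopping to about one interval plus one unit of work.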

Smart Routing and Usage Governance

The pipeline incorporates “Smart Routing” to manage costs and performance. Before calling the provider, the system evaluates the configured routing rules: the plan_usage_routing function can override the requested model with a more efficient one when prompt complexity is low, or block the request outright if it exceeds hard limits crates/palyra-daemon/src/usage_governance.rs#201-213.
Sources: crates/palyra-daemon/src/usage_governance.rs#1-226, crates/palyra-daemon/src/application/run_stream/orchestration.rs#30-31
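A plan_usage_routing-style decision can be sketched as a pure function over the request and current usage. The thresholds, model names, decision variants, and signature below are illustrative assumptions, not the real usage_governance API.

```rust
// Hedged sketch of a usage-routing decision: downgrade low-complexity
// requests, block requests over a hard budget, otherwise keep the
// requested model. All names and thresholds are assumptions.

#[derive(Debug, PartialEq)]
enum RoutingDecision {
    Keep(String),      // use the requested model
    Downgrade(String), // override with a cheaper model
    Blocked(String),   // reject: hard limit exceeded
}

fn plan_usage_routing(
    requested_model: &str,
    prompt_tokens: u32,
    spent_today_cents: u32,
    daily_budget_cents: u32,
) -> RoutingDecision {
    // Hard limit first: never call the provider over budget.
    if spent_today_cents >= daily_budget_cents {
        return RoutingDecision::Blocked("daily budget exhausted".into());
    }
    // Crude complexity proxy: very short prompts rarely need
    // the largest model.
    if prompt_tokens < 200 {
        return RoutingDecision::Downgrade("small-fast-model".into());
    }
    RoutingDecision::Keep(requested_model.to_string())
}

fn main() {
    println!("{:?}", plan_usage_routing("big-model", 50, 10, 500));
}
```

Checking the hard budget before the complexity heuristic ensures a blocked request is never silently downgraded instead of rejected.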