The prepare_model_provider_input Pipeline
Before a request is sent to a Model Provider, the daemon executes a multi-stage augmentation pipeline. This is primarily handled by the prepare_model_provider_input function, which coordinates memory ingestion, session history compaction, and context retrieval.
Pipeline Stages
- Memory Ingest: Incoming user messages are ingested into the `JournalStore` as memory items for future retrieval crates/palyra-daemon/src/application/provider_input.rs#168-171.
- Session Compaction: If the session history (tape) grows too large, the pipeline triggers a compaction strategy. It builds a plan to summarize older parts of the conversation to stay within model context limits crates/palyra-daemon/src/application/provider_input.rs#14-17.
- Attachment Recall: If the user provides specific artifact IDs or queries for previous attachments, the pipeline recalls relevant media chunks and metadata crates/palyra-daemon/src/application/provider_input.rs#99-106.
- Auto-Inject (RAG): The system performs a vector search against the memory store based on the current input text. Highly relevant snippets (based on `MEMORY_AUTO_INJECT_MIN_SCORE`) are automatically injected into the prompt crates/palyra-daemon/src/application/provider_input.rs#190-200.
- Explicit Recall: Processes specific search queries or item IDs requested via the `parameter_delta` JSON to pull targeted information into the context crates/palyra-daemon/src/application/provider_input.rs#67-87.
- Vision Input Preparation: If attachments include images, they are validated against `MediaRuntimeConfig` and converted into `ProviderImageInput` structures crates/palyra-daemon/src/application/provider_input.rs#108-112.
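To make the Auto-Inject stage concrete, here is a minimal sketch of score-threshold filtering. It is not the daemon's code: the `MemoryHit` type, the `select_auto_inject` function, the `max_items` cap, and the threshold value of 0.75 are all assumptions chosen for illustration. Only the constant name `MEMORY_AUTO_INJECT_MIN_SCORE` comes from the source.

```rust
/// Hypothetical value: the real MEMORY_AUTO_INJECT_MIN_SCORE is not stated here.
const MEMORY_AUTO_INJECT_MIN_SCORE: f32 = 0.75;

/// Illustrative search hit; the daemon's actual result type is not shown.
#[derive(Debug, Clone, PartialEq)]
struct MemoryHit {
    snippet: String,
    score: f32,
}

/// Keep hits at or above the threshold, best first, capped at `max_items`.
fn select_auto_inject(mut hits: Vec<MemoryHit>, max_items: usize) -> Vec<MemoryHit> {
    hits.retain(|h| h.score >= MEMORY_AUTO_INJECT_MIN_SCORE);
    // total_cmp gives a total order over f32, so NaN scores cannot panic the sort.
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    hits.truncate(max_items);
    hits
}
```

Filtering before sorting keeps the sort small, and the cap bounds how much retrieved text can crowd the user's own input out of the prompt.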
Context Reference Resolution
The pipeline also resolves “Context References” (e.g., `@file`, `@url`, `@memory`). These are parsed from the input text and fetched in real time.
| Reference Kind | Resolution Logic | Limit |
|---|---|---|
| File / Folder | Reads from authorized workspace roots crates/palyra-daemon/src/application/context_references.rs#188-193 | 8,000 chars/file |
| Url | Fetches via `http_fetch` tool with content-type validation crates/palyra-daemon/src/application/context_references.rs#200-203 | 8,000 chars |
| Memory | Performs a targeted search in the Journal crates/palyra-daemon/src/application/context_references.rs#204-206 | 4 items |
| Git / Diff | Executes local git commands to retrieve changes crates/palyra-daemon/src/application/context_references.rs#194-199 | 10,000 chars |
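The per-kind limits in the table can be applied with a small clamp step. The sketch below assumes the limits are counted in Unicode characters rather than bytes, which the source does not specify. The names `ReferenceKind`, `char_limit`, and `clamp_reference` are hypothetical.

```rust
/// Reference kinds from the table above (enum name is illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ReferenceKind {
    File,
    Url,
    Memory,
    GitDiff,
}

/// Character budget per kind, taken from the table.
/// Memory is limited by item count (4 items), not by characters.
fn char_limit(kind: ReferenceKind) -> Option<usize> {
    match kind {
        ReferenceKind::File | ReferenceKind::Url => Some(8_000),
        ReferenceKind::GitDiff => Some(10_000),
        ReferenceKind::Memory => None,
    }
}

/// Truncate on a character boundary so multi-byte text is never split.
fn clamp_reference(kind: ReferenceKind, content: &str) -> String {
    match char_limit(kind) {
        Some(limit) => content.chars().take(limit).collect(),
        None => content.to_string(),
    }
}
```

Iterating over `chars()` instead of slicing by byte index avoids a panic when the cut would land inside a multi-byte UTF-8 sequence.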
Run Stream Orchestration
The `run_stream` pipeline manages the lifecycle of a single “Run”, from the initial user request to the final streaming response. It is implemented as a state machine (`RunStateMachine`) that coordinates between the user, the model, and the tool execution environment.
Run Lifecycle Flow
The orchestration logic in crates/palyra-daemon/src/application/run_stream/orchestration.rs follows this sequence:
- Initialization: The run is registered in the journal and transitions to `Accepted` crates/palyra-daemon/src/application/run_stream/orchestration.rs#183-189.
- Input Preparation: Calls `prepare_model_provider_input` to generate the augmented prompt crates/palyra-daemon/src/application/run_stream/orchestration.rs#15-17.
- Smart Routing: Evaluates the `SmartRoutingRuntimeConfig` to select the optimal model based on prompt complexity and provider health crates/palyra-daemon/src/usage_governance.rs#50-54, crates/palyra-daemon/src/application/run_stream/orchestration.rs#30-31.
- Provider Execution: Sends the request to the `ModelProvider`. This is a long-running future that is polled while checking for cancellation crates/palyra-daemon/src/application/run_stream/orchestration.rs#152-160.
- Event Processing: As the provider streams chunks, `process_run_stream_provider_events` handles text generation and detects `ToolCall` requests crates/palyra-daemon/src/application/provider_events.rs#12-14.
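The lifecycle can be pictured as a small transition function. This is a sketch under stated assumptions, not the actual `RunStateMachine`. Only the state names `Accepted`, `PendingApproval`, and `Cancelled` appear in this document; `InProgress`, `Done`, and every `RunEvent` variant are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RunState {
    Accepted,        // named in the text
    InProgress,      // hypothetical
    PendingApproval, // named in the text
    Done,            // hypothetical
    Cancelled,       // named in the text
}

/// Hypothetical events that drive the machine.
#[derive(Debug, Clone, Copy)]
enum RunEvent {
    ProviderStarted,
    ApprovalRequired,
    ApprovalGranted,
    Completed,
    CancelRequested,
}

/// Return the next state, or an error for a transition the sketch does not allow.
fn transition(state: RunState, event: RunEvent) -> Result<RunState, String> {
    use RunEvent::*;
    use RunState::*;
    match (state, event) {
        // Terminal states accept no further events.
        (Done, _) | (Cancelled, _) => Err(format!("{state:?} is terminal")),
        // Cancellation is allowed from any non-terminal state.
        (_, CancelRequested) => Ok(Cancelled),
        (Accepted, ProviderStarted) => Ok(InProgress),
        (InProgress, ApprovalRequired) => Ok(PendingApproval),
        (PendingApproval, ApprovalGranted) => Ok(InProgress),
        (InProgress, Completed) => Ok(Done),
        (s, e) => Err(format!("invalid transition from {s:?} on {e:?}")),
    }
}
```

Encoding the transitions in one `match` makes illegal moves, such as completing a run that never started, an explicit error instead of a silent state change.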
Data Flow: Natural Language to Code Entity Space
This diagram maps how user-facing concepts are represented by specific code structures during a Run.
[Diagram: Run Pipeline Entity Mapping]
Sources: crates/palyra-daemon/src/application/run_stream/orchestration.rs#15-31, crates/palyra-daemon/src/orchestrator.rs#28-29, crates/palyra-daemon/src/application/provider_input.rs#41-64
Tool Flow and Execution
When the LLM emits a `tool_call`, the pipeline enters the `tool_flow`.
- Tool Discovery: The system identifies the requested tool from the `ToolInventory`.
- Approval Check: Depending on policy, the run may pause for manual operator approval. The `RunStateMachine` transitions to `PendingApproval` crates/palyra-daemon/src/orchestrator.rs#28-29.
- Execution: The tool is executed in its designated sandbox (Tier A, B, or C).
- Feedback Loop: The tool’s output is appended to the `OrchestratorTapeRecord` and sent back to the LLM for a follow-up response crates/palyra-daemon/src/application/run_stream/tool_flow.rs.
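The four steps above could be sketched as a single function. Everything here is illustrative: `ToolSpec`, `ToolOutcome`, `handle_tool_call`, the use of a `HashMap` as the inventory, and a `Vec<String>` standing in for the tape are assumptions, and the "execution" is a placeholder string, not real sandboxing.

```rust
use std::collections::HashMap;

/// Sandbox tiers named in the text.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SandboxTier {
    A,
    B,
    C,
}

/// Hypothetical inventory entry.
struct ToolSpec {
    tier: SandboxTier,
    requires_approval: bool,
}

#[derive(Debug, PartialEq)]
enum ToolOutcome {
    UnknownTool,
    PendingApproval,
    Executed { tier: SandboxTier, output: String },
}

fn handle_tool_call(
    inventory: &HashMap<String, ToolSpec>,
    name: &str,
    approved: bool,
    tape: &mut Vec<String>,
) -> ToolOutcome {
    // 1. Tool discovery.
    let Some(spec) = inventory.get(name) else {
        return ToolOutcome::UnknownTool;
    };
    // 2. Approval check: pause until an operator approves.
    if spec.requires_approval && !approved {
        return ToolOutcome::PendingApproval;
    }
    // 3. Execution: placeholder for running the tool in its sandbox tier.
    let output = format!("{name}: ok");
    // 4. Feedback loop: record the output so it can be sent back to the model.
    tape.push(output.clone());
    ToolOutcome::Executed { tier: spec.tier, output }
}
```

Returning `PendingApproval` without touching the tape means a paused run leaves no partial tool output behind.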
Cancellation Logic
Cancellation can be triggered by the user or the system. The orchestration loop polls `is_orchestrator_cancel_requested` at 100ms intervals crates/palyra-daemon/src/application/run_stream/orchestration.rs#153-162. If a cancellation is detected:
- The provider request is dropped.
- `transition_run_stream_to_cancelled` is called to update the state to `Cancelled` and notify the client crates/palyra-daemon/src/application/run_stream/orchestration.rs#112-116.
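The polling pattern can be illustrated with the standard library alone. The daemon polls an async future; this sketch swaps in a thread-style stand-in, with a channel receiver in place of the provider future and an `AtomicBool` in place of `is_orchestrator_cancel_requested`. `RunOutcome` and `await_provider` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum RunOutcome {
    Completed(String),
    Cancelled,
    ProviderGone,
}

/// Wait for the provider result, re-checking the cancel flag every 100 ms.
fn await_provider(result_rx: &Receiver<String>, cancel: &AtomicBool) -> RunOutcome {
    loop {
        if cancel.load(Ordering::SeqCst) {
            // The caller stops waiting on the provider request here.
            return RunOutcome::Cancelled;
        }
        match result_rx.recv_timeout(Duration::from_millis(100)) {
            Ok(text) => return RunOutcome::Completed(text),
            // No result yet: loop back and check the flag again.
            Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => return RunOutcome::ProviderGone,
        }
    }
}
```

With a 100 ms timeout, a cancellation request is noticed within roughly one interval even when the provider produces no output.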
Smart Routing and Usage Governance
The pipeline incorporates “Smart Routing” to manage costs and performance. Before calling the provider, the system evaluates:
- Complexity Score: Estimated based on prompt tokens and intent crates/palyra-daemon/src/usage_governance.rs#145-151.
- Budget Policies: Checks if the principal has exceeded their `UsageBudgetPolicyRecord` crates/palyra-daemon/src/usage_governance.rs#108-126.
- Pricing Estimates: Calculates the projected cost using `UsagePricingRecord` crates/palyra-daemon/src/usage_governance.rs#64-73.
The `plan_usage_routing` function can override the requested model with a more efficient one if the complexity is low, or block the request if it exceeds hard limits crates/palyra-daemon/src/usage_governance.rs#201-213.
Sources: crates/palyra-daemon/src/usage_governance.rs#1-226, crates/palyra-daemon/src/application/run_stream/orchestration.rs#30-31
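The override-or-block decision described in this section could look roughly like the following. This is a sketch only, not the real `plan_usage_routing`. The function name `plan_routing_sketch`, the `RoutingInput` fields, the 0.0 to 1.0 complexity scale, the USD cost units, and the 0.3 threshold are all assumptions made for illustration.

```rust
#[derive(Debug, PartialEq)]
enum RoutingDecision {
    KeepRequested(String),
    Override(String),
    Block(String),
}

/// Hypothetical inputs; the real records carry more detail.
#[derive(Debug, Clone, Copy)]
struct RoutingInput<'a> {
    requested_model: &'a str,
    cheaper_model: &'a str,
    complexity: f64, // assumed scale: 0.0 (trivial) to 1.0 (hard)
    estimated_cost_usd: f64,
    spent_usd: f64,
    hard_limit_usd: f64,
}

/// Hypothetical cutoff below which a cheaper model is considered sufficient.
const LOW_COMPLEXITY_THRESHOLD: f64 = 0.3;

fn plan_routing_sketch(input: &RoutingInput<'_>) -> RoutingDecision {
    // Hard limits win over everything else.
    if input.spent_usd + input.estimated_cost_usd > input.hard_limit_usd {
        return RoutingDecision::Block("hard budget limit exceeded".to_string());
    }
    // Low-complexity prompts are routed to the cheaper model.
    if input.complexity < LOW_COMPLEXITY_THRESHOLD {
        return RoutingDecision::Override(input.cheaper_model.to_string());
    }
    RoutingDecision::KeepRequested(input.requested_model.to_string())
}
```

Checking the budget before the complexity score ensures that a request over its hard limit is blocked even when a cheaper model exists.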