Model Provider Integration
The daemon supports multiple provider architectures through a unified interface, primarily targeting OpenAI-compatible APIs and Anthropic’s Messages API.
Provider Registry & Configuration
The ModelProviderRegistryConfig manages the list of available providers and their associated models. Each provider is defined by a ModelProviderKind (Deterministic, OpenAiCompatible, or Anthropic) crates/palyra-daemon/src/model_provider.rs#37-43.
Providers are configured with:
- Base URL: The API endpoint (e.g., https://api.openai.com/v1) crates/palyra-daemon/src/model_provider.rs#148.
- Authentication: Managed via auth_profile_id or a direct api_key (often stored as a VaultRef) crates/palyra-daemon/src/model_provider.rs#151-154.
- Resilience: Settings for request_timeout_ms, max_retries, and circuit breaker thresholds crates/palyra-daemon/src/model_provider.rs#156-160.
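The settings above can be pictured as a single configuration record. The following is a minimal sketch, not the daemon's actual definition: the struct name ProviderConfig, the CredentialSource enum, the circuit_breaker_failure_threshold field, and the validate method are all hypothetical, and only the ModelProviderKind variants and the request_timeout_ms / max_retries names come from the text.

```rust
// Hypothetical mirror of the provider settings described above. The real
// definitions live in model_provider.rs and may differ in shape and naming.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelProviderKind {
    Deterministic,
    OpenAiCompatible,
    Anthropic,
}

// Assumed representation of the two credential options named in the text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CredentialSource {
    // Credentials resolved through a named auth profile (auth_profile_id).
    AuthProfile(String),
    // An API key stored in a vault and referenced by key (VaultRef).
    VaultRef(String),
}

#[derive(Debug, Clone)]
pub struct ProviderConfig {
    pub kind: ModelProviderKind,
    pub base_url: String,
    pub credentials: Option<CredentialSource>,
    pub request_timeout_ms: u64,
    pub max_retries: u32,
    // Assumed name for one of the circuit breaker thresholds.
    pub circuit_breaker_failure_threshold: u32,
}

impl ProviderConfig {
    // Illustrative sanity checks; the daemon's own validation is not shown
    // in this document.
    pub fn validate(&self) -> Result<(), String> {
        if self.request_timeout_ms == 0 {
            return Err("request_timeout_ms must be greater than zero".to_string());
        }
        // Assumption: the Deterministic provider makes no network calls,
        // so its base URL is not checked.
        if self.kind == ModelProviderKind::Deterministic {
            return Ok(());
        }
        let is_http = self.base_url.starts_with("https://")
            || self.base_url.starts_with("http://");
        if !is_http {
            return Err(format!("base_url is not an HTTP(S) URL: {}", self.base_url));
        }
        Ok(())
    }
}
```

Keeping the credential as a reference (profile ID or vault key) rather than a raw secret means the configuration can be logged or serialized without exposing the key itself.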
Authentication Profiles
Authentication is abstracted through the AuthProfileRegistry. This allows the daemon to refresh credentials (e.g., OAuth2 tokens) independently of the model request logic. When a provider request is initiated, the system resolves the AuthCredential crates/palyra-cli/src/commands/models.rs#189-192.
Request Transformation
The daemon transforms internal ProviderRequest structures into provider-specific payloads:
- OpenAI: Constructs JSON for /chat/completions, including vision inputs as base64 data strings crates/palyra-daemon/src/model_provider.rs#211-216.
- Anthropic: Maps messages to the anthropic format, specifying the anthropic-version header (e.g., 2023-06-01) crates/palyra-daemon/src/model_provider.rs#22-23.
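The per-provider differences described above mostly come down to the endpoint path, the authentication headers, and how images are encoded. The sketch below illustrates those three points under stated assumptions: the function names are hypothetical, the base URL is assumed to already include the /v1 prefix, and the image bytes are assumed to be base64-encoded by the caller. The header names (Authorization: Bearer for OpenAI-compatible APIs; x-api-key and anthropic-version for Anthropic) follow the providers' public HTTP APIs rather than anything shown in the daemon's source.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderKind {
    OpenAiCompatible,
    Anthropic,
}

// Version string taken from the example in the text above.
pub const ANTHROPIC_VERSION: &str = "2023-06-01";

// Join the configured base URL with the provider's chat endpoint.
// Assumes base_url already ends in the API version segment (e.g. ".../v1").
pub fn endpoint_url(kind: ProviderKind, base_url: &str) -> String {
    let base = base_url.trim_end_matches('/');
    match kind {
        ProviderKind::OpenAiCompatible => format!("{base}/chat/completions"),
        ProviderKind::Anthropic => format!("{base}/messages"),
    }
}

// Build the request headers each provider family expects.
pub fn request_headers(kind: ProviderKind, api_key: &str) -> Vec<(&'static str, String)> {
    let mut headers = vec![("content-type", "application/json".to_string())];
    match kind {
        ProviderKind::OpenAiCompatible => {
            headers.push(("authorization", format!("Bearer {api_key}")));
        }
        ProviderKind::Anthropic => {
            headers.push(("x-api-key", api_key.to_string()));
            headers.push(("anthropic-version", ANTHROPIC_VERSION.to_string()));
        }
    }
    headers
}

// Wrap already-base64-encoded image bytes in a data URL, the form accepted
// for inline vision inputs on OpenAI-compatible chat endpoints.
pub fn image_data_url(mime_type: &str, base64_payload: &str) -> String {
    format!("data:{mime_type};base64,{base64_payload}")
}
```

Isolating these differences in small functions keeps the rest of the request path provider-agnostic, which is the point of the unified interface.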
Provider Input Preparation
Before a message is sent to a model, the prepare_model_provider_input function orchestrates the construction of the final prompt crates/palyra-daemon/src/gateway/tests.rs#50-54.
Memory Augmentation (RAG)
If auto-injection is enabled, the daemon performs a vector search via search_memory using the user’s input as a query crates/palyra-daemon/src/application/provider_input.rs#190-192.
- Scoring: Results must meet a minimum similarity threshold (default MEMORY_AUTO_INJECT_MIN_SCORE) crates/palyra-daemon/src/application/provider_input.rs#22.
- Formatting: Hits are rendered into the prompt using render_memory_augmented_prompt crates/palyra-daemon/src/application/provider_input.rs#53.
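The scoring and formatting steps above can be sketched as a filter followed by a render pass. This is an illustration only: MemoryHit, select_hits, and render_prompt are hypothetical names, the 0.5 threshold is a placeholder (the real value of MEMORY_AUTO_INJECT_MIN_SCORE is not stated here), and the prompt layout is invented rather than copied from render_memory_augmented_prompt.

```rust
#[derive(Debug, Clone)]
pub struct MemoryHit {
    pub text: String,
    // Similarity score from the vector search; higher is more relevant.
    pub score: f64,
}

// Placeholder threshold. The daemon's MEMORY_AUTO_INJECT_MIN_SCORE is defined
// in provider_input.rs and its value is not given in this document.
pub const ASSUMED_MIN_SCORE: f64 = 0.5;

// Keep hits at or above the threshold, best first, capped at max_hits.
pub fn select_hits(mut hits: Vec<MemoryHit>, min_score: f64, max_hits: usize) -> Vec<MemoryHit> {
    hits.retain(|h| h.score >= min_score);
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    hits.truncate(max_hits);
    hits
}

// Prepend the selected memory items to the user's message. With no hits the
// input is passed through unchanged.
pub fn render_prompt(user_input: &str, hits: &[MemoryHit]) -> String {
    if hits.is_empty() {
        return user_input.to_string();
    }
    let mut out = String::from("Relevant memory:\n");
    for (i, hit) in hits.iter().enumerate() {
        out.push_str(&format!("{}. {}\n", i + 1, hit.text));
    }
    out.push_str("\nUser message:\n");
    out.push_str(user_input);
    out
}
```

Filtering before rendering keeps low-relevance hits from consuming context window space, which matters more as the memory store grows.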
Session Compaction
To manage long-running conversations within context limits, the daemon employs apply_session_compaction. This process:
- Identifies Candidates: Scans the OrchestratorTape for facts, decisions, and summaries crates/palyra-daemon/src/application/session_compaction.rs#158-169.
- Summarization: Condenses older turns into a summary_text block crates/palyra-daemon/src/application/session_compaction.rs#82.
- Checkpointing: Creates an OrchestratorCheckpointRecord to allow the session to “reset” while retaining critical state crates/palyra-daemon/src/application/session_compaction.rs#21-27.
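The compaction steps above amount to splitting the conversation into an older portion that gets condensed and a recent portion that is kept verbatim. The sketch below shows that split under simplifying assumptions: Turn, Checkpoint, and compact are hypothetical names, the keep_recent cutoff is an invented parameter, and the "summary" is a plain truncation of each older turn standing in for whatever summarization the daemon actually performs.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Turn {
    pub role: String,
    pub text: String,
}

// Hypothetical stand-in for OrchestratorCheckpointRecord.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Checkpoint {
    pub summary_text: String,
    pub compacted_turns: usize,
}

// Maximum characters kept from each older turn (illustrative value).
const SUMMARY_CHARS_PER_TURN: usize = 80;

// Condense everything except the last `keep_recent` turns into a checkpoint.
// Returns the checkpoint (if any turns were compacted) and the turns to keep.
pub fn compact(turns: &[Turn], keep_recent: usize) -> (Option<Checkpoint>, Vec<Turn>) {
    if turns.len() <= keep_recent {
        return (None, turns.to_vec());
    }
    let split = turns.len() - keep_recent;
    let (older, recent) = turns.split_at(split);

    // Placeholder summarization: truncate each older turn. A real system
    // would extract facts and decisions or call a model here.
    let summary_text = older
        .iter()
        .map(|t| {
            let head: String = t.text.chars().take(SUMMARY_CHARS_PER_TURN).collect();
            format!("{}: {}", t.role, head)
        })
        .collect::<Vec<_>>()
        .join("\n");

    let checkpoint = Checkpoint {
        summary_text,
        compacted_turns: older.len(),
    };
    (Some(checkpoint), recent.to_vec())
}
```

After compaction, the next provider request would carry the checkpoint's summary_text followed by the retained turns instead of the full history.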
Input Preparation Flow
The following diagram illustrates the transformation from a raw user message to a provider-ready payload.
Diagram: Prompt Assembly Pipeline
Sources: crates/palyra-daemon/src/application/provider_input.rs#41-107, crates/palyra-daemon/src/application/session_compaction.rs#31-147, crates/palyra-daemon/src/gateway/tests.rs#50-54.
Orchestration & Routines
The cron module handles scheduled execution of agent tasks, known as Routines.
Scheduler Loop
The scheduler runs as a background service, polling for due jobs every 15 seconds (SCHEDULER_IDLE_SLEEP) crates/palyra-daemon/src/cron.rs#42.
- Cron Expressions: Supports standard 5-field cron syntax (minute, hour, day, month, weekday) crates/palyra-daemon/src/cron.rs#140-146.
- Timezones: Can operate in Utc or Local system time crates/palyra-daemon/src/cron.rs#60-65.
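To make the 5-field syntax concrete, here is a minimal matcher that decides whether an expression is due at a given wall-clock time. It is a sketch, not the parser in cron.rs: CronTime, is_due, and field_matches are hypothetical names, only `*`, `*/n`, plain numbers, and comma lists are supported (no ranges or names), and the day-of-month/day-of-week combination follows the traditional cron convention, which the daemon may or may not share.

```rust
// Broken-down time in whichever zone the scheduler is using (Utc or Local).
pub struct CronTime {
    pub minute: u32,  // 0-59
    pub hour: u32,    // 0-23
    pub day: u32,     // 1-31
    pub month: u32,   // 1-12
    pub weekday: u32, // 0-6, Sunday = 0
}

// Check one field against a value. `min` is the lowest legal value for the
// field, so that steps like "*/2" count from the start of the range.
fn field_matches(field: &str, value: u32, min: u32) -> Result<bool, String> {
    let mut matched = false;
    for part in field.split(',') {
        let hit = if part == "*" {
            true
        } else if let Some(step) = part.strip_prefix("*/") {
            let n: u32 = step.parse().map_err(|_| format!("invalid step: {part}"))?;
            if n == 0 {
                return Err(format!("step must be non-zero: {part}"));
            }
            value >= min && (value - min) % n == 0
        } else {
            let n: u32 = part.parse().map_err(|_| format!("invalid value: {part}"))?;
            n == value
        };
        matched |= hit;
    }
    Ok(matched)
}

// Returns Ok(true) when the expression fires at time `t`.
pub fn is_due(expr: &str, t: &CronTime) -> Result<bool, String> {
    let fields: Vec<&str> = expr.split_whitespace().collect();
    if fields.len() != 5 {
        return Err(format!("expected 5 fields, got {}", fields.len()));
    }
    let minute = field_matches(fields[0], t.minute, 0)?;
    let hour = field_matches(fields[1], t.hour, 0)?;
    let day = field_matches(fields[2], t.day, 1)?;
    let month = field_matches(fields[3], t.month, 1)?;
    let weekday = field_matches(fields[4], t.weekday, 0)?;

    // Traditional cron rule: if both day fields are restricted, either one
    // matching is enough; otherwise both must match.
    let day_ok = if fields[2].starts_with('*') || fields[4].starts_with('*') {
        day && weekday
    } else {
        day || weekday
    };
    Ok(minute && hour && month && day_ok)
}
```

For example, `*/15 9 * * 1` fires at 09:00, 09:15, 09:30, and 09:45 on Mondays. With a 15-second polling interval, a scheduler built on such a matcher also has to remember which minute it last fired for, so that one due minute does not trigger several runs.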
Routine Lifecycle
When a routine is triggered:
- Dispatch: A CronRunStartRequest is issued to the journal crates/palyra-daemon/src/cron.rs#35.
- Execution: The orchestrator initializes a Run using the routine’s prompt and owner principal crates/palyra-daemon/src/transport/http/handlers/console/routines.rs#56-60.
- Finalization: Upon completion, a CronRunFinalizeRequest updates the status to Success or Failure crates/palyra-daemon/src/cron.rs#35.
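The lifecycle above is a small state machine: a run starts, executes, and is finalized exactly once. The following sketch models only that shape. CronRun, CronRunStatus, start_run, and finalize_run are hypothetical names; the Running state and the error field are assumptions, and the journal and orchestrator calls are left out entirely.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CronRunStatus {
    // Assumed intermediate state between dispatch and finalization.
    Running,
    Success,
    Failure,
}

#[derive(Debug)]
pub struct CronRun {
    pub routine_id: String,
    pub status: CronRunStatus,
    pub error: Option<String>,
}

// Dispatch: record that a run has started (stands in for CronRunStartRequest).
pub fn start_run(routine_id: &str) -> CronRun {
    CronRun {
        routine_id: routine_id.to_string(),
        status: CronRunStatus::Running,
        error: None,
    }
}

// Finalization: record the outcome once (stands in for CronRunFinalizeRequest).
// A second finalization attempt is rejected so the recorded status is stable.
pub fn finalize_run(run: &mut CronRun, outcome: Result<(), String>) -> Result<(), String> {
    if run.status != CronRunStatus::Running {
        return Err(format!("run for {} is already finalized", run.routine_id));
    }
    match outcome {
        Ok(()) => run.status = CronRunStatus::Success,
        Err(message) => {
            run.status = CronRunStatus::Failure;
            run.error = Some(message);
        }
    }
    Ok(())
}
```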
Diagram: Routine Orchestration Architecture
Sources: crates/palyra-daemon/src/cron.rs#1-156, crates/palyra-daemon/src/transport/http/handlers/console/routines.rs#9-107.
Usage Governance & Smart Routing
The usage_governance module monitors token consumption and provides “Smart Routing” to optimize for cost or latency.
Budget Evaluation
The system evaluates UsageBudgetPolicyRecord entries against current consumption crates/palyra-daemon/src/usage_governance.rs#11-12.
- Metrics: Tracks prompt_tokens, completion_tokens, and estimated USD cost crates/palyra-daemon/src/usage_governance.rs#169-177.
- Actions: Can Suggest a cheaper model, DryRun (log only), or Enforced (block the request) crates/palyra-daemon/src/usage_governance.rs#28-32.
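A budget check of this kind reduces to estimating cost from token counts and mapping an overrun to the policy's configured action. The sketch below illustrates that mapping only. BudgetPolicy, Usage, BudgetOutcome, evaluate, and estimate_cost_usd are hypothetical names, per-million-token pricing is an assumed convention, and the three action names are the only details taken from the text.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BudgetAction {
    Suggest,
    DryRun,
    Enforced,
}

pub struct Usage {
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
    pub estimated_cost_usd: f64,
}

// Hypothetical, simplified stand-in for UsageBudgetPolicyRecord.
pub struct BudgetPolicy {
    pub limit_usd: f64,
    pub action: BudgetAction,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BudgetOutcome {
    // Under budget: proceed normally.
    Allow,
    // Over budget: proceed, but recommend a cheaper model.
    AllowWithSuggestion,
    // Over budget: proceed and only log the violation.
    AllowLogged,
    // Over budget: reject the request.
    Block,
}

// Estimate cost from token counts, with prices quoted per million tokens
// (an assumed pricing convention, not one stated in this document).
pub fn estimate_cost_usd(
    prompt_tokens: u64,
    completion_tokens: u64,
    prompt_usd_per_mtok: f64,
    completion_usd_per_mtok: f64,
) -> f64 {
    let per_million = 1_000_000.0;
    (prompt_tokens as f64 / per_million) * prompt_usd_per_mtok
        + (completion_tokens as f64 / per_million) * completion_usd_per_mtok
}

// Map current consumption to an outcome according to the policy's action.
pub fn evaluate(policy: &BudgetPolicy, usage: &Usage) -> BudgetOutcome {
    if usage.estimated_cost_usd <= policy.limit_usd {
        return BudgetOutcome::Allow;
    }
    match policy.action {
        BudgetAction::Suggest => BudgetOutcome::AllowWithSuggestion,
        BudgetAction::DryRun => BudgetOutcome::AllowLogged,
        BudgetAction::Enforced => BudgetOutcome::Block,
    }
}
```

Running a new policy as DryRun first lets an operator see how often it would have triggered before switching it to Enforced.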
Routing Decisions
The RoutingDecision logic selects the best model based on:
- Complexity Score: Estimated based on prompt length and task type crates/palyra-daemon/src/usage_governance.rs#121.
- Provider Health: Current latency and error rates crates/palyra-daemon/src/usage_governance.rs#122.
- Cost Tier: Low, Standard, or Premium crates/palyra-daemon/src/model_provider.rs#107-111.
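One way to combine the three inputs above is to derive a minimum cost tier from the complexity score, drop unhealthy providers, and then pick the cheapest remaining candidate. The sketch below does exactly that and nothing more. Candidate, required_tier, and route are hypothetical names, the 0.3 / 0.7 thresholds are invented, and health is reduced to a boolean plus average latency; the daemon's actual RoutingDecision logic may weigh these factors differently.

```rust
// Declaration order defines the ordering: Low < Standard < Premium.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum CostTier {
    Low,
    Standard,
    Premium,
}

#[derive(Debug, Clone)]
pub struct Candidate {
    pub model: String,
    pub tier: CostTier,
    // Simplified provider health: a single flag plus recent average latency.
    pub healthy: bool,
    pub avg_latency_ms: u64,
}

// Map a complexity score in [0, 1] to the minimum tier considered adequate.
// The thresholds are illustrative assumptions.
pub fn required_tier(complexity: f64) -> CostTier {
    if complexity < 0.3 {
        CostTier::Low
    } else if complexity < 0.7 {
        CostTier::Standard
    } else {
        CostTier::Premium
    }
}

// Pick the cheapest healthy model that meets the required tier, breaking
// ties by lower latency. Returns None when no candidate qualifies.
pub fn route(candidates: &[Candidate], complexity: f64) -> Option<&Candidate> {
    let needed = required_tier(complexity);
    candidates
        .iter()
        .filter(|c| c.healthy && c.tier >= needed)
        .min_by_key(|c| (c.tier, c.avg_latency_ms))
}
```

Ordering the key as (tier, latency) optimizes for cost first; swapping the tuple elements would give a latency-first policy instead.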