Model Providers & Registry
The model_provider.rs module manages the integration with external LLM providers (OpenAI, Anthropic, and OpenAI-compatible endpoints). It handles request normalization, credential resolution, and reliability patterns such as circuit breaking and failover.
Provider Architecture
Palyra uses a registry-based approach in which multiple providers and models can be configured simultaneously. The system supports three primary ModelProviderKind variants: Deterministic, OpenAiCompatible, and Anthropic crates/palyra-daemon/src/model_provider.rs#37-43.
Models are categorized by ProviderModelRole:
- Chat: Standard completion and tool-calling interfaces crates/palyra-daemon/src/model_provider.rs#77.
- Embeddings: Vector generation for RAG and memory crates/palyra-daemon/src/model_provider.rs#78.
- AudioTranscription: Converting speech to text crates/palyra-daemon/src/model_provider.rs#79.
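The registry idea above can be sketched as follows. This is a minimal illustration, not the daemon's actual code: the enum variants come from the text, while `ModelEntry`, `ModelRegistry`, and their fields and methods are hypothetical names introduced for the example.

```rust
use std::collections::HashMap;

/// Provider kinds named in the text above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ModelProviderKind {
    Deterministic,
    OpenAiCompatible,
    Anthropic,
}

/// Model roles named in the text above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ProviderModelRole {
    Chat,
    Embeddings,
    AudioTranscription,
}

/// Hypothetical registry entry; the field names are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModelEntry {
    pub model_id: String,
    pub provider: ModelProviderKind,
    pub role: ProviderModelRole,
}

/// Hypothetical registry: models grouped by role, in registration order.
#[derive(Debug, Default)]
pub struct ModelRegistry {
    by_role: HashMap<ProviderModelRole, Vec<ModelEntry>>,
}

impl ModelRegistry {
    /// Adds a model under its role.
    pub fn register(&mut self, entry: ModelEntry) {
        self.by_role.entry(entry.role).or_default().push(entry);
    }

    /// Returns the models registered for a role, first-registered first.
    pub fn models_for(&self, role: ProviderModelRole) -> &[ModelEntry] {
        match self.by_role.get(&role) {
            Some(models) => models.as_slice(),
            None => &[],
        }
    }
}
```

Grouping by role keeps a chat request from ever being matched against an embeddings-only model, whichever provider serves it.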
Reliability & Performance
The registry implements several patterns to ensure high availability:
- Circuit Breaker: Tracks failures per provider and enters a cooldown state once a failure threshold is reached crates/palyra-daemon/src/model_provider.rs#159-160.
- Failover: When enabled, requests are automatically routed to alternative models if the primary fails crates/palyra-daemon/src/model_provider.rs#181.
- Response Caching: Optional TTL-based caching for model responses to reduce costs and latency crates/palyra-daemon/src/model_provider.rs#182-184.
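A minimal sketch of how a per-provider circuit breaker and failover could fit together is shown below. It assumes a consecutive-failure counter and a fixed cooldown; the type `CircuitBreaker`, the function `pick_available`, and all field names are hypothetical and are not taken from model_provider.rs.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-provider breaker; threshold and cooldown are assumed settings.
#[derive(Debug)]
pub struct CircuitBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    open_until: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self {
            failure_threshold,
            cooldown,
            consecutive_failures: 0,
            open_until: None,
        }
    }

    /// True when requests may be sent: the breaker was never opened,
    /// or its cooldown has elapsed.
    pub fn allows(&self, now: Instant) -> bool {
        match self.open_until {
            Some(until) => now >= until,
            None => true,
        }
    }

    /// A success resets the failure count and closes the breaker.
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.open_until = None;
    }

    /// Counts a failure and starts a cooldown once the threshold is reached.
    pub fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.failure_threshold {
            self.open_until = Some(now + self.cooldown);
        }
    }
}

/// Picks the first candidate whose breaker allows traffic.
/// `candidates` is ordered primary first, then failover alternatives.
pub fn pick_available<'a>(
    candidates: &'a [(&'a str, CircuitBreaker)],
    now: Instant,
) -> Option<&'a str> {
    candidates
        .iter()
        .find(|(_, breaker)| breaker.allows(now))
        .map(|(name, _)| *name)
}
```

Passing `now` explicitly, rather than calling `Instant::now()` inside the breaker, keeps the cooldown logic deterministic under test.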
Model Selection Logic
Model Selection & Execution Flow (diagram): how the system selects a model and provider for a specific request.
Sources: crates/palyra-daemon/src/model_provider.rs#144-187, crates/palyra-daemon/src/model_provider.rs#11-14
Cron & Scheduled Runs
The cron.rs module implements a high-precision scheduler for automated tasks, including recurring agent runs, memory maintenance, and system health checks.
Scheduler Implementation
The scheduler runs as a background loop, waking up periodically to check for due jobs crates/palyra-daemon/src/cron.rs#42. It supports three schedule types:
- Cron: Standard 5-field cron expressions crates/palyra-daemon/src/cron.rs#123.
- Every: Fixed interval-based execution (e.g., every 5 minutes) crates/palyra-daemon/src/cron.rs#124.
- At: One-time execution at a specific timestamp crates/palyra-daemon/src/cron.rs#125.
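The three schedule types can be modeled as an enum with a due-time check, as in the sketch below. The payload shapes, the use of Unix seconds, and the helpers `next_run`, `is_due`, and `has_five_fields` are assumptions made for illustration. Evaluating a cron expression is deliberately left out, since that would need a parser the text does not describe.

```rust
/// The three schedule types named above. Payload shapes are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Schedule {
    /// Five-field cron expression, kept as text; this sketch does not evaluate it.
    Cron(String),
    /// Fixed interval in seconds.
    Every { interval_secs: u64 },
    /// One-time run at a Unix timestamp (seconds).
    At { unix_secs: u64 },
}

/// Cheap shape check: a standard cron expression has exactly five fields.
pub fn has_five_fields(expr: &str) -> bool {
    expr.split_whitespace().count() == 5
}

/// Next due time in Unix seconds, or None when nothing is scheduled
/// (an `At` job that already ran) or unknown (`Cron` in this sketch).
pub fn next_run(schedule: &Schedule, last_run: Option<u64>, now: u64) -> Option<u64> {
    match schedule {
        Schedule::Every { interval_secs } => Some(match last_run {
            Some(last) => last.saturating_add(*interval_secs),
            // Never run before: due immediately.
            None => now,
        }),
        Schedule::At { unix_secs } => match last_run {
            None => Some(*unix_secs),
            Some(_) => None,
        },
        Schedule::Cron(_) => None,
    }
}

/// True when the background loop should start the job at `now`.
pub fn is_due(schedule: &Schedule, last_run: Option<u64>, now: u64) -> bool {
    matches!(next_run(schedule, last_run, now), Some(t) if t <= now)
}
```

A background loop would call `is_due` for every registered job each time it wakes, then record the run time so that `Every` jobs advance and `At` jobs do not fire twice.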
System Maintenance Tasks
The daemon registers several internal cron jobs for self-management:
- Memory Maintenance: Runs every 5 minutes to prune or optimize the journal crates/palyra-daemon/src/cron.rs#56.
- Embeddings Backfill: Runs every 10 minutes to ensure all journal entries have vector representations crates/palyra-daemon/src/cron.rs#57.
- Skill Re-audit: Periodically verifies the integrity of installed WASM skills crates/palyra-daemon/src/cron.rs#54.
Usage Governance & Budgeting
The usage_governance.rs module provides a policy engine for tracking token consumption and enforcing financial limits across different scopes (e.g., per user, per session, or global).
Budget Policies
Administrators can define UsageBudgetPolicyRecord entries that specify:
- Metric Kind: tokens or usd_cost crates/palyra-daemon/src/usage_governance.rs#96.
- Interval: daily, weekly, or monthly crates/palyra-daemon/src/usage_governance.rs#97.
- Limits: Both soft_limit (alerts only) and hard_limit (blocks execution) crates/palyra-daemon/src/usage_governance.rs#105-107.
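The soft and hard limit semantics can be illustrated with a small evaluation function. This is a sketch under assumptions: `BudgetPolicy`, `BudgetOutcome`, `evaluate`, and `blocks_execution` are hypothetical names, and the real UsageBudgetPolicyRecord may carry different fields.

```rust
/// Metric kinds named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetricKind {
    Tokens,
    UsdCost,
}

/// Budget intervals named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BudgetInterval {
    Daily,
    Weekly,
    Monthly,
}

/// Hypothetical, simplified policy; either limit may be absent.
#[derive(Debug, Clone, Copy)]
pub struct BudgetPolicy {
    pub metric: MetricKind,
    pub interval: BudgetInterval,
    pub soft_limit: Option<f64>,
    pub hard_limit: Option<f64>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BudgetOutcome {
    WithinBudget,
    /// Alert only; the run still proceeds.
    SoftLimitExceeded,
    /// The run is blocked.
    HardLimitExceeded,
}

/// Checks usage already consumed in the interval plus the projected
/// usage of the next run against the policy limits.
pub fn evaluate(policy: &BudgetPolicy, consumed: f64, projected: f64) -> BudgetOutcome {
    let total = consumed + projected;
    if matches!(policy.hard_limit, Some(limit) if total > limit) {
        BudgetOutcome::HardLimitExceeded
    } else if matches!(policy.soft_limit, Some(limit) if total > limit) {
        BudgetOutcome::SoftLimitExceeded
    } else {
        BudgetOutcome::WithinBudget
    }
}

/// Only a hard-limit breach stops execution.
pub fn blocks_execution(outcome: BudgetOutcome) -> bool {
    outcome == BudgetOutcome::HardLimitExceeded
}
```

The hard limit is checked first so that a run exceeding both limits is reported as blocked rather than as a mere alert.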
Smart Routing
The system can dynamically select models based on the complexity of the prompt and the current budget status via RoutingMode:
- Suggest: Recommends a model but doesn’t enforce it crates/palyra-daemon/src/usage_governance.rs#29.
- DryRun: Logs what would have happened under enforcement crates/palyra-daemon/src/usage_governance.rs#30.
- Enforced: Actively overrides model selection to stay within budget or optimize for cost/latency crates/palyra-daemon/src/usage_governance.rs#31.
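The difference between the three modes comes down to whether a recommendation changes the model that actually runs. The sketch below illustrates this; the `Routed` struct and the `apply_routing` function are hypothetical, and the way a recommendation is computed is out of scope here.

```rust
/// Routing modes named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoutingMode {
    Suggest,
    DryRun,
    Enforced,
}

/// Hypothetical result: the model that will run, plus an optional note
/// describing what routing recommended or did.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Routed {
    pub model: String,
    pub note: Option<String>,
}

/// Applies a recommendation according to the mode. Only `Enforced`
/// changes the model that is used.
pub fn apply_routing(mode: RoutingMode, requested: &str, recommended: &str) -> Routed {
    if requested == recommended {
        return Routed { model: requested.to_string(), note: None };
    }
    match mode {
        RoutingMode::Suggest => Routed {
            model: requested.to_string(),
            note: Some(format!("suggested: {recommended}")),
        },
        RoutingMode::DryRun => Routed {
            model: requested.to_string(),
            note: Some(format!("dry run: would switch to {recommended}")),
        },
        RoutingMode::Enforced => Routed {
            model: recommended.to_string(),
            note: Some(format!("overrode {requested}")),
        },
    }
}
```

Running in DryRun first gives a record of every override that Enforced would have made, which makes it a low-risk way to validate a routing policy before turning enforcement on.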
Data Structures
| Struct | Purpose |
|---|---|
| PricingEstimate | Calculates projected cost for a run based on input/output tokens crates/palyra-daemon/src/usage_governance.rs#69-77. |
| UsageBudgetEvaluation | The result of checking a run against active policies crates/palyra-daemon/src/usage_governance.rs#92-109. |
| RoutingDecision | Final determination of which model to use and why crates/palyra-daemon/src/usage_governance.rs#112-130. |
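As an illustration of the kind of arithmetic a pricing estimate involves, the sketch below projects a run's cost from input and output token counts. The fields of the real PricingEstimate are not shown in this document, so `ModelPricing`, `estimate_cost_micro_usd`, and the choice of micro-USD per 1,000 tokens as the unit are all assumptions.

```rust
/// Hypothetical price table entry. Prices are in micro-USD
/// (millionths of a dollar) per 1,000 tokens to avoid float rounding.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ModelPricing {
    pub input_micro_usd_per_1k: u64,
    pub output_micro_usd_per_1k: u64,
}

/// Projected cost of a run in micro-USD. Input and output tokens are
/// priced separately; integer division truncates any fraction.
pub fn estimate_cost_micro_usd(
    pricing: &ModelPricing,
    input_tokens: u64,
    output_tokens: u64,
) -> u64 {
    let input_cost = input_tokens.saturating_mul(pricing.input_micro_usd_per_1k) / 1_000;
    let output_cost = output_tokens.saturating_mul(pricing.output_micro_usd_per_1k) / 1_000;
    input_cost.saturating_add(output_cost)
}
```

For example, 2,000 input tokens at 500 micro-USD per 1,000 and 500 output tokens at 1,500 micro-USD per 1,000 come to 1,000 + 750 = 1,750 micro-USD.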
Configuration Loading
Configuration is loaded via config/load.rs, which merges defaults with the palyra.toml file and environment variables.
Loading Sequence
- Search: Looks for palyra.toml in standard paths crates/palyra-daemon/src/config/load.rs#50.
- Parse & Migrate: Parses TOML and applies version migrations if the file is from an older schema version crates/palyra-daemon/src/config/load.rs#53-58.
- Credential Resolution: Resolves api_key_vault_ref into actual keys using the Vault system crates/palyra-daemon/src/model_provider.rs#154.
- Runtime State: The LoadedConfig is used to build the AppState crates/palyra-daemon/src/app/runtime.rs#44-48.
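The merge of defaults, file values, and environment variables can be pictured as successive overlays. The sketch below assumes that the file overrides defaults and that environment variables override the file; the text does not state the precedence, so that ordering is an assumption, as are the `Settings` and `PartialSettings` types and their fields. TOML parsing and environment lookup are omitted to keep the example dependency-free.

```rust
/// Hypothetical fully-resolved settings; field names are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Settings {
    pub default_model: String,
    pub failover_enabled: bool,
}

/// One configuration layer: every field is optional, so a layer only
/// overrides what it actually sets.
#[derive(Debug, Default, Clone)]
pub struct PartialSettings {
    pub default_model: Option<String>,
    pub failover_enabled: Option<bool>,
}

impl Settings {
    /// Built-in defaults (illustrative values).
    pub fn defaults() -> Self {
        Self {
            default_model: "deterministic".to_string(),
            failover_enabled: false,
        }
    }

    /// Applies a layer on top of the current values.
    pub fn overlay(mut self, layer: PartialSettings) -> Self {
        if let Some(model) = layer.default_model {
            self.default_model = model;
        }
        if let Some(enabled) = layer.failover_enabled {
            self.failover_enabled = enabled;
        }
        self
    }
}

/// Assumed precedence: defaults, then file values, then environment values.
pub fn load(file: PartialSettings, env: PartialSettings) -> Settings {
    Settings::defaults().overlay(file).overlay(env)
}
```

Because each layer is a struct of `Option` fields, an unset environment variable leaves the file value in place instead of resetting it to the default.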
Model CLI Integration
The palyra CLI provides commands to inspect and modify these configurations:
- models status: Shows current provider health and configured defaults crates/palyra-cli/src/commands/models.rs#196-199.
- models list: Catalogs all available models from all registered providers crates/palyra-cli/src/commands/models.rs#200-203.
- models explain: Provides a trace of the model selection logic for a specific prompt crates/palyra-cli/src/commands/models.rs#152-162.