Core Abstractions: ModelProvider and EmbeddingsProvider
The system is built around two primary traits defined in crates/palyra-daemon/src/model_provider.rs. These traits decouple the agent’s request for intelligence from the specific transport or provider implementation.
- ModelProvider: Handles chat completions, tool proposals, and audio transcriptions crates/palyra-daemon/src/model_provider.rs#5-9.
- EmbeddingsProvider: Handles vector generation for RAG and semantic search crates/palyra-daemon/src/model_provider.rs#6-6.
RegistryBackedModelProvider is the concrete implementation that manages a collection of these providers, performing candidate selection based on the current configuration crates/palyra-daemon/src/model_provider.rs#10-12.
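As a rough illustration of how trait-based decoupling works, the sketch below defines two simplified, synchronous stand-in traits. The names ChatProvider, VectorProvider, EchoProvider, and ask are hypothetical and chosen so they are not mistaken for the real ModelProvider and EmbeddingsProvider signatures, which this section does not show and which are likely richer.

```rust
/// Hypothetical, synchronous stand-in for a chat-style provider contract.
pub trait ChatProvider {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Hypothetical stand-in for an embeddings contract.
pub trait VectorProvider {
    fn embed(&self, inputs: &[&str]) -> Result<Vec<Vec<f32>>, String>;
}

/// A toy provider that satisfies both contracts without any network I/O.
pub struct EchoProvider;

impl ChatProvider for EchoProvider {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {}", prompt))
    }
}

impl VectorProvider for EchoProvider {
    fn embed(&self, inputs: &[&str]) -> Result<Vec<Vec<f32>>, String> {
        // One-dimensional "embedding": the input length in bytes.
        Ok(inputs.iter().map(|s| vec![s.len() as f32]).collect())
    }
}

/// The caller depends only on the trait, never on a concrete provider.
pub fn ask(provider: &dyn ChatProvider, prompt: &str) -> Result<String, String> {
    provider.complete(prompt)
}
```

Because ask accepts any &dyn ChatProvider, swapping the toy provider for one backed by a real transport would not change the calling code.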
Natural Language Space to Code Entity Space: Provider Contracts
The following diagram maps high-level provider concepts to the specific Rust entities that implement them.
Title: Provider Contract Mapping
Sources: crates/palyra-daemon/src/model_provider.rs#5-18, crates/palyra-model-providers/src/lib.rs#89-97
Registry and Routing Logic
The RegistryBackedModelProvider acts as an orchestrator. When a request is made, it evaluates the ModelProviderRegistryConfig to identify viable candidates.
Candidate Selection and Ranking
Candidates are filtered and ranked based on:
- Capability: Does the model support the requested feature (e.g., vision, tool use, or reasoning effort)? crates/palyra-daemon/src/model_provider.rs#54-55.
- Health: Is the provider currently circuit-broken? crates/palyra-daemon/src/model_provider.rs#91-96.
- Priority: Defined by the ProviderRegistryEntryConfig in the configuration crates/palyra-daemon/src/model_provider.rs#60-61.
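The three criteria above can be sketched as a filter followed by a sort. This is a minimal illustration, not the registry's actual code: the Capability enum, the Candidate struct, rank_candidates, and the convention that a lower priority number wins are all assumptions made for the example.

```rust
/// Hypothetical capability flags; the real registry may track more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Capability {
    Vision,
    ToolUse,
    ReasoningEffort,
}

/// Hypothetical candidate record, loosely modeled on a registry entry.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub id: String,
    pub capabilities: Vec<Capability>,
    /// True while the provider's circuit breaker is open.
    pub circuit_open: bool,
    /// Assumption of this sketch: a lower value means a higher priority.
    pub priority: u32,
}

/// Keep candidates that are healthy and support every required capability,
/// then order them by ascending priority value.
pub fn rank_candidates(candidates: &[Candidate], required: &[Capability]) -> Vec<Candidate> {
    let mut viable: Vec<Candidate> = candidates
        .iter()
        .filter(|c| !c.circuit_open)
        .filter(|c| required.iter().all(|r| c.capabilities.contains(r)))
        .cloned()
        .collect();
    // A stable sort keeps configuration order among equal priorities.
    viable.sort_by_key(|c| c.priority);
    viable
}
```

The first element of the returned list would be the primary candidate, and the remaining elements would serve as failover targets in order.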
Circuit Breaking and Failover
The system implements a fail-closed classification strategy. If a primary candidate fails, the ProviderFailureClassification determines the next step crates/palyra-daemon/src/model_provider.rs#99-101:
- Retry: For transient errors like HTTP 429 or 503 crates/palyra-daemon/src/model_provider.rs#120-120.
- Failover: Switch to the next ranked provider in the registry crates/palyra-daemon/src/model_provider.rs#64-64.
- FailClosed: Terminate the request if the error is unrecoverable (e.g., authentication failure) crates/palyra-daemon/src/model_provider.rs#64-64.
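A fail-closed classifier can be sketched as a match whose default arm terminates the request. The FailureAction enum and classify_status function below are hypothetical. Only the 429/503 and authentication cases come from the text above; mapping other 5xx statuses to failover is an assumption of this example.

```rust
/// Hypothetical mirror of the three outcomes described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureAction {
    Retry,
    Failover,
    FailClosed,
}

/// Classify an upstream HTTP status. Anything not explicitly recognized
/// falls through to FailClosed, which is what makes the strategy fail-closed.
pub fn classify_status(status: u16) -> FailureAction {
    match status {
        // Transient: rate limiting or temporary unavailability.
        429 | 503 => FailureAction::Retry,
        // Authentication or authorization failures are unrecoverable.
        401 | 403 => FailureAction::FailClosed,
        // Assumption: other server-side errors are worth trying elsewhere.
        500..=599 => FailureAction::Failover,
        // Default: refuse to guess.
        _ => FailureAction::FailClosed,
    }
}
```

Placing the restrictive outcome in the default arm means a newly encountered error code can never silently trigger a retry or a switch to another provider.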
Response Caching and Normalization
To optimize latency and cost, Palyra employs TTL-bounded response caching.
- Cache Strategy: Controlled by PromptCachePolicy. It can be set to Streaming or Response-body based crates/palyra-daemon/src/model_provider.rs#81-83.
- Normalization: Regardless of the upstream provider’s format, the output is normalized into ProviderEvent::ModelToken events. This allows the Orchestrator to remain provider-agnostic crates/palyra-daemon/src/model_provider.rs#14-18.
- Tool Repair: If a model produces malformed tool arguments, the ToolRepairStreamNormalizer attempts to fix the JSON before it reaches the execution layer crates/palyra-daemon/src/model_provider.rs#71-77.
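To make the idea of TTL-bounded caching concrete, here is a minimal sketch of a cache whose entries expire after a fixed duration. The TtlCache type and its methods are hypothetical and do not reflect how PromptCachePolicy is implemented. The current time is passed in explicitly so that expiry can be exercised without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical response cache keyed by a request fingerprint.
pub struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>,
}

impl TtlCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Store a response together with the time it was produced.
    pub fn insert_at(&mut self, key: String, value: String, now: Instant) {
        self.entries.insert(key, (now, value));
    }

    /// Return a cached response only if it is younger than the TTL.
    /// Expired entries are evicted on lookup.
    pub fn get_at(&mut self, key: &str, now: Instant) -> Option<String> {
        let expired = match self.entries.get(key) {
            Some((stored_at, _)) => now.duration_since(*stored_at) >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(key);
            return None;
        }
        self.entries.get(key).map(|(_, value)| value.clone())
    }
}
```

In production code the callers would typically pass Instant::now(); evicting lazily on lookup keeps the sketch free of background tasks.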
Configuration and Secrets
The provider configuration is loaded through a multi-layered pipeline (Defaults -> TOML -> Env Vars) crates/palyra-daemon/src/config/load.rs#4-9.

| Feature | Description | Code Reference |
|---|---|---|
| API Keys | Loaded via SecretRef to prevent accidental logging. | crates/palyra-common/src/daemon_config_schema.rs#27-32 |
| Network Policy | Validates base URLs to prevent SSRF or unauthorized egress. | crates/palyra-daemon/src/model_provider.rs#56-56 |
| Service Tiers | Configures ProviderServiceTier (e.g., scale vs. standard). | crates/palyra-daemon/src/model_provider.rs#87-87 |
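The layering order (Defaults -> TOML -> Env Vars) and the goal of keeping API keys out of logs can be sketched together. Everything named here is hypothetical: RedactedSecret is an illustrative wrapper rather than the real SecretRef, and ProviderLayer and resolve_layers are stand-ins for whatever load.rs actually does.

```rust
use std::fmt;

/// Hypothetical wrapper whose Debug output never reveals the value.
#[derive(Clone, PartialEq, Eq)]
pub struct RedactedSecret(String);

impl RedactedSecret {
    pub fn new(value: impl Into<String>) -> Self {
        Self(value.into())
    }

    /// Explicit accessor, so that reading the secret is always deliberate.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for RedactedSecret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("RedactedSecret(***)")
    }
}

/// One configuration layer; None means "not set at this layer".
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ProviderLayer {
    pub base_url: Option<String>,
    pub api_key: Option<RedactedSecret>,
}

impl ProviderLayer {
    /// Fields set in `over` win; unset fields fall back to `self`.
    pub fn overlay(self, over: ProviderLayer) -> ProviderLayer {
        ProviderLayer {
            base_url: over.base_url.or(self.base_url),
            api_key: over.api_key.or(self.api_key),
        }
    }
}

/// Apply layers in increasing precedence: defaults, then file, then env.
pub fn resolve_layers(defaults: ProviderLayer, file: ProviderLayer, env: ProviderLayer) -> ProviderLayer {
    defaults.overlay(file).overlay(env)
}
```

With this shape, formatting a resolved layer with {:?} prints the base URL but only a placeholder for the key, so a stray debug log does not leak credentials.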
Code Entity Space: Configuration Structure
Title: Model Provider Configuration Schema
Sources: crates/palyra-daemon/src/config/load.rs#109-110, crates/palyra-daemon/src/model_provider.rs#57-61
Implementation Details
Key Functions and Modules
- load_config(): Resolves provider credentials and network policies from palyra.toml or environment variables crates/palyra-daemon/src/config/load.rs#91-109.
- process_route_provider_response(): Post-processes raw provider events, handles tool execution inline, and persists results to the OrchestratorTape crates/palyra-daemon/src/application/route_message/response.rs#163-174.
- adapters module: Contains provider-specific logic for translating Palyra’s internal ProviderRequest into wire-format JSON for OpenAI or Anthropic crates/palyra-daemon/src/model_provider/adapters.rs#16-19.
Constraints
- Embeddings Batch Size: Limited to 64 to ensure provider stability crates/palyra-daemon/src/model_provider.rs#121-121.
- Text Limits: Outbound messages are chunked to DEFAULT_ROUTE_MESSAGE_OUTPUT_MAX_BYTES (2,000 bytes) to accommodate connector limits crates/palyra-daemon/src/application/route_message/response.rs#35-35.
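Both limits amount to splitting work into bounded pieces. The sketch below shows one way to do that, assuming inputs are batched in order and that text chunks must not split a UTF-8 character. The constant and function names are hypothetical; whether the real chunker respects character boundaries in this way is not stated in this section.

```rust
/// Values taken from the constraints above; the names are hypothetical.
pub const MAX_EMBEDDINGS_BATCH: usize = 64;
pub const OUTPUT_MAX_BYTES: usize = 2_000;

/// Split embedding inputs into consecutive batches of at most 64 items.
pub fn batch_inputs(inputs: &[String]) -> Vec<&[String]> {
    inputs.chunks(MAX_EMBEDDINGS_BATCH).collect()
}

/// Split text into chunks of at most `max_bytes` bytes without cutting a
/// UTF-8 character in half.
pub fn chunk_text(text: &str, max_bytes: usize) -> Vec<&str> {
    // A UTF-8 character is at most 4 bytes, so this guarantees progress.
    assert!(max_bytes >= 4, "max_bytes must fit any UTF-8 character");
    let mut chunks = Vec::new();
    let mut rest = text;
    while !rest.is_empty() {
        let mut end = rest.len().min(max_bytes);
        // Back up to the nearest character boundary.
        while !rest.is_char_boundary(end) {
            end -= 1;
        }
        let (head, tail) = rest.split_at(end);
        chunks.push(head);
        rest = tail;
    }
    chunks
}
```

Concatenating the chunks reproduces the original text exactly, so a connector can send them in sequence without losing content.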