The Model Provider subsystem is the gateway through which Palyra interacts with Large Language Models (LLMs). It abstracts various upstream APIs (OpenAI, Anthropic, Google, MiniMax) into a unified interface, providing resilient routing, circuit breaking, and automated failover.

Core Abstractions: ModelProvider and EmbeddingsProvider

The system is built around two primary traits defined in crates/palyra-daemon/src/model_provider.rs. These traits decouple the agent’s request for intelligence from the specific transport or provider implementation. The RegistryBackedModelProvider is the concrete implementation that manages a collection of these providers, performing candidate selection based on the current configuration crates/palyra-daemon/src/model_provider.rs#10-12.
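As a rough illustration of how trait-based decoupling of this kind tends to look, the sketch below defines two simplified contracts and a toy backend. The method names, the request type, and EchoProvider are all hypothetical; the actual signatures in model_provider.rs are not shown here and are most likely asynchronous.

```rust
// Hypothetical sketch: the real trait signatures are not shown in this
// document, so every method and type below is an assumption.

/// Minimal stand-in for a completion request.
pub struct CompletionRequest {
    pub prompt: String,
}

/// Contract for text generation backends (simplified, synchronous).
pub trait ModelProvider {
    fn name(&self) -> &str;
    fn complete(&self, request: &CompletionRequest) -> Result<String, String>;
}

/// Contract for embedding backends (simplified, synchronous).
pub trait EmbeddingsProvider {
    fn embed(&self, input: &str) -> Result<Vec<f32>, String>;
}

/// Toy backend that echoes its input; stands in for an upstream API client.
pub struct EchoProvider;

impl ModelProvider for EchoProvider {
    fn name(&self) -> &str {
        "echo"
    }

    fn complete(&self, request: &CompletionRequest) -> Result<String, String> {
        Ok(format!("echo: {}", request.prompt))
    }
}

impl EmbeddingsProvider for EchoProvider {
    fn embed(&self, input: &str) -> Result<Vec<f32>, String> {
        // A one-dimensional "embedding" holding the input length in bytes.
        Ok(vec![input.len() as f32])
    }
}

/// Caller code depends only on the trait, never on a concrete backend.
pub fn ask(provider: &dyn ModelProvider, prompt: &str) -> Result<String, String> {
    provider.complete(&CompletionRequest {
        prompt: prompt.to_string(),
    })
}
```

Because ask takes a &dyn ModelProvider, swapping EchoProvider for a client of a real upstream API would not change the calling code.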

Natural Language Space to Code Entity Space: Provider Contracts

The following diagram maps high-level provider concepts to the specific Rust entities that implement them.

Title: Provider Contract Mapping
Sources: crates/palyra-daemon/src/model_provider.rs#5-18, crates/palyra-model-providers/src/lib.rs#89-97

Registry and Routing Logic

The RegistryBackedModelProvider acts as an orchestrator. When a request is made, it evaluates the ModelProviderRegistryConfig to identify viable candidates.

Candidate Selection and Ranking

Candidates are filtered and ranked on three criteria:
  1. Capability: Does the model support the requested feature (e.g., vision, tool use, or reasoning effort)? crates/palyra-daemon/src/model_provider.rs#54-55
  2. Health: Is the provider currently circuit-broken? crates/palyra-daemon/src/model_provider.rs#91-96
  3. Priority: The rank assigned by the ProviderRegistryEntryConfig in the configuration. crates/palyra-daemon/src/model_provider.rs#60-61
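The three criteria above can be sketched as a filter followed by a sort. This is a minimal illustration, not the registry's actual code: the Candidate struct, the Capability enum, and the convention that a lower priority number is preferred are all assumptions.

```rust
// Hypothetical sketch of candidate selection; all types are illustrative.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Capability {
    Vision,
    ToolUse,
    ReasoningEffort,
}

#[derive(Debug, Clone)]
pub struct Candidate {
    pub id: String,
    pub capabilities: Vec<Capability>,
    /// True while the provider's circuit breaker is open.
    pub circuit_open: bool,
    /// Assumption: a lower number means a more preferred provider.
    pub priority: u32,
}

/// Keeps candidates that support every required capability and are healthy,
/// then orders them by priority.
pub fn select_candidates(all: &[Candidate], required: &[Capability]) -> Vec<Candidate> {
    let mut viable: Vec<Candidate> = all
        .iter()
        .filter(|c| required.iter().all(|r| c.capabilities.contains(r)))
        .filter(|c| !c.circuit_open)
        .cloned()
        .collect();
    // sort_by_key is stable, so equal priorities keep their configured order.
    viable.sort_by_key(|c| c.priority);
    viable
}
```

In this sketch a circuit-broken provider is dropped even when it has the best priority, so the caller only ever sees candidates that are worth trying.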

Circuit Breaking and Failover

The system implements a fail-closed classification strategy. If a primary candidate fails, the ProviderFailureClassification determines the next step crates/palyra-daemon/src/model_provider.rs#99-101.

Title: Request Routing and Failover Flow
Sources: crates/palyra-daemon/src/model_provider.rs#10-18, crates/palyra-daemon/src/model_provider.rs#62-67, crates/palyra-daemon/src/model_provider.rs#120-120
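One way to read "fail-closed classification" is that only failures explicitly recognized as transient trigger failover, while anything unrecognized stops the request. The sketch below follows that reading. The FailureClass enum, the status-code list, the consecutive-failure breaker, and the route function are illustrative assumptions, not the variants or logic of ProviderFailureClassification.

```rust
// Hypothetical sketch of failover with a fail-closed classifier.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureClass {
    /// Transient upstream problem: try the next candidate.
    Retryable,
    /// Anything else: stop and surface the error.
    Terminal,
}

/// Fail-closed: unknown statuses fall through to Terminal.
pub fn classify(status: u16) -> FailureClass {
    match status {
        429 | 500 | 502 | 503 | 504 => FailureClass::Retryable,
        _ => FailureClass::Terminal,
    }
}

/// Opens after `threshold` consecutive retryable failures.
pub struct CircuitBreaker {
    consecutive_failures: u32,
    threshold: u32,
}

impl CircuitBreaker {
    pub fn new(threshold: u32) -> Self {
        Self { consecutive_failures: 0, threshold }
    }

    pub fn is_open(&self) -> bool {
        self.consecutive_failures >= self.threshold
    }

    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
    }

    pub fn record_failure(&mut self) {
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum RouteError {
    /// A candidate returned a non-retryable status.
    Terminal(u16),
    /// Every candidate was skipped or failed with a retryable status.
    Exhausted,
}

/// Tries candidates in order; `call(i)` performs the request against candidate `i`.
pub fn route<F>(breakers: &mut [CircuitBreaker], mut call: F) -> Result<String, RouteError>
where
    F: FnMut(usize) -> Result<String, u16>,
{
    for (index, breaker) in breakers.iter_mut().enumerate() {
        if breaker.is_open() {
            continue; // skip circuit-broken providers
        }
        match call(index) {
            Ok(body) => {
                breaker.record_success();
                return Ok(body);
            }
            Err(status) => match classify(status) {
                FailureClass::Terminal => return Err(RouteError::Terminal(status)),
                // Only transient failures count against the breaker here.
                FailureClass::Retryable => breaker.record_failure(),
            },
        }
    }
    Err(RouteError::Exhausted)
}
```

A production breaker would normally also reopen after a cool-down period; that half-open state is omitted to keep the sketch short.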

Response Caching and Normalization

To optimize latency and cost, Palyra employs TTL-bounded response caching.
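A TTL-bounded cache can be sketched with a map from request key to a timestamped response. The TtlCache type below is hypothetical and takes the current time as a parameter so that expiry is easy to reason about; how Palyra actually derives cache keys and evicts entries is not described here.

```rust
// Hypothetical sketch of a TTL-bounded response cache.
use std::collections::HashMap;
use std::time::{Duration, Instant};

pub struct TtlCache {
    ttl: Duration,
    /// Maps a request key to (time stored, cached response).
    entries: HashMap<String, (Instant, String)>,
}

impl TtlCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    pub fn insert(&mut self, key: String, value: String, now: Instant) {
        self.entries.insert(key, (now, value));
    }

    /// Returns the cached response if it is younger than the TTL;
    /// expired entries are removed on access.
    pub fn get(&mut self, key: &str, now: Instant) -> Option<String> {
        let expired = match self.entries.get(key) {
            Some((stored_at, _)) => now.saturating_duration_since(*stored_at) >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(key);
            return None;
        }
        self.entries.get(key).map(|(_, value)| value.clone())
    }
}
```

Passing now explicitly, rather than calling Instant::now() inside get, is a design choice for this sketch only: it lets a caller or test simulate the passage of time without sleeping.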

Configuration and Secrets

The provider configuration is loaded through a multi-layered pipeline (Defaults -> TOML -> Env Vars) crates/palyra-daemon/src/config/load.rs#4-9.
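The layering order (defaults, then TOML, then environment variables) can be illustrated with partial overlays applied in sequence, where each later layer overrides only the fields it sets. The struct names, fields, and default values below are assumptions for illustration; parsing of the TOML file and of the environment is left out.

```rust
// Hypothetical sketch of layered configuration: Defaults -> TOML -> Env Vars.

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ProviderSettings {
    pub base_url: String,
    pub timeout_secs: u64,
}

/// One configuration layer; `None` means "this layer does not set the field".
#[derive(Debug, Default, Clone)]
pub struct PartialSettings {
    pub base_url: Option<String>,
    pub timeout_secs: Option<u64>,
}

impl ProviderSettings {
    /// Illustrative defaults, not Palyra's real ones.
    pub fn defaults() -> Self {
        Self {
            base_url: "https://example.com/v1".to_string(),
            timeout_secs: 30,
        }
    }

    /// Overrides only the fields the layer actually provides.
    pub fn apply(mut self, layer: PartialSettings) -> Self {
        if let Some(value) = layer.base_url {
            self.base_url = value;
        }
        if let Some(value) = layer.timeout_secs {
            self.timeout_secs = value;
        }
        self
    }
}

/// Later layers win: environment values override file values, which override defaults.
pub fn load(file_layer: PartialSettings, env_layer: PartialSettings) -> ProviderSettings {
    ProviderSettings::defaults().apply(file_layer).apply(env_layer)
}
```

With this shape, a value set only in the file survives unless the environment layer also sets it, which matches the precedence implied by the pipeline order.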
| Feature | Description | Code Reference |
| --- | --- | --- |
| API Keys | Loaded via SecretRef to prevent accidental logging. | crates/palyra-common/src/daemon_config_schema.rs#27-32 |
| Network Policy | Validates base URLs to prevent SSRF or unauthorized egress. | crates/palyra-daemon/src/model_provider.rs#56-56 |
| Service Tiers | Configures ProviderServiceTier (e.g., scale vs. standard). | crates/palyra-daemon/src/model_provider.rs#87-87 |
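Two of the protections listed above lend themselves to a short sketch: a secret wrapper whose Debug output is redacted, and a base URL check against a host allow-list. Both are illustrative assumptions; the Secret type below is not SecretRef, and the real network policy may apply different rules.

```rust
// Hypothetical sketch of secret redaction and base URL validation.
use std::fmt;

/// Wraps a sensitive value so that `{:?}` formatting never prints it.
pub struct Secret(String);

impl Secret {
    pub fn new(value: impl Into<String>) -> Self {
        Secret(value.into())
    }

    /// Explicit accessor for the one place that needs the raw value.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Secret(<redacted>)")
    }
}

/// Accepts only https URLs whose host is on the allow-list.
/// Anything that cannot be parsed confidently is rejected.
pub fn is_allowed_base_url(url: &str, allowed_hosts: &[&str]) -> bool {
    let rest = match url.strip_prefix("https://") {
        Some(rest) => rest,
        None => return false,
    };
    // The authority ends at the first path, query, or fragment delimiter.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    // Reject userinfo tricks such as https://allowed.example@evil.example/.
    if authority.contains('@') {
        return false;
    }
    // Drop an optional port; bracketed IPv6 literals never match and are rejected.
    let host = authority.split(':').next().unwrap_or("");
    allowed_hosts.iter().any(|allowed| allowed.eq_ignore_ascii_case(host))
}
```

An allow-list of exact hosts is the most conservative policy; a real deployment might also need to resolve the host and reject private address ranges, which this sketch does not attempt.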

Code Entity Space: Configuration Structure

Title: Model Provider Configuration Schema
Sources: crates/palyra-daemon/src/config/load.rs#109-110, crates/palyra-daemon/src/model_provider.rs#57-61

Implementation Details

Key Functions and Modules

Constraints

Sources: crates/palyra-daemon/src/model_provider.rs, crates/palyra-daemon/src/config/load.rs, crates/palyra-daemon/src/application/route_message/response.rs, crates/palyra-model-providers/src/lib.rs