Provider Registry and Routing

The Model Provider subsystem is the gateway through which Palyra interacts with Large Language Models (LLMs). It abstracts various upstream APIs (OpenAI, Anthropic, Google, MiniMax) into a unified interface, providing resilient routing, circuit breaking, and automated failover.

Core Abstractions: ModelProvider and EmbeddingsProvider

The system is built around two primary traits defined in crates/palyra-daemon/src/model_provider.rs. These traits decouple the agent’s request for intelligence from the specific transport or provider implementation.

ModelProvider: Handles chat completions, tool proposals, and audio transcriptions crates/palyra-daemon/src/model_provider.rs#5-9.
EmbeddingsProvider: Handles vector generation for RAG and semantic search crates/palyra-daemon/src/model_provider.rs#6-6.

The RegistryBackedModelProvider is the concrete implementation that manages a collection of these providers, performing candidate selection based on the current configuration crates/palyra-daemon/src/model_provider.rs#10-12.
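To make the relationship between the traits and the registry concrete, here is a minimal sketch. It assumes simplified, synchronous method signatures; every type and method name in it (ChatRequest, ChatResponse, complete, embed, EchoProvider, FailingProvider, RegistrySketch) is hypothetical and chosen for illustration, so the real definitions in model_provider.rs may look quite different.

```rust
// Hypothetical request/response shapes; not taken from the Palyra source.
#[derive(Debug, Clone, PartialEq)]
pub struct ChatRequest {
    pub model: String,
    pub prompt: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ChatResponse {
    pub provider: String,
    pub text: String,
}

/// Illustrative stand-in for a chat-capable provider trait.
pub trait ModelProvider {
    fn name(&self) -> &str;
    fn complete(&self, request: &ChatRequest) -> Result<ChatResponse, String>;
}

/// Illustrative stand-in for an embeddings-capable provider trait.
pub trait EmbeddingsProvider {
    fn embed(&self, inputs: &[String]) -> Result<Vec<Vec<f32>>, String>;
}

/// Toy provider that echoes the prompt instead of calling an upstream API.
pub struct EchoProvider {
    pub id: String,
}

impl ModelProvider for EchoProvider {
    fn name(&self) -> &str {
        &self.id
    }

    fn complete(&self, request: &ChatRequest) -> Result<ChatResponse, String> {
        Ok(ChatResponse {
            provider: self.id.clone(),
            text: format!("echo: {}", request.prompt),
        })
    }
}

/// Toy provider that always fails, used to exercise delegation.
pub struct FailingProvider;

impl ModelProvider for FailingProvider {
    fn name(&self) -> &str {
        "failing"
    }

    fn complete(&self, _request: &ChatRequest) -> Result<ChatResponse, String> {
        Err("upstream unavailable".to_string())
    }
}

/// Holds providers in preference order and itself implements the trait,
/// so callers cannot tell a registry from a single provider.
pub struct RegistrySketch {
    providers: Vec<Box<dyn ModelProvider>>,
}

impl RegistrySketch {
    pub fn new(providers: Vec<Box<dyn ModelProvider>>) -> Self {
        Self { providers }
    }
}

impl ModelProvider for RegistrySketch {
    fn name(&self) -> &str {
        "registry"
    }

    fn complete(&self, request: &ChatRequest) -> Result<ChatResponse, String> {
        let mut last_error = String::from("no providers registered");
        for provider in &self.providers {
            match provider.complete(request) {
                Ok(response) => return Ok(response),
                Err(error) => last_error = error,
            }
        }
        Err(last_error)
    }
}
```

Because the registry implements the same trait as the providers it wraps, the caller's code does not change when providers are added, removed, or reordered.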

Natural Language Space to Code Entity Space: Provider Contracts

The following diagram maps high-level provider concepts to the specific Rust entities that implement them.

Title: Provider Contract Mapping
Sources: crates/palyra-daemon/src/model_provider.rs#5-18, crates/palyra-model-providers/src/lib.rs#89-97

Registry and Routing Logic

The RegistryBackedModelProvider acts as an orchestrator. When a request is made, it evaluates the ModelProviderRegistryConfig to identify viable candidates.

Candidate Selection and Ranking

Candidates are filtered by capability and health, then ranked by priority:

Capability: Does the model support the requested feature (e.g., vision, tool use, or reasoning effort)? crates/palyra-daemon/src/model_provider.rs#54-55.
Health: Is the provider currently circuit-broken? crates/palyra-daemon/src/model_provider.rs#91-96.
Priority: Defined by the ProviderRegistryEntryConfig in the configuration crates/palyra-daemon/src/model_provider.rs#60-61.
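The three criteria above can be sketched as a filter followed by a sort. This is an illustration under stated assumptions: the Capability enum, the Candidate struct, and the convention that a lower priority value is tried first are all hypothetical and not taken from ModelProviderRegistryConfig.

```rust
/// Hypothetical feature flags a request may require.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Capability {
    Vision,
    ToolUse,
    ReasoningEffort,
}

/// Hypothetical view of one registry entry at selection time.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub provider_id: String,
    pub capabilities: Vec<Capability>,
    /// True while the provider's circuit breaker is open.
    pub circuit_open: bool,
    /// Assumption: a lower value means the provider is tried earlier.
    pub priority: u32,
}

/// Keeps candidates that support every required capability and are healthy,
/// then orders them by priority. The sort is stable, so entries with equal
/// priority keep their configured order.
pub fn select_candidates(candidates: &[Candidate], required: &[Capability]) -> Vec<Candidate> {
    let mut viable: Vec<Candidate> = candidates
        .iter()
        .filter(|c| required.iter().all(|cap| c.capabilities.contains(cap)))
        .filter(|c| !c.circuit_open)
        .cloned()
        .collect();
    viable.sort_by_key(|c| c.priority);
    viable
}
```

An empty result from such a function would mean that no provider can serve the request at all, which is the point where a caller has to surface an error rather than route.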

Circuit Breaking and Failover

The system implements a fail-closed classification strategy. If a primary candidate fails, the ProviderFailureClassification determines the next step crates/palyra-daemon/src/model_provider.rs#99-101:

Retry: For transient errors like HTTP 429 or 503 crates/palyra-daemon/src/model_provider.rs#120-120.
Failover: Switch to the next ranked provider in the registry crates/palyra-daemon/src/model_provider.rs#64-64.
FailClosed: Terminate the request if the error is unrecoverable (e.g., authentication failure) crates/palyra-daemon/src/model_provider.rs#64-64.
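The classification above can be sketched as a pure function plus a small routing loop. Only the Retry case for HTTP 429/503 and the FailClosed case for authentication failures come from the text; the retry budget, the treatment of other 5xx responses as Failover, and the names FailureClass, classify, and route are assumptions made for illustration.

```rust
/// Hypothetical mirror of the three outcomes described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureClass {
    Retry,
    Failover,
    FailClosed,
}

/// Maps an HTTP status to the next step. Anything not explicitly
/// recognized is treated as unrecoverable, which is the fail-closed default.
pub fn classify(status: u16, attempts_on_provider: u32, max_retries: u32) -> FailureClass {
    match status {
        // Transient: retry while the (assumed) per-provider budget lasts.
        429 | 503 if attempts_on_provider < max_retries => FailureClass::Retry,
        // Budget exhausted: assume the next provider is the better bet.
        429 | 503 => FailureClass::Failover,
        // Authentication or authorization failure: stop immediately.
        401 | 403 => FailureClass::FailClosed,
        // Assumption: other server errors are provider-specific.
        500..=599 => FailureClass::Failover,
        _ => FailureClass::FailClosed,
    }
}

/// Walks providers in ranked order. `send` stands in for the real
/// transport and returns either a body or an HTTP status code.
pub fn route<F>(providers: &[&str], max_retries: u32, mut send: F) -> Result<String, String>
where
    F: FnMut(&str) -> Result<String, u16>,
{
    for &provider in providers {
        let mut attempts = 0;
        loop {
            match send(provider) {
                Ok(body) => return Ok(body),
                Err(status) => match classify(status, attempts, max_retries) {
                    FailureClass::Retry => attempts += 1,
                    FailureClass::Failover => break,
                    FailureClass::FailClosed => {
                        return Err(format!("{} failed closed with status {}", provider, status));
                    }
                },
            }
        }
    }
    Err("all providers exhausted".to_string())
}
```

Keeping classify free of side effects makes the policy easy to test in isolation; a production loop would presumably also add backoff between retries and record failures for the circuit breaker.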

Title: Request Routing and Failover Flow
Sources: crates/palyra-daemon/src/model_provider.rs#10-18, crates/palyra-daemon/src/model_provider.rs#62-67, crates/palyra-daemon/src/model_provider.rs#120-120

Response Caching and Normalization

To optimize latency and cost, Palyra employs TTL-bounded response caching.

Cache Strategy: Controlled by PromptCachePolicy, which can be set to a streaming-based or a response-body-based mode crates/palyra-daemon/src/model_provider.rs#81-83.
Normalization: Regardless of the upstream provider’s format, the output is normalized into ProviderEvent::ModelToken events. This allows the Orchestrator to remain provider-agnostic crates/palyra-daemon/src/model_provider.rs#14-18.
Tool Repair: If a model produces malformed tool arguments, the ToolRepairStreamNormalizer attempts to fix the JSON before it reaches the execution layer crates/palyra-daemon/src/model_provider.rs#71-77.
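As an illustration of what TTL-bounded caching means in practice, the sketch below stores each response with its insertion time and treats entries older than the TTL as misses. It is not Palyra's cache: TtlCache, its string keys, and the explicit `now` parameter (used here so expiry is deterministic) are assumptions.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical response cache keyed by a request fingerprint.
pub struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>,
}

impl TtlCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Stores `value` and records `now` as its insertion time.
    pub fn insert_at(&mut self, key: &str, value: &str, now: Instant) {
        self.entries.insert(key.to_string(), (now, value.to_string()));
    }

    /// Returns the cached value if it is younger than the TTL.
    /// Expired entries are evicted on access and reported as a miss.
    pub fn get_at(&mut self, key: &str, now: Instant) -> Option<String> {
        let is_fresh = self
            .entries
            .get(key)
            .map(|(stored_at, _)| now.duration_since(*stored_at) < self.ttl);
        match is_fresh {
            Some(true) => self.entries.get(key).map(|(_, value)| value.clone()),
            Some(false) => {
                self.entries.remove(key);
                None
            }
            None => None,
        }
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Passing the clock in as a parameter is a testing convenience; a caller would normally supply Instant::now() at both the insert and the lookup.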

Configuration and Secrets

The provider configuration is loaded through a multi-layered pipeline (Defaults -> TOML -> Env Vars) crates/palyra-daemon/src/config/load.rs#4-9.
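The layering can be pictured as successive overlays in which a later layer wins only for the fields it actually sets. The sketch below assumes the TOML layer has already been parsed into a struct; ProviderLayer, its fields, and the EXAMPLE_PROVIDER_* variable names are hypothetical and do not describe Palyra's real schema or environment variables.

```rust
/// Hypothetical partial configuration: `None` means "not set in this layer".
#[derive(Debug, Clone, PartialEq, Default)]
pub struct ProviderLayer {
    pub base_url: Option<String>,
    pub model: Option<String>,
    pub timeout_secs: Option<u64>,
}

impl ProviderLayer {
    /// Fields set in `over` win; unset fields fall through to `self`.
    pub fn overlay(self, over: ProviderLayer) -> ProviderLayer {
        ProviderLayer {
            base_url: over.base_url.or(self.base_url),
            model: over.model.or(self.model),
            timeout_secs: over.timeout_secs.or(self.timeout_secs),
        }
    }
}

/// Builds the environment layer from a lookup function so the example
/// does not depend on the real process environment.
/// The variable names are placeholders, not Palyra's.
pub fn env_layer<F>(lookup: F) -> ProviderLayer
where
    F: Fn(&str) -> Option<String>,
{
    ProviderLayer {
        base_url: lookup("EXAMPLE_PROVIDER_BASE_URL"),
        model: lookup("EXAMPLE_PROVIDER_MODEL"),
        // Unparseable numbers are ignored here; a stricter loader would reject them.
        timeout_secs: lookup("EXAMPLE_PROVIDER_TIMEOUT_SECS").and_then(|raw| raw.parse().ok()),
    }
}

/// Applies the layers in the documented order: defaults, then file, then environment.
pub fn resolve(defaults: ProviderLayer, file: ProviderLayer, env: ProviderLayer) -> ProviderLayer {
    defaults.overlay(file).overlay(env)
}
```

In a real process the lookup could be `|name| std::env::var(name).ok()`, while tests can pass a closure over a fixed map.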

Feature	Description	Code Reference
API Keys	Loaded via `SecretRef` to prevent accidental logging.	crates/palyra-common/src/daemon_config_schema.rs#27-32
Network Policy	Validates base URLs to prevent SSRF or unauthorized egress.	crates/palyra-daemon/src/model_provider.rs#56-56
Service Tiers	Configures `ProviderServiceTier` (e.g., scale vs. standard).	crates/palyra-daemon/src/model_provider.rs#87-87
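One common way to keep credentials out of logs, shown here only as an illustration, is a wrapper type whose Debug and Display output never contains the secret. RedactedSecret is a hypothetical type; the text does not say how SecretRef achieves this, and it may instead hold a reference to where the secret lives rather than the value itself.

```rust
use std::fmt;

/// Hypothetical wrapper that hides its contents from formatting macros.
pub struct RedactedSecret(String);

impl RedactedSecret {
    pub fn new(value: &str) -> Self {
        Self(value.to_string())
    }

    /// The only way to read the value; call sites are easy to audit.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for RedactedSecret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("RedactedSecret(<redacted>)")
    }
}

impl fmt::Display for RedactedSecret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("<redacted>")
    }
}

/// Hypothetical config struct: deriving Debug is safe because the
/// secret field formats itself as a placeholder.
#[derive(Debug)]
pub struct ProviderCredentials {
    pub provider: String,
    pub api_key: RedactedSecret,
}
```

With this pattern, a stray `{:?}` on the whole configuration prints the provider name but only a placeholder for the key.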

Code Entity Space: Configuration Structure

Title: Model Provider Configuration Schema
Sources: crates/palyra-daemon/src/config/load.rs#109-110, crates/palyra-daemon/src/model_provider.rs#57-61

Implementation Details

Key Functions and Modules

load_config(): Resolves provider credentials and network policies from palyra.toml or environment variables crates/palyra-daemon/src/config/load.rs#91-109.
process_route_provider_response(): Post-processes raw provider events, handles tool execution inline, and persists results to the OrchestratorTape crates/palyra-daemon/src/application/route_message/response.rs#163-174.
adapters module: Contains provider-specific logic for translating Palyra’s internal ProviderRequest into wire-format JSON for OpenAI or Anthropic crates/palyra-daemon/src/model_provider/adapters.rs#16-19.

Constraints

Embeddings Batch Size: Limited to 64 to ensure provider stability crates/palyra-daemon/src/model_provider.rs#121-121.
Text Limits: Outbound messages are split into chunks of at most DEFAULT_ROUTE_MESSAGE_OUTPUT_MAX_BYTES (2,000 bytes) to accommodate connector limits crates/palyra-daemon/src/application/route_message/response.rs#35-35.
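Both constraints reduce to splitting work into bounded pieces. The sketch below shows one way to do that, assuming the batch limit counts input strings and that text chunks must not split a UTF-8 character; the constant and function names are hypothetical, and only the values 64 and 2,000 come from the text.

```rust
/// Values from the constraints above; the names are placeholders.
pub const MAX_EMBEDDINGS_BATCH: usize = 64;
pub const OUTPUT_MAX_BYTES: usize = 2_000;

/// Splits embedding inputs into batches of at most 64 items.
pub fn batch_inputs(inputs: &[String]) -> Vec<&[String]> {
    inputs.chunks(MAX_EMBEDDINGS_BATCH).collect()
}

/// Splits `text` into pieces of at most `max_bytes` bytes without
/// cutting through a multi-byte UTF-8 character.
pub fn chunk_utf8(text: &str, max_bytes: usize) -> Vec<&str> {
    // A single character can take up to 4 bytes, so a smaller limit
    // could make progress impossible.
    assert!(max_bytes >= 4, "max_bytes must be at least 4");
    let mut chunks = Vec::new();
    let mut rest = text;
    while !rest.is_empty() {
        let mut end = rest.len().min(max_bytes);
        // Back up to the nearest character boundary at or below the limit.
        while !rest.is_char_boundary(end) {
            end -= 1;
        }
        let (head, tail) = rest.split_at(end);
        chunks.push(head);
        rest = tail;
    }
    chunks
}
```

Splitting on byte limits rather than character counts matters for non-ASCII text, where a 2,000-character message can be well over 2,000 bytes.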

Sources: crates/palyra-daemon/src/model_provider.rs, crates/palyra-daemon/src/config/load.rs, crates/palyra-daemon/src/application/route_message/response.rs, crates/palyra-model-providers/src/lib.rs
