Model Provider Configuration
Palyra supports multiple model providers through a unified registry. The system abstracts provider-specific APIs (OpenAI, Anthropic) into a common internal representation for chat completions, embeddings, and audio transcriptions.
Supported Provider Kinds
The daemon recognizes three primary provider types:
- Deterministic: Local or mock providers used for testing or specific non-LLM logic crates/palyra-daemon/src/model_provider.rs#40-40.
- OpenAiCompatible: Standard OpenAI API surface, also used for compatible services like Groq, Together, or local Ollama instances crates/palyra-daemon/src/model_provider.rs#41-41.
- Anthropic: Native support for the Anthropic Messages API crates/palyra-daemon/src/model_provider.rs#42-42.
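
The three kinds map naturally onto a Rust enum. The sketch below is illustrative only: the variant names follow the list above, but the snake_case config spellings and the `requires_http_endpoint` helper are assumptions and are not taken from `model_provider.rs`.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the three provider kinds listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderKind {
    Deterministic,
    OpenAiCompatible,
    Anthropic,
}

impl FromStr for ProviderKind {
    type Err = String;

    fn from_str(raw: &str) -> Result<Self, Self::Err> {
        // Assumed spellings; the real config keys may differ.
        match raw.trim().to_ascii_lowercase().as_str() {
            "deterministic" => Ok(Self::Deterministic),
            "openai_compatible" => Ok(Self::OpenAiCompatible),
            "anthropic" => Ok(Self::Anthropic),
            other => Err(format!("unknown provider kind: {other}")),
        }
    }
}

impl ProviderKind {
    /// Assumption: only the deterministic kind works without an HTTP endpoint.
    /// A local Ollama instance still counts as an endpoint here.
    pub fn requires_http_endpoint(self) -> bool {
        !matches!(self, Self::Deterministic)
    }
}
```

Parsing into an enum early means an unknown provider kind is rejected when the configuration is loaded rather than at request time.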
Registry and Model Metadata
The ModelProviderRegistryConfig manages a collection of ProviderRegistryEntryConfig (the “where” and “how” of a connection) and ProviderModelEntryConfig (the specific models available) crates/palyra-daemon/src/model_provider.rs#144-187.
| Entity | Purpose | Key Fields |
|---|---|---|
| Provider Entry | Connection settings | base_url, auth_profile_id, max_retries, circuit_breaker |
| Model Entry | Model capabilities | role (Chat/Embed), capabilities (Vision/Tools), metadata_source |
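
To make the relationship between the two entity types concrete, here is a minimal sketch of a registry lookup. The struct names, the field types, the `provider_id` and `model_id` keys, and the `resolve` method are all hypothetical; only `base_url`, `auth_profile_id`, `max_retries`, `role`, and `capabilities` come from the table above, and `circuit_breaker` and `metadata_source` are omitted for brevity.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelRole {
    Chat,
    Embed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Capability {
    Vision,
    Tools,
}

/// Connection settings: where the provider lives and how to authenticate.
#[derive(Debug, Clone)]
pub struct ProviderEntry {
    pub provider_id: String, // hypothetical join key
    pub base_url: String,
    pub auth_profile_id: Option<String>,
    pub max_retries: u32,
}

/// A model offered by one provider, with its role and capabilities.
#[derive(Debug, Clone)]
pub struct ModelEntry {
    pub model_id: String,    // hypothetical identifier
    pub provider_id: String, // hypothetical join key
    pub role: ModelRole,
    pub capabilities: Vec<Capability>,
}

#[derive(Debug, Default)]
pub struct Registry {
    pub providers: Vec<ProviderEntry>,
    pub models: Vec<ModelEntry>,
}

impl Registry {
    /// Returns the first model with the requested role and every required
    /// capability, together with the provider entry needed to reach it.
    pub fn resolve(
        &self,
        role: ModelRole,
        required: &[Capability],
    ) -> Option<(&ProviderEntry, &ModelEntry)> {
        self.models
            .iter()
            .filter(|m| m.role == role)
            .filter(|m| required.iter().all(|c| m.capabilities.contains(c)))
            .find_map(|m| {
                self.providers
                    .iter()
                    .find(|p| p.provider_id == m.provider_id)
                    .map(|p| (p, m))
            })
    }
}
```

Keeping connection settings on the provider entry means several models can share one `base_url` and one auth profile.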
Auth Profile Registry
The Auth Profile system separates identity and credentials from model configuration. This allows multiple agents or components to share a single set of credentials or rotate them without modifying the provider registry.
Credential Types and Storage
Credentials are never stored in plain text in the configuration files. Instead, they are persisted in the palyra-vault and referenced by a VaultRef crates/palyra-daemon/src/openai_surface.rs#42-48.
- API Key: A static bearer token stored in the vault crates/palyra-auth/src/lib.rs#10-16.
- OAuth2: Managed profiles that support the openid, profile, and offline_access scopes, allowing for automatic token refresh apps/web/src/console/hooks/useAuthDomain.ts#15-16.
Profile Scoping
Profiles can be scoped to limit where credentials can be used:
- Global: Available to any run or agent apps/web/src/console/hooks/useAuthDomain.ts#25-25.
- Agent: Restricted to a specific agent_id apps/web/src/console/hooks/useAuthDomain.ts#25-25.
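
The scoping rule can be sketched as a simple visibility check. The types below are hypothetical stand-ins rather than the daemon's actual definitions. The `vault_ref` field is modelled as an opaque string purely for illustration, and its value format is an assumption.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ProfileScope {
    /// Usable by any run or agent.
    Global,
    /// Usable only by the named agent.
    Agent { agent_id: String },
}

#[derive(Debug, Clone)]
pub struct AuthProfile {
    pub profile_id: String,
    pub scope: ProfileScope,
    /// Opaque pointer into the vault; the secret itself is never stored here.
    pub vault_ref: String,
}

impl AuthProfile {
    /// True when the given agent is allowed to use this profile.
    pub fn usable_by(&self, agent_id: &str) -> bool {
        match &self.scope {
            ProfileScope::Global => true,
            ProfileScope::Agent { agent_id: owner } => owner.as_str() == agent_id,
        }
    }
}

/// Filters a profile list down to those visible to one agent.
pub fn profiles_for<'a>(profiles: &'a [AuthProfile], agent_id: &str) -> Vec<&'a AuthProfile> {
    profiles.iter().filter(|p| p.usable_by(agent_id)).collect()
}
```

Because the profile carries only a reference, rotating a credential in the vault does not require touching the profile or any provider entry that points at it.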
Auth Data Flow (API Key Connection)
The following diagram illustrates the flow when a user connects a new OpenAI API key via the Web Console.
Title: OpenAI API Key Connection Flow
Sources: apps/web/src/console/sections/AuthSection.tsx#130-166, crates/palyra-daemon/src/openai_surface.rs#18-78, crates/palyra-daemon/tests/openai_auth_surface.rs#29-73
Usage Governance and Smart Routing
The governance subsystem tracks token consumption and applies “Smart Routing” logic to select the most cost-effective or performant model based on the complexity of the prompt.
Smart Routing Logic
The RoutingDecision is calculated by evaluating the complexity_score of a prompt against available model capabilities and health_state crates/palyra-daemon/src/usage_governance.rs#112-130.
- Suggest Mode: Recommends a model but uses the default crates/palyra-daemon/src/usage_governance.rs#29-29.
- Dry Run: Logs the routing decision that would have been made, without changing which model is used crates/palyra-daemon/src/usage_governance.rs#30-30.
- Enforced Mode: Overrides the requested model with the recommended one crates/palyra-daemon/src/usage_governance.rs#31-31.
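
The difference between the three modes comes down to which model ends up serving the request. The following sketch assumes that "the default" in Suggest Mode means the model the caller originally requested. `RoutingMode`, `RoutingOutcome`, and `apply_mode` are hypothetical names, not the daemon's API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoutingMode {
    Suggest,
    DryRun,
    Enforced,
}

#[derive(Debug, PartialEq, Eq)]
pub struct RoutingOutcome {
    /// The model that will actually serve the request.
    pub model_used: String,
    /// Optional note for the caller or the log.
    pub note: Option<String>,
}

/// Combines the caller's requested model with the router's recommendation.
pub fn apply_mode(mode: RoutingMode, requested: &str, recommended: &str) -> RoutingOutcome {
    match mode {
        // Surface the recommendation but leave the request untouched.
        RoutingMode::Suggest => RoutingOutcome {
            model_used: requested.to_string(),
            note: Some(format!("suggested: {recommended}")),
        },
        // Record what would have happened; the request is unchanged.
        RoutingMode::DryRun => RoutingOutcome {
            model_used: requested.to_string(),
            note: Some(format!("dry run: would have routed to {recommended}")),
        },
        // Override the requested model with the recommendation.
        RoutingMode::Enforced => RoutingOutcome {
            model_used: recommended.to_string(),
            note: None,
        },
    }
}
```

Only Enforced Mode changes behaviour, so the other two modes are a low-risk way to evaluate routing recommendations before turning enforcement on.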
Usage Budgets
The system enforces UsageBudgetPolicyRecord rules to prevent runaway costs crates/palyra-daemon/src/usage_governance.rs#11-13.
| Metric | Enforcement Actions |
|---|---|
| Consumed Value (USD) | Soft Limit (Alert), Hard Limit (Block) |
| Token Count | Per Session, Per Agent, or Global |
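
A soft/hard limit check on consumed USD might look like the sketch below. The `UsdBudget` type, its field names, and the choice to treat a limit as reached when consumption equals it are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BudgetVerdict {
    /// Under both limits.
    Allow,
    /// Soft limit reached: raise an alert but let the request through.
    Alert,
    /// Hard limit reached: reject the request.
    Block,
}

#[derive(Debug, Clone, Copy)]
pub struct UsdBudget {
    pub soft_limit_usd: f64,
    pub hard_limit_usd: f64,
}

impl UsdBudget {
    /// Classifies the current spend against the two limits.
    /// The hard limit is checked first so it always wins.
    pub fn evaluate(&self, consumed_usd: f64) -> BudgetVerdict {
        if consumed_usd >= self.hard_limit_usd {
            BudgetVerdict::Block
        } else if consumed_usd >= self.soft_limit_usd {
            BudgetVerdict::Alert
        } else {
            BudgetVerdict::Allow
        }
    }
}
```

The same shape would apply to a token-count budget by swapping the `f64` amounts for integer counts.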
Cost Tracking Implementation
The UsageEnrichedRun structure joins OrchestratorUsageInsightsRunRecord with UsagePricingRecord to provide real-time USD estimates crates/palyra-daemon/src/usage_governance.rs#208-214.
Title: Governance and Routing Architecture
Sources: crates/palyra-daemon/src/usage_governance.rs#26-146, crates/palyra-daemon/src/journal.rs#8-13
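
The join between a run's usage and a pricing record reduces to a small calculation. This sketch assumes prices are quoted in USD per million tokens and that pricing is keyed by a model identifier. `RunUsage`, `Pricing`, and `estimate_usd` are hypothetical names and do not reflect the real record layouts.

```rust
use std::collections::HashMap;

/// Token counts recorded for one run (hypothetical shape).
pub struct RunUsage {
    pub model_id: String,
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
}

/// Assumed pricing unit: USD per one million tokens.
pub struct Pricing {
    pub input_usd_per_mtok: f64,
    pub output_usd_per_mtok: f64,
}

/// Returns an estimated cost in USD, or None when the model has no pricing record.
pub fn estimate_usd(run: &RunUsage, pricing: &HashMap<String, Pricing>) -> Option<f64> {
    let price = pricing.get(&run.model_id)?;
    let input = run.prompt_tokens as f64 / 1_000_000.0 * price.input_usd_per_mtok;
    let output = run.completion_tokens as f64 / 1_000_000.0 * price.output_usd_per_mtok;
    Some(input + output)
}
```

Returning `None` for an unpriced model keeps missing pricing data distinguishable from a genuine cost of zero.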
OpenAI-Compatible Surface
The daemon exposes an OpenAI-compatible HTTP surface at /v1/*, allowing standard OpenAI SDKs and tools to interact with Palyra as if it were the upstream provider.
Key Handlers
- /v1/chat/completions: Proxies requests to the internal orchestrator, applying Palyra’s routing and budget policies crates/palyra-daemon/src/model_provider.rs#19-19.
- /v1/embeddings: Routes vectorization requests to the configured embedding provider crates/palyra-daemon/src/model_provider.rs#20-20.
- /v1/audio/transcriptions: Handles Whisper-style audio processing crates/palyra-daemon/src/model_provider.rs#21-21.
Request Transformation
When a request hits the OpenAI-compatible surface, the ModelProvider translates the generic ProviderRequest into the specific format required by the target (e.g., converting OpenAI JSON to Anthropic XML/JSON structures) crates/palyra-daemon/src/model_provider.rs#211-216.
Sources: crates/palyra-daemon/src/model_provider.rs#19-35, crates/palyra-daemon/src/openai_surface.rs#15-15
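
One concrete difference such a translation has to handle is the system prompt. OpenAI-style chat requests carry it as a message with the `system` role, while the Anthropic Messages API takes it as a top-level `system` field and requires `max_tokens`. The sketch below shows only that step, using plain structs instead of JSON. The type and function names are hypothetical, and joining multiple system messages with a blank line is an assumption.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ChatMessage {
    pub role: String,
    pub content: String,
}

/// Simplified stand-in for an Anthropic-style request body.
#[derive(Debug, PartialEq, Eq)]
pub struct AnthropicStyleRequest {
    pub system: Option<String>,
    pub messages: Vec<ChatMessage>,
    pub max_tokens: u32,
}

/// Lifts `system` messages out of the list into the top-level field.
pub fn to_anthropic_style(messages: &[ChatMessage], max_tokens: u32) -> AnthropicStyleRequest {
    let mut system_parts: Vec<&str> = Vec::new();
    let mut rest: Vec<ChatMessage> = Vec::new();
    for message in messages {
        if message.role == "system" {
            system_parts.push(message.content.as_str());
        } else {
            rest.push(message.clone());
        }
    }
    let system = if system_parts.is_empty() {
        None
    } else {
        // Assumption: several system messages are joined with a blank line.
        Some(system_parts.join("\n\n"))
    };
    AnthropicStyleRequest { system, messages: rest, max_tokens }
}
```

A full translation also has to map tool calls, images, and streaming events, which this sketch deliberately leaves out.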
Maintenance and Background Tasks
The cron subsystem handles recurring tasks related to model health and credential maintenance.
- Memory Maintenance: Runs every 5 minutes to prune or optimize local context crates/palyra-daemon/src/cron.rs#56-56.
- Embeddings Backfill: Every 10 minutes, the system checks for un-vectorized memory items and processes them in batches of 64 crates/palyra-daemon/src/cron.rs#57-58.
- Skill Re-audit: Periodically re-validates the security posture of installed skills crates/palyra-daemon/src/cron.rs#54-54.
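
The intervals and batch size quoted above can be expressed as constants with a small batching helper. The values are the ones stated in the list. The constant names, the `is_due` check, and the use of `u64` item identifiers are assumptions and do not reflect the contents of `cron.rs`.

```rust
use std::time::Duration;

pub const MEMORY_MAINTENANCE_INTERVAL: Duration = Duration::from_secs(5 * 60);
pub const EMBEDDINGS_BACKFILL_INTERVAL: Duration = Duration::from_secs(10 * 60);
pub const EMBEDDINGS_BACKFILL_BATCH_SIZE: usize = 64;

/// True when enough time has passed since the task last ran.
pub fn is_due(elapsed_since_last_run: Duration, interval: Duration) -> bool {
    elapsed_since_last_run >= interval
}

/// Splits pending item ids into batches of at most 64.
pub fn backfill_batches(pending_ids: &[u64]) -> Vec<Vec<u64>> {
    pending_ids
        .chunks(EMBEDDINGS_BACKFILL_BATCH_SIZE)
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

Bounding each batch keeps a single backfill pass from sending an arbitrarily large embedding request when many items are pending.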