Model Providers, Auth Profiles, and Usage Governance

This page covers the configuration and management of Large Language Model (LLM) providers, the secure storage and rotation of their credentials via Auth Profiles, and the governance framework that handles smart routing, cost tracking, and budget enforcement.

Model Provider Configuration

Palyra supports multiple model providers through a unified registry. The system abstracts provider-specific APIs (OpenAI, Anthropic) into a common internal representation for chat completions, embeddings, and audio transcriptions.

Supported Provider Kinds

The daemon recognizes three primary provider types:

Deterministic: Local or mock providers used for testing or specific non-LLM logic crates/palyra-daemon/src/model_provider.rs#40-40.
OpenAiCompatible: Standard OpenAI API surface, also used for compatible services like Groq, Together, or local Ollama instances crates/palyra-daemon/src/model_provider.rs#41-41.
Anthropic: Native support for the Anthropic Messages API crates/palyra-daemon/src/model_provider.rs#42-42.
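The sketch below shows how the three provider kinds might be modeled and parsed from configuration. It assumes a Rust enum named ProviderKind and lowercase config strings such as "openai_compatible". Both the type name and the accepted spellings are illustrative assumptions, not the daemon's actual definitions.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the three provider kinds listed above.
/// Variant names follow this page; the config spellings are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderKind {
    Deterministic,
    OpenAiCompatible,
    Anthropic,
}

impl FromStr for ProviderKind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Case-insensitive match on assumed config strings.
        match s.to_ascii_lowercase().as_str() {
            "deterministic" => Ok(ProviderKind::Deterministic),
            // Groq, Together, or a local Ollama instance would all use this kind.
            "openai_compatible" | "openai" => Ok(ProviderKind::OpenAiCompatible),
            "anthropic" => Ok(ProviderKind::Anthropic),
            other => Err(format!("unknown provider kind: {other}")),
        }
    }
}
```

Parsing into a closed enum means an unknown provider kind is rejected when configuration loads, rather than on the first request.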

Registry and Model Metadata

The ModelProviderRegistryConfig manages a collection of ProviderRegistryEntryConfig (the “where” and “how” of a connection) and ProviderModelEntryConfig (the specific models available) crates/palyra-daemon/src/model_provider.rs#144-187.

Entity	Purpose	Key Fields
Provider Entry	Connection settings	`base_url`, `auth_profile_id`, `max_retries`, `circuit_breaker`
Model Entry	Model capabilities	`role` (Chat/Embed), `capabilities` (Vision/Tools), `metadata_source`

Sources: crates/palyra-daemon/src/model_provider.rs#37-187, crates/palyra-daemon/src/config/load.rs#23-28
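The following sketch shows one way the provider/model split could be resolved at request time. The structs are hypothetical, simplified stand-ins for ProviderRegistryEntryConfig and ProviderModelEntryConfig. Field names follow the table above. The provider_id link, the boolean capability flags, and the Registry type are assumptions, and circuit_breaker and metadata_source are omitted.

```rust
/// Hypothetical model role (Chat/Embed, per the table above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelRole {
    Chat,
    Embed,
}

/// Simplified stand-in for ProviderRegistryEntryConfig (connection settings).
#[derive(Debug, Clone)]
pub struct ProviderEntry {
    pub id: String,
    pub base_url: String,
    pub auth_profile_id: Option<String>,
    pub max_retries: u32,
}

/// Simplified stand-in for ProviderModelEntryConfig (model capabilities).
#[derive(Debug, Clone)]
pub struct ModelEntry {
    pub id: String,
    pub provider_id: String, // assumed link back to a ProviderEntry
    pub role: ModelRole,
    pub vision: bool,
    pub tools: bool,
}

/// Hypothetical registry holding both collections.
pub struct Registry {
    pub providers: Vec<ProviderEntry>,
    pub models: Vec<ModelEntry>,
}

impl Registry {
    /// Resolve a model id to its entry and the provider that serves it.
    pub fn resolve(&self, model_id: &str) -> Option<(&ModelEntry, &ProviderEntry)> {
        let model = self.models.iter().find(|m| m.id == model_id)?;
        let provider = self.providers.iter().find(|p| p.id == model.provider_id)?;
        Some((model, provider))
    }

    /// All models with a given role, e.g. every embedding model.
    pub fn models_with_role(&self, role: ModelRole) -> Vec<&ModelEntry> {
        self.models.iter().filter(|m| m.role == role).collect()
    }
}
```

Keeping connection settings on the provider entry lets many models share one base_url and auth_profile_id.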

Auth Profile Registry

The Auth Profile system separates identity and credentials from model configuration. This allows multiple agents or components to share a single set of credentials or rotate them without modifying the provider registry.

Credential Types and Storage

Credentials are never stored in plain text in the configuration files. Instead, they are persisted in the palyra-vault and referenced by a VaultRef crates/palyra-daemon/src/openai_surface.rs#42-48.

API Key: A static bearer token stored in the vault crates/palyra-auth/src/lib.rs#10-16.
OAuth2: Managed profiles that support the openid, profile, and offline_access scopes, allowing for automatic token refresh apps/web/src/console/hooks/useAuthDomain.ts#15-16.
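The sketch below shows how a VaultRef-based credential could be turned into a request header, with an in-memory map standing in for palyra-vault. The Credential enum, the Vault type, and bearer_header are hypothetical names. Only the VaultRef concept comes from the source.

```rust
use std::collections::HashMap;

/// Hypothetical pointer into the vault; only this reference appears in config.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct VaultRef(pub String);

/// The two credential shapes described above (simplified, names assumed).
#[derive(Debug, Clone)]
pub enum Credential {
    ApiKey { key: VaultRef },
    OAuth2 { access_token: VaultRef, refresh_token: VaultRef },
}

/// Illustrative stand-in for palyra-vault: a plain in-memory map.
pub struct Vault {
    secrets: HashMap<VaultRef, String>,
}

impl Vault {
    pub fn new() -> Self {
        Vault { secrets: HashMap::new() }
    }
    pub fn put(&mut self, r: VaultRef, secret: &str) {
        self.secrets.insert(r, secret.to_string());
    }
    pub fn get(&self, r: &VaultRef) -> Option<&str> {
        self.secrets.get(r).map(String::as_str)
    }
}

/// Resolve the secret at request time and format a bearer header.
pub fn bearer_header(vault: &Vault, cred: &Credential) -> Option<String> {
    let r = match cred {
        Credential::ApiKey { key } => key,
        // The refresh token would be used by a separate refresh step, not per request.
        Credential::OAuth2 { access_token, .. } => access_token,
    };
    vault.get(r).map(|s| format!("Bearer {s}"))
}
```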

Profile Scoping

Profiles can be scoped to limit where credentials can be used:

Global: Available to any run or agent apps/web/src/console/hooks/useAuthDomain.ts#25-25.
Agent: Restricted to a specific agent_id apps/web/src/console/hooks/useAuthDomain.ts#25-25.
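If a scope is either global or bound to one agent_id, the access check reduces to a small match. ProfileScope and profile_usable_by are hypothetical names used only for illustration.

```rust
/// Hypothetical scope attached to an auth profile.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ProfileScope {
    Global,
    Agent { agent_id: String },
}

/// Can a run (optionally tied to an agent) use a profile with this scope?
pub fn profile_usable_by(scope: &ProfileScope, agent_id: Option<&str>) -> bool {
    match scope {
        // Global profiles are available to any run or agent.
        ProfileScope::Global => true,
        // Agent-scoped profiles require an exact agent_id match.
        ProfileScope::Agent { agent_id: owner } => agent_id == Some(owner.as_str()),
    }
}
```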

Auth Data Flow (API Key Connection)

This subsection describes the flow when a user connects a new OpenAI API key via the Web Console.

[Diagram: OpenAI API Key Connection Flow]

Sources: apps/web/src/console/sections/AuthSection.tsx#130-166, crates/palyra-daemon/src/openai_surface.rs#18-78, crates/palyra-daemon/tests/openai_auth_surface.rs#29-73
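The sketch below shows one plausible order of operations for this flow. It is an assumption, not a transcription of the source. The console submits the key, the daemon writes it to the vault, and an auth profile holding only the vault reference is registered. Daemon, AuthProfile, connect_openai_api_key, and the vault path format are all hypothetical. Plain HashMaps stand in for palyra-vault and the profile registry.

```rust
use std::collections::HashMap;

/// Hypothetical profile record: it stores a vault reference, never the key itself.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AuthProfile {
    pub profile_id: String,
    pub provider: String,
    pub vault_ref: String,
}

#[derive(Default)]
pub struct Daemon {
    vault: HashMap<String, String>,         // stand-in for palyra-vault
    profiles: HashMap<String, AuthProfile>, // stand-in for the auth profile registry
}

impl Daemon {
    /// Assumed steps: validate input, write the secret to the vault,
    /// then register a profile that points at it.
    pub fn connect_openai_api_key(&mut self, profile_id: &str, api_key: &str) -> Result<&AuthProfile, String> {
        let key = api_key.trim();
        if key.is_empty() {
            return Err("api key must not be empty".to_string());
        }
        let vault_ref = format!("auth_profiles/{profile_id}/api_key"); // hypothetical naming
        self.vault.insert(vault_ref.clone(), key.to_string());
        let profile = AuthProfile {
            profile_id: profile_id.to_string(),
            provider: "openai".to_string(),
            vault_ref,
        };
        self.profiles.insert(profile_id.to_string(), profile);
        Ok(&self.profiles[profile_id])
    }

    /// Look up the secret behind a profile (what a provider call would do).
    pub fn secret_for(&self, profile_id: &str) -> Option<&str> {
        let p = self.profiles.get(profile_id)?;
        self.vault.get(&p.vault_ref).map(String::as_str)
    }
}
```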

Usage Governance and Smart Routing

The governance subsystem tracks token consumption and applies “Smart Routing” logic to select the most cost-effective or performant model based on the complexity of the prompt.

Smart Routing Logic

The RoutingDecision is calculated by evaluating the complexity_score of a prompt against available model capabilities and health_state crates/palyra-daemon/src/usage_governance.rs#112-130.

Suggest Mode: Computes a recommended model but still uses the requested (default) model crates/palyra-daemon/src/usage_governance.rs#29-29.
Dry Run: Logs the routing decision that would have been applied, without changing the model crates/palyra-daemon/src/usage_governance.rs#30-30.
Enforced Mode: Replaces the requested model with the recommended one crates/palyra-daemon/src/usage_governance.rs#31-31.
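The sketch below shows how the three modes differ once a recommendation exists. It assumes each candidate model carries the maximum complexity_score it is rated for, a health flag, and a per-million-token price. It picks the cheapest healthy candidate that qualifies. These fields, the selection rule, and the names route, Candidate, and RoutingMode are assumptions, not the daemon's RoutingDecision logic.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoutingMode {
    Suggest,
    DryRun,
    Enforced,
}

/// Hypothetical candidate model with assumed routing attributes.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub model_id: &'static str,
    pub max_complexity: f64, // highest complexity_score this model is rated for (assumed)
    pub healthy: bool,       // simplified health_state
    pub cost_per_mtok: f64,  // USD per million tokens (assumed unit)
}

#[derive(Debug, PartialEq)]
pub struct RoutingDecision {
    pub recommended: Option<&'static str>,
    pub applied: &'static str,
}

pub fn route(mode: RoutingMode, requested: &'static str, complexity_score: f64, candidates: &[Candidate]) -> RoutingDecision {
    // Cheapest healthy model rated for this complexity.
    let recommended = candidates
        .iter()
        .filter(|c| c.healthy && c.max_complexity >= complexity_score)
        .min_by(|a, b| a.cost_per_mtok.total_cmp(&b.cost_per_mtok))
        .map(|c| c.model_id);

    let applied = match (mode, recommended) {
        // Only Enforced mode overrides the requested model.
        (RoutingMode::Enforced, Some(m)) => m,
        // Suggest and DryRun keep the requested model; a DryRun caller would
        // log `recommended` (logging is omitted here).
        _ => requested,
    };
    RoutingDecision { recommended, applied }
}
```

In this sketch, Suggest and Dry Run produce the same decision object. The difference described above (surfacing a recommendation versus logging it) would live in the caller.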

Usage Budgets

The system enforces UsageBudgetPolicyRecord rules to prevent runaway costs crates/palyra-daemon/src/usage_governance.rs#11-13.

Budget Dimension	Options
Consumed Value (USD)	Soft Limit (Alert), Hard Limit (Block)
Token Count	Per Session, Per Agent, or Global
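The sketch below models soft and hard USD limits. It assumes the check runs before a request and compares consumed spend plus an estimate for the next run against each limit. BudgetPolicy, BudgetOutcome, evaluate, and the projected-spend rule are hypothetical. The actual UsageBudgetPolicyRecord fields may differ.

```rust
/// Hypothetical scopes a budget can apply to (per the table above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BudgetScope {
    Session,
    Agent,
    Global,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BudgetPolicy {
    pub scope: BudgetScope,
    pub soft_limit_usd: Option<f64>, // crossing it raises an alert
    pub hard_limit_usd: Option<f64>, // crossing it blocks the request
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BudgetOutcome {
    Allow,
    Alert,
    Block,
}

/// Compare projected spend (consumed + next-run estimate) against the limits.
pub fn evaluate(policy: &BudgetPolicy, consumed_usd: f64, next_run_estimate_usd: f64) -> BudgetOutcome {
    let projected = consumed_usd + next_run_estimate_usd;
    if policy.hard_limit_usd.map_or(false, |hard| projected > hard) {
        BudgetOutcome::Block
    } else if policy.soft_limit_usd.map_or(false, |soft| projected > soft) {
        BudgetOutcome::Alert
    } else {
        BudgetOutcome::Allow
    }
}
```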

Cost Tracking Implementation

The UsageEnrichedRun structure joins OrchestratorUsageInsightsRunRecord with UsagePricingRecord to provide real-time USD estimates crates/palyra-daemon/src/usage_governance.rs#208-214.

[Diagram: Governance and Routing Architecture]

Sources: crates/palyra-daemon/src/usage_governance.rs#26-146, crates/palyra-daemon/src/journal.rs#8-13
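The join described above can be sketched as a lookup from each run's model to a pricing row. Cost is token count multiplied by a per-million-token price. The record types here are hypothetical, trimmed-down stand-ins. The per-million-token pricing unit is an assumption about how UsagePricingRecord stores prices.

```rust
/// Hypothetical stand-in for a run usage record.
pub struct RunUsage {
    pub run_id: String,
    pub model_id: String,
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
}

/// Hypothetical stand-in for a pricing record (USD per million tokens, assumed).
pub struct Pricing {
    pub model_id: String,
    pub input_usd_per_mtok: f64,
    pub output_usd_per_mtok: f64,
}

/// A run joined with its estimated cost.
pub struct EnrichedRun<'a> {
    pub run: &'a RunUsage,
    pub estimated_cost_usd: Option<f64>, // None when no pricing row matches
}

pub fn enrich<'a>(runs: &'a [RunUsage], pricing: &[Pricing]) -> Vec<EnrichedRun<'a>> {
    runs.iter()
        .map(|run| {
            // Join on model_id; price input and output tokens separately.
            let cost = pricing.iter().find(|p| p.model_id == run.model_id).map(|p| {
                (run.prompt_tokens as f64 * p.input_usd_per_mtok
                    + run.completion_tokens as f64 * p.output_usd_per_mtok)
                    / 1_000_000.0
            });
            EnrichedRun { run, estimated_cost_usd: cost }
        })
        .collect()
}
```

Leaving unpriced runs as None keeps a missing pricing row from being silently counted as zero cost.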

OpenAI-Compatible Surface

The daemon exposes an OpenAI-compatible HTTP surface at /v1/*, allowing standard OpenAI SDKs and tools to interact with Palyra as if it were the upstream provider.

Key Handlers

/v1/chat/completions: Proxies requests to the internal orchestrator, applying Palyra’s routing and budget policies crates/palyra-daemon/src/model_provider.rs#19-19.
/v1/embeddings: Routes vectorization requests to the configured embedding provider crates/palyra-daemon/src/model_provider.rs#20-20.
/v1/audio/transcriptions: Handles Whisper-style audio processing crates/palyra-daemon/src/model_provider.rs#21-21.
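The sketch below dispatches the three endpoints with a method-plus-path match. route_openai_path and Capability are hypothetical names. All three endpoints are POST requests in the upstream OpenAI API, so other methods are rejected here.

```rust
/// Hypothetical internal capability selected for each endpoint.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Capability {
    Chat,
    Embeddings,
    Transcription,
}

/// Map an incoming method and path to the capability that handles it.
pub fn route_openai_path(method: &str, path: &str) -> Option<Capability> {
    if method != "POST" {
        return None;
    }
    // Ignore any query string.
    let path = path.split('?').next().unwrap_or(path);
    match path {
        "/v1/chat/completions" => Some(Capability::Chat),
        "/v1/embeddings" => Some(Capability::Embeddings),
        "/v1/audio/transcriptions" => Some(Capability::Transcription),
        _ => None,
    }
}
```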

Request Transformation

When a request hits the OpenAI-compatible surface, the ModelProvider translates the generic ProviderRequest into the format required by the target provider (e.g., converting OpenAI-style chat messages into the JSON structure of the Anthropic Messages API) crates/palyra-daemon/src/model_provider.rs#211-216.

Sources: crates/palyra-daemon/src/model_provider.rs#19-35, crates/palyra-daemon/src/openai_surface.rs#15-15
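As a concrete example of this translation, the Anthropic Messages API takes system instructions as a top-level system field rather than as a message. It accepts only user and assistant roles in messages and requires max_tokens. The sketch below applies those rules to OpenAI-style messages. The types, the to_anthropic function, and the 1024-token fallback are assumptions. Tool calls and multimodal content are left out.

```rust
/// Hypothetical OpenAI-style chat message (text content only).
#[derive(Debug, Clone, PartialEq)]
pub struct ChatMessage {
    pub role: String,
    pub content: String,
}

/// Hypothetical, simplified Anthropic Messages request body.
#[derive(Debug, PartialEq)]
pub struct AnthropicRequest {
    pub model: String,
    pub system: Option<String>,
    pub messages: Vec<ChatMessage>,
    pub max_tokens: u32,
}

pub fn to_anthropic(model: &str, messages: &[ChatMessage], max_tokens: Option<u32>) -> Result<AnthropicRequest, String> {
    let mut system_parts = Vec::new();
    let mut out = Vec::new();
    for m in messages {
        match m.role.as_str() {
            // System prompts move to the top-level `system` field.
            "system" => system_parts.push(m.content.clone()),
            "user" | "assistant" => out.push(m.clone()),
            other => return Err(format!("role not handled in this sketch: {other}")),
        }
    }
    Ok(AnthropicRequest {
        model: model.to_string(),
        system: if system_parts.is_empty() { None } else { Some(system_parts.join("\n\n")) },
        messages: out,
        // max_tokens is required upstream; the fallback value is an assumption.
        max_tokens: max_tokens.unwrap_or(1024),
    })
}
```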

Maintenance and Background Tasks

The cron subsystem runs recurring background maintenance tasks:

Memory Maintenance: Runs every 5 minutes to prune or optimize local context crates/palyra-daemon/src/cron.rs#56-56.
Embeddings Backfill: Every 10 minutes, the system checks for un-vectorized memory items and processes them in batches of 64 crates/palyra-daemon/src/cron.rs#57-58.
Skill Re-audit: Periodically re-validates the security posture of installed skills crates/palyra-daemon/src/cron.rs#54-54.

Sources: crates/palyra-daemon/src/cron.rs#42-60
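The schedule above can be sketched as a small job table plus a batching helper. The Job type, the constant names, and is_due are hypothetical. Only the 5- and 10-minute intervals and the batch size of 64 come from the text. The skill re-audit job is omitted because its interval is not given.

```rust
use std::time::Duration;

/// Hypothetical job descriptor.
pub struct Job {
    pub name: &'static str,
    pub every: Duration,
}

// Intervals and batch size taken from the list above; names are assumed.
pub const MEMORY_MAINTENANCE: Job = Job { name: "memory_maintenance", every: Duration::from_secs(5 * 60) };
pub const EMBEDDINGS_BACKFILL: Job = Job { name: "embeddings_backfill", every: Duration::from_secs(10 * 60) };
pub const BACKFILL_BATCH_SIZE: usize = 64;

/// A job is due once its interval has elapsed since the last run.
pub fn is_due(job: &Job, since_last_run: Duration) -> bool {
    since_last_run >= job.every
}

/// Split un-vectorized item ids into batches of at most 64.
pub fn backfill_batches(pending_ids: &[u64]) -> Vec<Vec<u64>> {
    pending_ids.chunks(BACKFILL_BATCH_SIZE).map(|c| c.to_vec()).collect()
}
```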
