Usage Tracking and Budgets
Palyra tracks usage at the granularity of individual “runs” within sessions. This data is persisted in the OrchestratorUsage tables within the SQLite journal store and is used to evaluate budget policies in real time.
Budget Evaluation Engine
The budget system evaluates consumption against defined UsageBudgetPolicyRecord entries. Policies can target specific metrics (e.g., USD cost, token count) over defined intervals (daily, monthly) crates/palyra-daemon/src/usage_governance.rs#88-105.
When the orchestrator prepares a run, it invokes evaluate_budget_policies to determine if the request exceeds soft or hard limits crates/palyra-daemon/src/usage_governance.rs#201-213.
| Component | Role |
|---|---|
| UsageBudgetEvaluation | Represents the result of checking a single policy against current consumption crates/palyra-daemon/src/usage_governance.rs#88-105. |
| RoutingMode | Defines enforcement: Suggest (log only), DryRun (simulate), or Enforced (block requests) crates/palyra-daemon/src/usage_governance.rs#24-28. |
| PricingEstimate | Calculates projected USD costs based on model-specific pricing records crates/palyra-daemon/src/usage_governance.rs#65-73. |
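The interaction between soft and hard limits and the enforcement mode can be sketched as follows. This is a minimal illustration, not the code in usage_governance.rs: the RoutingMode variants and the UsageBudgetPolicyRecord and UsageBudgetEvaluation names come from the text above, while every field, the BudgetStatus enum, and the evaluate_policy_sketch function are assumptions made for the example. It also assumes a single numeric metric per policy and that only Enforced mode blocks a request.

```rust
/// Enforcement modes named in the component table.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RoutingMode {
    Suggest,
    DryRun,
    Enforced,
}

/// Hypothetical policy shape; the field names are assumptions.
#[derive(Debug, Clone)]
pub struct UsageBudgetPolicyRecord {
    pub policy_id: String,
    pub soft_limit: f64,
    pub hard_limit: f64,
}

/// Hypothetical classification of projected consumption.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BudgetStatus {
    WithinBudget,
    SoftLimitExceeded,
    HardLimitExceeded,
}

/// Hypothetical result of checking one policy against current consumption.
#[derive(Debug, Clone, PartialEq)]
pub struct UsageBudgetEvaluation {
    pub policy_id: String,
    pub projected: f64,
    pub status: BudgetStatus,
    pub blocks_request: bool,
}

/// Checks one policy: adds the projected cost of the run to what has already
/// been consumed in the interval, then compares against both limits.
pub fn evaluate_policy_sketch(
    policy: &UsageBudgetPolicyRecord,
    consumed: f64,
    projected_delta: f64,
    mode: RoutingMode,
) -> UsageBudgetEvaluation {
    let projected = consumed + projected_delta;
    let status = if projected > policy.hard_limit {
        BudgetStatus::HardLimitExceeded
    } else if projected > policy.soft_limit {
        BudgetStatus::SoftLimitExceeded
    } else {
        BudgetStatus::WithinBudget
    };
    // Assumption: only Enforced mode turns a hard-limit breach into a block.
    let blocks_request =
        mode == RoutingMode::Enforced && status == BudgetStatus::HardLimitExceeded;
    UsageBudgetEvaluation {
        policy_id: policy.policy_id.clone(),
        projected,
        status,
        blocks_request,
    }
}
```

Under these assumptions, a run that would push a policy past its hard limit is reported with the same status in every mode, but only Enforced sets blocks_request; Suggest and DryRun leave the request untouched.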
Smart Routing and Complexity Scoring
Usage governance also influences model selection. The RoutingDecision logic analyzes prompt complexity and provider health to recommend the most cost-effective model that meets the task’s requirements crates/palyra-daemon/src/usage_governance.rs#108-126.
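One way to picture this selection is a score followed by a filtered minimum-cost search. The sketch below is hypothetical throughout: ModelCandidate, its fields, complexity_score, recommend_model, the 4000-character cap, and the 0.6/0.4 weights are illustrative assumptions, not values taken from the daemon.

```rust
/// Hypothetical candidate model; all fields are assumptions for illustration.
#[derive(Debug, Clone)]
pub struct ModelCandidate {
    pub name: String,
    pub cost_per_1k_tokens_usd: f64,
    /// Highest complexity score this model is assumed to handle well.
    pub max_complexity: f64,
    /// Provider health as seen by the router.
    pub healthy: bool,
}

/// Scores a prompt in [0, 1] from its length and the number of requested
/// capabilities. The cap and weights are arbitrary example values.
pub fn complexity_score(prompt: &str, requested_capabilities: usize) -> f64 {
    let length_part = (prompt.chars().count() as f64 / 4000.0).min(1.0);
    let capability_part = (requested_capabilities as f64 * 0.2).min(1.0);
    0.6 * length_part + 0.4 * capability_part
}

/// Returns the cheapest healthy model whose capacity covers the score,
/// or None when no candidate qualifies.
pub fn recommend_model<'a>(
    candidates: &'a [ModelCandidate],
    score: f64,
) -> Option<&'a ModelCandidate> {
    candidates
        .iter()
        .filter(|c| c.healthy && c.max_complexity >= score)
        .min_by(|a, b| a.cost_per_1k_tokens_usd.total_cmp(&b.cost_per_1k_tokens_usd))
}
```

Filtering on health before comparing cost means an unhealthy provider is never recommended, even if it is the cheapest option.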
Sources: crates/palyra-daemon/src/usage_governance.rs#24-126, crates/palyra-daemon/src/usage_governance.rs#201-213.
Access Control and API Tokens
The access_control module manages the AccessRegistry, a JSON-backed store that defines feature flags, API tokens, and workspace memberships crates/palyra-daemon/src/access_control.rs#1-13.
Identity and Permissions
Palyra uses a Role-Based Access Control (RBAC) model within workspaces. Roles include Owner, Admin, and Operator crates/palyra-daemon/src/access_control.rs#76-80.
- Owner: Full control, including API token management and trust operations crates/palyra-daemon/src/access_control.rs#114-122.
- Admin: Management of memberships and sharing crates/palyra-daemon/src/access_control.rs#124-130.
- Operator: Standard usage of sessions, memory, and routines crates/palyra-daemon/src/access_control.rs#131-132.
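The role descriptions above can be expressed as permission sets. In this sketch the Role variants come from the text, while the Permission enum, its variant names, and the two functions are hypothetical. It also assumes the roles are cumulative (Admin includes everything Operator can do, and Owner includes everything Admin can do), which the text implies for Owner but does not state for Admin.

```rust
/// Workspace roles named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Owner,
    Admin,
    Operator,
}

/// Hypothetical permission names derived from the role descriptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Permission {
    ApiTokenManage,
    TrustOperate,
    MembershipManage,
    SharingManage,
    SessionUse,
    MemoryUse,
    RoutineUse,
}

/// Builds the permission set for a role, assuming roles are cumulative.
pub fn permissions_for(role: Role) -> Vec<Permission> {
    // Every role gets standard usage of sessions, memory, and routines.
    let mut perms = vec![
        Permission::SessionUse,
        Permission::MemoryUse,
        Permission::RoutineUse,
    ];
    if matches!(role, Role::Admin | Role::Owner) {
        perms.extend([Permission::MembershipManage, Permission::SharingManage]);
    }
    if role == Role::Owner {
        perms.extend([Permission::ApiTokenManage, Permission::TrustOperate]);
    }
    perms
}

/// Returns true when the role's permission set contains the permission.
pub fn is_allowed(role: Role, permission: Permission) -> bool {
    permissions_for(role).contains(&permission)
}
```

With this layout, is_allowed(Role::Admin, Permission::ApiTokenManage) is false while the same check for Role::Owner is true, matching the split between token management and membership management described above.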
API Token Lifecycle
Tokens are stored as ApiTokenRecord entries, which include a SHA-256 hash of the secret, associated scopes, and rate limit configurations crates/palyra-daemon/src/access_control.rs#151-173. Tokens are validated during request interception in the HTTP/gRPC layers.
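A rough sketch of hash-based validation follows. Only the ApiTokenRecord name and the rate_limit_per_minute field appear in this document; the other fields, the TokenError enum, and the authorize function are assumptions. Because the Rust standard library has no SHA-256, the digest function is passed in as a parameter here; a real build would supply one from a cryptography crate and would likely compare digests in constant time.

```rust
/// Hypothetical record shape; only `rate_limit_per_minute` is named in the text.
#[derive(Debug, Clone)]
pub struct ApiTokenRecord {
    pub token_id: String,
    /// Hex digest of the secret. The secret itself is never stored.
    pub secret_sha256_hex: String,
    pub scopes: Vec<String>,
    pub rate_limit_per_minute: u32,
}

/// Hypothetical rejection reasons.
#[derive(Debug, PartialEq)]
pub enum TokenError {
    UnknownToken,
    MissingScope,
}

/// Hashes the presented secret, looks up the matching record, and checks
/// that the record carries the required scope. `sha256_hex` is supplied by
/// the caller and stands in for a real SHA-256 implementation.
pub fn authorize<'a, F>(
    records: &'a [ApiTokenRecord],
    presented_secret: &str,
    required_scope: &str,
    sha256_hex: F,
) -> Result<&'a ApiTokenRecord, TokenError>
where
    F: Fn(&str) -> String,
{
    let digest = sha256_hex(presented_secret);
    let record = records
        .iter()
        .find(|r| r.secret_sha256_hex == digest)
        .ok_or(TokenError::UnknownToken)?;
    if record.scopes.iter().any(|s| s == required_scope) {
        Ok(record)
    } else {
        Err(TokenError::MissingScope)
    }
}
```

Storing only the digest means a leaked registry file does not directly expose usable secrets, which is the property the hashed ApiTokenRecord design aims for.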
Access Control Data Flow
Title: Access Registry and Token Validation Flow
Sources: crates/palyra-daemon/src/access_control.rs#151-173, crates/palyra-daemon/src/transport/http/handlers/compat.rs#128-142.
OpenAI-Compatible Auth Surface
Palyra provides an OpenAI-compatible authentication surface to allow standard LLM clients to connect to the daemon. This is handled primarily in openai_surface.rs and openai_auth.rs.
API Key and OAuth Integration
The daemon supports both static API keys and OAuth2 flows for model providers (specifically OpenAI).
- API Key Connection: Validates the key against the provider’s /v1/models endpoint before persisting it in the vault crates/palyra-daemon/src/openai_surface.rs#11-41.
- OAuth Flow: Implements PKCE (Proof Key for Code Exchange) to securely exchange authorization codes for access and refresh tokens crates/palyra-daemon/src/openai_auth.rs#99-130.
Credential Storage
Credentials are never stored in plain text in the configuration. Instead, the AuthProfileRecord stores a VaultRef crates/palyra-daemon/src/openai_surface.rs#35-41. This ensures that sensitive tokens are only decrypted in memory when required for a provider request.
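The indirection can be illustrated with a small sketch. AuthProfileRecord and VaultRef are named in the text; their fields, the Vault trait, InMemoryVault, and authorization_header are hypothetical, and the in-memory map stands in for a real encrypted vault (no encryption is performed here).

```rust
use std::collections::HashMap;

/// Opaque pointer to a secret held by the vault (inner format is an assumption).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct VaultRef(pub String);

/// Hypothetical profile shape: it carries a reference, never the secret.
#[derive(Debug, Clone)]
pub struct AuthProfileRecord {
    pub profile_id: String,
    pub provider: String,
    pub credential: VaultRef,
}

/// Hypothetical vault interface used only for this illustration.
pub trait Vault {
    fn resolve(&self, reference: &VaultRef) -> Option<String>;
}

/// Stand-in vault backed by a plain map; a real vault would decrypt on demand.
pub struct InMemoryVault {
    secrets: HashMap<VaultRef, String>,
}

impl InMemoryVault {
    pub fn new() -> Self {
        InMemoryVault { secrets: HashMap::new() }
    }

    pub fn put(&mut self, reference: VaultRef, secret: String) {
        self.secrets.insert(reference, secret);
    }
}

impl Vault for InMemoryVault {
    fn resolve(&self, reference: &VaultRef) -> Option<String> {
        self.secrets.get(reference).cloned()
    }
}

/// Resolves the secret only at the moment a provider request is built.
pub fn authorization_header(profile: &AuthProfileRecord, vault: &dyn Vault) -> Option<String> {
    vault
        .resolve(&profile.credential)
        .map(|secret| format!("Bearer {}", secret))
}
```

Because the profile holds only a VaultRef, serializing or logging the profile cannot leak the credential; the secret exists in memory only for the lifetime of the value returned by authorization_header.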
OAuth Sequence
Title: OpenAI OAuth PKCE Sequence
Sources: crates/palyra-daemon/src/openai_auth.rs#99-130, crates/palyra-daemon/src/openai_surface.rs#144-167.
Implementation Details
Key Files and Structures
- crates/palyra-daemon/src/usage_governance.rs: Core logic for budget evaluation, pricing estimation, and smart routing decisions.
- crates/palyra-daemon/src/access_control.rs: Implementation of the AccessRegistry and RBAC permission sets crates/palyra-daemon/src/access_control.rs#103-134.
- crates/palyra-daemon/src/transport/http/handlers/compat.rs: Entry point for OpenAI-compatible requests, enforcing token authorization and rate limiting crates/palyra-daemon/src/transport/http/handlers/compat.rs#105-126.
Rate Limiting
Rate limiting is enforced per API token. The enforce_compat_rate_limit function checks the rate_limit_per_minute defined in the ApiTokenRecord against a sliding window of requests tracked in the CompatApiRateLimitEntry crates/palyra-daemon/src/transport/http/handlers/compat.rs#111-111.
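A sliding-window check of this kind can be sketched with a queue of timestamps. The type below is a hypothetical stand-in for CompatApiRateLimitEntry, whose actual fields are not shown in this document; the 60-second window follows from the per-minute limit, and the choice not to record rejected requests is an assumption.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical per-token entry holding timestamps of accepted requests.
pub struct RateLimitEntrySketch {
    timestamps: VecDeque<Instant>,
}

impl RateLimitEntrySketch {
    pub fn new() -> Self {
        RateLimitEntrySketch { timestamps: VecDeque::new() }
    }

    /// Returns true and records the request if fewer than
    /// `rate_limit_per_minute` requests were accepted in the last 60 seconds.
    pub fn try_acquire(&mut self, now: Instant, rate_limit_per_minute: u32) -> bool {
        let window = Duration::from_secs(60);
        // Drop timestamps that have slid out of the window.
        while let Some(&oldest) = self.timestamps.front() {
            if now.saturating_duration_since(oldest) >= window {
                self.timestamps.pop_front();
            } else {
                break;
            }
        }
        if (self.timestamps.len() as u32) < rate_limit_per_minute {
            self.timestamps.push_back(now);
            true
        } else {
            // Assumption: rejected requests do not count against the window.
            false
        }
    }
}
```

Passing the current time in as a parameter keeps the check deterministic in tests; a handler would call it with Instant::now() and the rate_limit_per_minute value from the token's record.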
Usage Reporting
The Console API provides endpoints to query usage totals and timelines, which are used by the web dashboard to render consumption charts crates/palyra-daemon/src/transport/http/handlers/console/usage.rs#161-168.
| Metric | Source |
|---|---|
| prompt_tokens | Estimated via estimate_token_count or reported by provider crates/palyra-daemon/src/usage_governance.rs#137-137. |
| estimated_cost_usd | Calculated using UsagePricingRecord and token counts crates/palyra-daemon/src/usage_governance.rs#160-160. |
| complexity_score | Derived from prompt length and requested capabilities crates/palyra-daemon/src/usage_governance.rs#117-117. |
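As a worked illustration of how the first two metrics relate, the sketch below estimates a token count and converts token counts into a USD figure. UsagePricingRecord is named in the table, but its fields, the per-million-token pricing unit, the four-characters-per-token heuristic, and both function names are assumptions rather than the daemon's actual logic.

```rust
/// Hypothetical pricing shape: USD per one million tokens, split by direction.
#[derive(Debug, Clone)]
pub struct UsagePricingRecord {
    pub model: String,
    pub input_usd_per_million_tokens: f64,
    pub output_usd_per_million_tokens: f64,
}

/// Rough token estimate used when the provider reports no count.
/// Assumes about four characters per token, rounded up.
pub fn estimate_token_count_sketch(text: &str) -> u64 {
    (text.chars().count() as u64 + 3) / 4
}

/// Converts token counts into a USD estimate using the pricing record.
pub fn estimated_cost_usd_sketch(
    pricing: &UsagePricingRecord,
    prompt_tokens: u64,
    completion_tokens: u64,
) -> f64 {
    let input_cost = prompt_tokens as f64 * pricing.input_usd_per_million_tokens;
    let output_cost = completion_tokens as f64 * pricing.output_usd_per_million_tokens;
    (input_cost + output_cost) / 1_000_000.0
}
```

For example, at a hypothetical 2.00 USD per million input tokens and 8.00 USD per million output tokens, a run with 1,000,000 prompt tokens and 500,000 completion tokens comes to 2.00 + 4.00 = 6.00 USD.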