This page details the mechanisms Palyra employs to track resource consumption, enforce economic and operational boundaries, and manage identity-based access across its HTTP and gRPC surfaces. Governance is implemented through a combination of token-based authentication, a hierarchical access registry, and a budget evaluation engine integrated into the orchestrator loop.

Usage Tracking and Budgets

Palyra tracks usage at the granularity of individual “runs” within sessions. This data is persisted in the OrchestratorUsage tables of the SQLite journal store and is used to evaluate budget policies in real time.

Budget Evaluation Engine

The budget system evaluates consumption against defined UsageBudgetPolicyRecord entries. Policies can target specific metrics (e.g., USD cost, token count) over defined intervals (daily, monthly) crates/palyra-daemon/src/usage_governance.rs#88-105. When the orchestrator prepares a run, it invokes evaluate_budget_policies to determine whether the request exceeds soft or hard limits crates/palyra-daemon/src/usage_governance.rs#201-213.
Component | Role
UsageBudgetEvaluation | Represents the result of checking a single policy against current consumption crates/palyra-daemon/src/usage_governance.rs#88-105.
RoutingMode | Defines enforcement: Suggest (log only), DryRun (simulate), or Enforced (block requests) crates/palyra-daemon/src/usage_governance.rs#24-28.
PricingEstimate | Calculates projected USD costs based on model-specific pricing records crates/palyra-daemon/src/usage_governance.rs#65-73.
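
The soft/hard limit check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the daemon's actual code: BudgetPolicy, BudgetStatus, evaluate_policy, and should_block are hypothetical names, and only the RoutingMode variants come from this page.

```rust
/// Enforcement modes as named on this page.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RoutingMode {
    Suggest,
    DryRun,
    Enforced,
}

/// Hypothetical outcome of checking one policy.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BudgetStatus {
    WithinBudget,
    SoftLimitExceeded,
    HardLimitExceeded,
}

/// Hypothetical policy shape: one metric with a soft and a hard ceiling.
pub struct BudgetPolicy {
    pub soft_limit: f64,
    pub hard_limit: f64,
}

/// Compare consumption so far plus the projected cost of the next run
/// against the policy limits.
pub fn evaluate_policy(policy: &BudgetPolicy, consumed: f64, projected: f64) -> BudgetStatus {
    let total = consumed + projected;
    if total > policy.hard_limit {
        BudgetStatus::HardLimitExceeded
    } else if total > policy.soft_limit {
        BudgetStatus::SoftLimitExceeded
    } else {
        BudgetStatus::WithinBudget
    }
}

/// Assumption: only Enforced mode turns a hard-limit breach into a rejection;
/// Suggest and DryRun merely report it.
pub fn should_block(mode: RoutingMode, status: BudgetStatus) -> bool {
    mode == RoutingMode::Enforced && status == BudgetStatus::HardLimitExceeded
}
```

With a soft limit of 8.0 and a hard limit of 10.0, a run projected at 1.0 on top of 7.5 already consumed would cross only the soft limit in this sketch.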

Smart Routing and Complexity Scoring

Usage governance also influences model selection. The RoutingDecision logic analyzes prompt complexity and provider health to recommend the most cost-effective model that meets the task’s requirements crates/palyra-daemon/src/usage_governance.rs#108-126.
Sources: crates/palyra-daemon/src/usage_governance.rs#24-126, crates/palyra-daemon/src/usage_governance.rs#201-213.
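
To make the routing idea concrete, here is a hedged sketch of picking the cheapest healthy model whose capability ceiling covers a prompt's complexity. The scoring weights, the ModelCandidate type, and both function names are assumptions for illustration; the real RoutingDecision logic may weigh these inputs differently.

```rust
/// Hypothetical description of one routable model.
pub struct ModelCandidate {
    pub name: &'static str,
    pub cost_per_1k_tokens_usd: f64,
    /// Highest complexity score this model is assumed to handle well.
    pub max_complexity: u32,
    pub healthy: bool,
}

/// Assumed scoring rule: one point per 500 prompt characters,
/// plus two points per requested capability.
pub fn complexity_score(prompt: &str, requested_capabilities: &[&str]) -> u32 {
    let length_points = (prompt.chars().count() / 500) as u32;
    length_points + 2 * requested_capabilities.len() as u32
}

/// Return the cheapest healthy candidate that can handle the score, if any.
pub fn recommend_model(candidates: &[ModelCandidate], score: u32) -> Option<&ModelCandidate> {
    candidates
        .iter()
        .filter(|m| m.healthy && m.max_complexity >= score)
        .min_by(|a, b| a.cost_per_1k_tokens_usd.total_cmp(&b.cost_per_1k_tokens_usd))
}
```

Filtering on health before comparing cost means an unhealthy provider is never recommended, even if it is the cheapest option.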

Access Control and API Tokens

The access_control module manages the AccessRegistry, a JSON-backed store that defines feature flags, API tokens, and workspace memberships crates/palyra-daemon/src/access_control.rs#1-13.

Identity and Permissions

Palyra uses a Role-Based Access Control (RBAC) model within workspaces. Roles include Owner, Admin, and Operator crates/palyra-daemon/src/access_control.rs#76-80.
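
A role check of this kind might look like the sketch below. The three roles come from this page; the Action enum and the specific permission matrix are purely illustrative assumptions, since the page does not list which role may do what.

```rust
/// Workspace roles as named on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkspaceRole {
    Owner,
    Admin,
    Operator,
}

/// Hypothetical actions used only to illustrate the check.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    DeleteWorkspace,
    ManageTokens,
    StartRun,
}

/// Assumed matrix: Owner may do everything, Admin everything except
/// deleting the workspace, Operator may only start runs.
pub fn is_allowed(role: WorkspaceRole, action: Action) -> bool {
    match (role, action) {
        (WorkspaceRole::Owner, _) => true,
        (WorkspaceRole::Admin, Action::DeleteWorkspace) => false,
        (WorkspaceRole::Admin, _) => true,
        (WorkspaceRole::Operator, Action::StartRun) => true,
        (WorkspaceRole::Operator, _) => false,
    }
}
```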

API Token Lifecycle

Tokens are stored as ApiTokenRecord entries, which include a SHA-256 hash of the secret, associated scopes, and rate limit configurations crates/palyra-daemon/src/access_control.rs#151-173. Tokens are validated during request interception in the HTTP/gRPC layers.

Diagram (Access Control Data Flow): Access Registry and Token Validation Flow

Sources: crates/palyra-daemon/src/access_control.rs#151-173, crates/palyra-daemon/src/transport/http/handlers/compat.rs#128-142.
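
The validation step can be sketched as a digest lookup followed by a scope check. This is an assumption-laden illustration: the field names other than rate_limit_per_minute, the TokenError type, and the authorize function are hypothetical, and the SHA-256 digest of the presented secret is assumed to be computed by the caller (the Rust standard library does not provide SHA-256).

```rust
/// Simplified stand-in for the token record described above.
#[derive(Debug)]
pub struct ApiTokenRecord {
    /// SHA-256 digest of the token secret; the secret itself is not stored.
    pub secret_sha256: [u8; 32],
    pub scopes: Vec<String>,
    pub rate_limit_per_minute: u32,
}

#[derive(Debug, PartialEq)]
pub enum TokenError {
    UnknownToken,
    MissingScope,
}

/// Compare two digests without returning early at the first differing byte.
fn digests_match(a: &[u8; 32], b: &[u8; 32]) -> bool {
    a.iter().zip(b.iter()).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Find the record whose stored digest equals the presented digest,
/// then require that the record carries the scope the endpoint needs.
pub fn authorize<'a>(
    records: &'a [ApiTokenRecord],
    presented_digest: &[u8; 32],
    required_scope: &str,
) -> Result<&'a ApiTokenRecord, TokenError> {
    let record = records
        .iter()
        .find(|r| digests_match(&r.secret_sha256, presented_digest))
        .ok_or(TokenError::UnknownToken)?;
    if record.scopes.iter().any(|s| s == required_scope) {
        Ok(record)
    } else {
        Err(TokenError::MissingScope)
    }
}
```

Storing only the digest means a leaked registry file does not directly reveal usable token secrets.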

OpenAI-Compatible Auth Surface

Palyra provides an OpenAI-compatible authentication surface to allow standard LLM clients to connect to the daemon. This is handled primarily in openai_surface.rs and openai_auth.rs.

API Key and OAuth Integration

The daemon supports both static API keys and OAuth2 flows for model providers (specifically OpenAI).

Credential Storage

Credentials are never stored in plain text in the configuration. Instead, the AuthProfileRecord stores a VaultRef crates/palyra-daemon/src/openai_surface.rs#35-41. This ensures that sensitive tokens are only decrypted in memory when required for a provider request.

Diagram (OAuth Sequence): OpenAI OAuth PKCE Sequence

Sources: crates/palyra-daemon/src/openai_auth.rs#99-130, crates/palyra-daemon/src/openai_surface.rs#144-167.
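
The indirection through a vault reference can be illustrated as below. Only the names AuthProfileRecord and VaultRef come from this page; their fields, the Vault trait, InMemoryVault, and authorization_header are hypothetical, and a real vault would decrypt from protected storage rather than hold secrets in a map.

```rust
use std::collections::HashMap;

/// Opaque pointer to a secret; safe to persist in configuration.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct VaultRef(pub String);

/// Simplified profile: it holds a reference, never the secret itself.
pub struct AuthProfileRecord {
    pub profile_id: String,
    pub credential: VaultRef,
}

/// Hypothetical vault interface that resolves a reference on demand.
pub trait Vault {
    fn resolve(&self, reference: &VaultRef) -> Option<String>;
}

/// Toy vault used for illustration only.
pub struct InMemoryVault {
    secrets: HashMap<VaultRef, String>,
}

impl InMemoryVault {
    pub fn new() -> Self {
        InMemoryVault { secrets: HashMap::new() }
    }

    /// Store a secret under a key and hand back the reference to persist.
    pub fn store(&mut self, key: &str, secret: &str) -> VaultRef {
        let reference = VaultRef(key.to_string());
        self.secrets.insert(reference.clone(), secret.to_string());
        reference
    }
}

impl Vault for InMemoryVault {
    fn resolve(&self, reference: &VaultRef) -> Option<String> {
        self.secrets.get(reference).cloned()
    }
}

/// Resolve the secret only at the moment a provider request is built.
pub fn authorization_header(vault: &dyn Vault, profile: &AuthProfileRecord) -> Option<String> {
    vault
        .resolve(&profile.credential)
        .map(|secret| format!("Bearer {secret}"))
}
```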

Implementation Details

Key Files and Structures

Rate Limiting

Rate limiting is enforced per API token. The enforce_compat_rate_limit function checks the rate_limit_per_minute defined in the ApiTokenRecord against a sliding window of requests tracked in the CompatApiRateLimitEntry crates/palyra-daemon/src/transport/http/handlers/compat.rs#111-111.
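
A sliding-window check of this kind can be sketched as follows. The SlidingWindow type and try_acquire method are hypothetical stand-ins for the entry and function named above, and timestamps are passed in as milliseconds so the logic is easy to test; the actual handler may track time differently.

```rust
use std::collections::VecDeque;

/// Length of the window: one minute, in milliseconds.
const WINDOW_MS: u64 = 60_000;

/// Hypothetical per-token state: timestamps of recently accepted requests.
#[derive(Default)]
pub struct SlidingWindow {
    request_times_ms: VecDeque<u64>,
}

impl SlidingWindow {
    /// Drop timestamps older than the window, then accept the request
    /// only if fewer than `rate_limit_per_minute` requests remain in it.
    pub fn try_acquire(&mut self, now_ms: u64, rate_limit_per_minute: u32) -> bool {
        while let Some(&oldest) = self.request_times_ms.front() {
            if now_ms.saturating_sub(oldest) >= WINDOW_MS {
                self.request_times_ms.pop_front();
            } else {
                break;
            }
        }
        if self.request_times_ms.len() < rate_limit_per_minute as usize {
            self.request_times_ms.push_back(now_ms);
            true
        } else {
            false
        }
    }
}
```

Unlike a fixed per-minute counter, the window slides with each request, so a burst at the end of one minute still counts against requests made at the start of the next.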

Usage Reporting

The Console API provides endpoints to query usage totals and timelines, which are used by the web dashboard to render consumption charts crates/palyra-daemon/src/transport/http/handlers/console/usage.rs#161-168.
Metric | Source
prompt_tokens | Estimated via estimate_token_count or reported by provider crates/palyra-daemon/src/usage_governance.rs#137-137.
estimated_cost_usd | Calculated using UsagePricingRecord and token counts crates/palyra-daemon/src/usage_governance.rs#160-160.
complexity_score | Derived from prompt length and requested capabilities crates/palyra-daemon/src/usage_governance.rs#117-117.
Sources: crates/palyra-daemon/src/usage_governance.rs#1-126, crates/palyra-daemon/src/access_control.rs#76-173, crates/palyra-daemon/src/transport/http/handlers/compat.rs#1-126.
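
As an illustration of how the prompt_tokens and estimated_cost_usd metrics could be derived, the sketch below uses a rough characters-per-token heuristic and per-million-token prices. PricingSketch, rough_token_count, and cost_usd are hypothetical stand-ins for UsagePricingRecord, estimate_token_count, and the cost calculation; the heuristic and the pricing fields are assumptions, not the daemon's actual formulas.

```rust
/// Hypothetical pricing shape: USD per one million tokens.
pub struct PricingSketch {
    pub input_usd_per_million: f64,
    pub output_usd_per_million: f64,
}

/// Assumed fallback when the provider reports no token count:
/// roughly four characters per token, rounded up.
pub fn rough_token_count(text: &str) -> u64 {
    (text.chars().count() as u64 + 3) / 4
}

/// Price prompt and completion tokens separately and sum the result.
pub fn cost_usd(pricing: &PricingSketch, prompt_tokens: u64, completion_tokens: u64) -> f64 {
    (prompt_tokens as f64 * pricing.input_usd_per_million
        + completion_tokens as f64 * pricing.output_usd_per_million)
        / 1_000_000.0
}
```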