Usage Governance in Palyra provides a multi-layered framework for tracking token consumption, estimating costs, and enforcing budget policies across different scopes (Session, Agent, or Principal). It integrates with the model provider registry to optimize routing based on cost and complexity while providing human-in-the-loop flows for budget overrides.

Governance Architecture & Data Flow

The governance system is centered in the palyra-daemon within the usage_governance module. It sits between the Orchestrator and the Model Providers, intercepting requests to evaluate them against active policies.

Usage Data Flow Diagram

[Diagram: “Governance Flow: Request to Execution”]
Sources: crates/palyra-daemon/src/application/route_message/orchestration.rs#333-334, crates/palyra-daemon/src/usage_governance.rs#133-146, crates/palyra-daemon/src/usage_governance.rs#216-228

Budget Policies & Enforcement

Policies are defined via UsageBudgetPolicyRecord objects. They specify limits based on metrics (tokens, USD) and intervals (daily, monthly).
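As a rough illustration of the shape such a record might take, the sketch below models a policy with a scope, a metric, an interval, and optional soft and hard limits. The type `BudgetPolicySketch`, its field names, and the helper `hard_limit_utilization` are assumptions made for this example; the real `UsageBudgetPolicyRecord` may be laid out differently.

```rust
/// Scope a budget applies to (the three scopes named in the overview).
#[derive(Debug, Clone, PartialEq)]
enum BudgetScope {
    Session(String),
    Agent(String),
    Principal(String),
}

/// Metric the limit is expressed in.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BudgetMetric {
    Tokens,
    Usd,
}

/// Window over which consumption is accumulated.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BudgetInterval {
    Daily,
    Monthly,
}

/// Hypothetical stand-in for `UsageBudgetPolicyRecord`; real fields may differ.
#[derive(Debug, Clone, PartialEq)]
struct BudgetPolicySketch {
    scope: BudgetScope,
    metric: BudgetMetric,
    interval: BudgetInterval,
    soft_limit: Option<f64>,
    hard_limit: Option<f64>,
}

impl BudgetPolicySketch {
    /// Fraction of the hard limit already consumed.
    /// Returns `None` when no positive hard limit is configured.
    fn hard_limit_utilization(&self, consumed: f64) -> Option<f64> {
        self.hard_limit
            .filter(|limit| *limit > 0.0)
            .map(|limit| consumed / limit)
    }
}
```

Under these assumptions, a daily USD policy with a hard limit of 10.0 and 5.0 already consumed reports a utilization of 0.5.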

Policy Evaluation Logic

The evaluate_budget_policies function determines the status of a request based on:
  1. Hard Limits: The request is rejected immediately if the consumed or projected value exceeds the threshold.
  2. Soft Limits: Notifications or alerts are triggered without blocking execution.
  3. Routing Overrides: Policies can force a RoutingMode (e.g., switching from Enforced to Suggest or forcing a cheaper model) when thresholds are approached (crates/palyra-daemon/src/usage_governance.rs#108-123).
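The hard and soft limit checks above can be sketched as follows. This is a minimal illustration, not the body of `evaluate_budget_policies`: the names `PolicyLimits`, `BudgetStatus`, `evaluate_policy`, and `evaluate_policies` are hypothetical, routing overrides are left out, and the example assumes the most severe status across all policies wins.

```rust
/// Outcome of checking one request against budget limits.
/// Variant order matters: later variants are more severe, so `max()` works.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum BudgetStatus {
    WithinBudget,
    SoftLimitExceeded,
    HardLimitExceeded,
}

/// Hypothetical limits carried by a single policy.
#[derive(Debug, Clone, Copy)]
struct PolicyLimits {
    soft_limit: Option<f64>,
    hard_limit: Option<f64>,
}

/// Checks one policy. `projected_delta` is the estimated usage of the
/// pending request; with a non-negative delta, testing the projected value
/// also covers the case where consumption alone is already over the limit.
fn evaluate_policy(limits: &PolicyLimits, consumed: f64, projected_delta: f64) -> BudgetStatus {
    let projected = consumed + projected_delta;
    match (limits.hard_limit, limits.soft_limit) {
        (Some(hard), _) if projected > hard => BudgetStatus::HardLimitExceeded,
        (_, Some(soft)) if projected > soft => BudgetStatus::SoftLimitExceeded,
        _ => BudgetStatus::WithinBudget,
    }
}

/// Returns the most severe status across all policies (assumed semantics).
fn evaluate_policies(policies: &[PolicyLimits], consumed: f64, projected_delta: f64) -> BudgetStatus {
    policies
        .iter()
        .map(|policy| evaluate_policy(policy, consumed, projected_delta))
        .max()
        .unwrap_or(BudgetStatus::WithinBudget)
}
```

With a soft limit of 80 and a hard limit of 100, a request projected to bring usage to 85 yields `SoftLimitExceeded`, and one projected to reach 105 yields `HardLimitExceeded`.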

Routing Modes

The system supports three operational modes for governance:
| Mode | Description |
| --- | --- |
| Suggest | Provides recommendations in logs/UI but does not interfere with the run. |
| DryRun | Calculates all outcomes and potential blocks but allows the run to proceed. |
| Enforced | Actively blocks runs that violate hard budget limits or lack required approvals. |
Sources: crates/palyra-daemon/src/usage_governance.rs#28-32, crates/palyra-daemon/src/usage_governance.rs#59-66
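To make the difference between the three modes concrete, the sketch below maps a detected violation to a run decision. The `RoutingMode` variant names come from the table; `RunDecision` and `apply_mode` are hypothetical names used only for this example, and the message formats are invented.

```rust
/// The three governance modes from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RoutingMode {
    Suggest,
    DryRun,
    Enforced,
}

/// Hypothetical result of applying a mode to a request.
#[derive(Debug, Clone, PartialEq, Eq)]
enum RunDecision {
    /// Nothing to report; the run continues.
    Proceed,
    /// The run continues, but a note is surfaced in logs or the UI.
    ProceedWithNote(String),
    /// The run is stopped.
    Blocked(String),
}

/// Maps a mode and an optional violation description to a decision.
/// Only `Enforced` ever stops a run.
fn apply_mode(mode: RoutingMode, violation: Option<&str>) -> RunDecision {
    match (mode, violation) {
        (_, None) => RunDecision::Proceed,
        (RoutingMode::Suggest, Some(v)) => {
            RunDecision::ProceedWithNote(format!("recommendation: {}", v))
        }
        (RoutingMode::DryRun, Some(v)) => {
            RunDecision::ProceedWithNote(format!("would block: {}", v))
        }
        (RoutingMode::Enforced, Some(v)) => RunDecision::Blocked(v.to_string()),
    }
}
```

The same violation therefore produces a note under Suggest and DryRun and a block under Enforced, which is why DryRun is useful for trialling a new policy before it can stop any runs.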

Budget Override & Approval Flows

When a hard limit is reached, users can request an override. This creates an ApprovalRecord in the journal with a specific usage-budget: subject prefix (crates/palyra-daemon/src/usage_governance.rs#24-24).
Sources: crates/palyra-daemon/src/usage_governance.rs#8-13, crates/palyra-daemon/src/transport/http/handlers/console/usage.rs#13-18
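The subject prefix can be illustrated with a small helper pair. Only the `usage-budget:` prefix comes from the source; the constant name, the function names, and the idea that a policy identifier follows the prefix are assumptions for this sketch.

```rust
/// Prefix named in the source; the constant name itself is hypothetical.
const USAGE_BUDGET_SUBJECT_PREFIX: &str = "usage-budget:";

/// Builds an approval subject for a budget override request.
/// Appending a policy id after the prefix is an assumption.
fn approval_subject(policy_id: &str) -> String {
    format!("{}{}", USAGE_BUDGET_SUBJECT_PREFIX, policy_id)
}

/// Extracts the policy id from a subject, or `None` if the subject does
/// not belong to a usage-budget approval or has nothing after the prefix.
fn policy_id_from_subject(subject: &str) -> Option<&str> {
    subject
        .strip_prefix(USAGE_BUDGET_SUBJECT_PREFIX)
        .filter(|id| !id.is_empty())
}
```

A prefix of this kind lets code that scans the journal pick out budget overrides from other approval kinds with a cheap string check.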

Model CLI Commands

The palyra CLI provides tools to inspect and configure the model registry which feeds the governance engine.
| Command | Function |
| --- | --- |
| models status | Displays current provider, default models, and failover status (crates/palyra-cli/src/args/models.rs#5-10). |
| models list | Lists all configured and discovered models with their cost/latency tiers (crates/palyra-cli/src/args/models.rs#11-16). |
| models explain | Provides a detailed breakdown of why a specific model was selected for a prompt, including budget outcomes (crates/palyra-cli/src/args/models.rs#41-52). |
| models test-connection | Validates provider credentials and measures latency (crates/palyra-cli/src/args/models.rs#17-28). |
Sources: crates/palyra-cli/src/commands/models.rs#194-210, crates/palyra-cli/tests/models_cli.rs#57-82

Web Console: Usage Section

The UsageSection in the web console (apps/web/src/console/sections/UsageSection.tsx) serves as the primary observability surface for governance.

Key Components

[Diagram: “Web Console Usage Data Mapping”]
Sources: apps/web/src/console/sections/UsageSection.tsx#34-53, crates/palyra-daemon/src/transport/http/handlers/console/usage.rs#31-40, crates/palyra-daemon/src/transport/http/handlers/console/usage.rs#161-168

Usage Governance Daemon Module

The usage_governance.rs module in the daemon performs cost estimation and alert generation.
Sources: crates/palyra-daemon/src/usage_governance.rs#1-25, crates/palyra-daemon/src/usage_governance.rs#112-130
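Cost estimation of this kind usually reduces to token counts multiplied by per-token prices. The sketch below shows that arithmetic with integer micro-USD to avoid floating-point drift. `ModelPricing`, its fields, and `estimate_cost_micro_usd` are hypothetical names, and the per-1,000-token pricing unit is an assumption rather than something the source specifies.

```rust
/// Hypothetical price table entry: micro-USD (1e-6 USD) per 1,000 tokens.
#[derive(Debug, Clone, Copy)]
struct ModelPricing {
    input_micro_usd_per_1k: u64,
    output_micro_usd_per_1k: u64,
}

/// Estimates the cost of a request in micro-USD.
/// Integer division truncates any fraction of a micro-USD.
fn estimate_cost_micro_usd(pricing: &ModelPricing, input_tokens: u64, output_tokens: u64) -> u64 {
    let input_cost = input_tokens * pricing.input_micro_usd_per_1k;
    let output_cost = output_tokens * pricing.output_micro_usd_per_1k;
    (input_cost + output_cost) / 1_000
}
```

For example, at 3,000 micro-USD per 1,000 input tokens and 15,000 per 1,000 output tokens, a request with 2,000 input and 500 output tokens is estimated at (2,000 × 3,000 + 500 × 15,000) / 1,000 = 13,500 micro-USD, or 0.0135 USD.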