The UsageGovernance subsystem is responsible for tracking token consumption, estimating financial costs, and enforcing budget policies across the Palyra platform. It provides the mechanism for “Smart Routing” based on cost/complexity and manages the lifecycle of budget overrides through human-in-the-loop approvals.

Architecture Overview

Usage governance sits between the Gateway orchestration layer and the Model Providers. Every LLM request (Run) triggers a routing plan evaluation that checks the request against the defined UsageBudgetPolicyRecord entries.

Key Components

Data Flow: Request Routing and Budgeting

The following diagram illustrates how a message request is processed through the governance layer.
[Diagram: Usage Governance Data Flow]
Sources: crates/palyra-daemon/src/application/route_message/orchestration.rs#33-34, crates/palyra-daemon/src/usage_governance.rs#201-213, crates/palyra-daemon/src/usage_governance.rs#108-126

Smart Routing and Enforcement Modes

Routing behavior is governed by the RoutingMode enum, which determines how strictly budget and model recommendations are applied.
Mode | Behavior
Suggest | Records recommendations but uses the default model. Does not block for budgets (crates/palyra-daemon/src/usage_governance.rs#33).
DryRun | Simulates enforcement in logs but does not interrupt the flow (crates/palyra-daemon/src/usage_governance.rs#34).
Enforced | Strictly applies model overrides and blocks execution if budgets are exceeded (crates/palyra-daemon/src/usage_governance.rs#35).
Sources: crates/palyra-daemon/src/usage_governance.rs#24-47
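
The three modes can be sketched as a small enum with helper methods. This is a minimal illustration, not the code in usage_governance.rs: the variant names follow the table above, but the helper names (applies_model_override, blocks_on_budget, select_model) are hypothetical, and treating DryRun as "use the default model" is an assumption drawn from its "does not interrupt the flow" description.

```rust
/// Hypothetical mirror of the routing modes described in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoutingMode {
    Suggest,
    DryRun,
    Enforced,
}

impl RoutingMode {
    /// Only `Enforced` replaces the default model with the recommended one.
    pub fn applies_model_override(self) -> bool {
        matches!(self, RoutingMode::Enforced)
    }

    /// Only `Enforced` stops a run whose budget is exceeded.
    /// `Suggest` and `DryRun` record or log the outcome and continue.
    pub fn blocks_on_budget(self) -> bool {
        matches!(self, RoutingMode::Enforced)
    }
}

/// Picks the model a run would use under the given mode (illustrative helper).
pub fn select_model<'a>(
    mode: RoutingMode,
    default_model: &'a str,
    recommended: Option<&'a str>,
) -> &'a str {
    match recommended {
        Some(model) if mode.applies_model_override() => model,
        _ => default_model,
    }
}
```

Under these assumptions, a recommendation only changes the selected model when the mode is Enforced; in the other two modes it is informational.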

Implementation Details

Usage Tracking Entities

The system maps high-level usage concepts to specific Rust structs used for persistence and API transport.
[Diagram: Governance Entity Mapping]
Sources: crates/palyra-daemon/src/usage_governance.rs#88-105, crates/palyra-daemon/src/usage_governance.rs#64-73, apps/web/src/consoleApi.ts#245-260

Budget Evaluation Logic

The evaluate_budget_policies function (called via plan_usage_routing) performs the following steps:
  1. Retrieves consumption totals from the JournalStore for the relevant scope_id (e.g., a specific session or principal).
  2. Calculates the projected_value by adding the current request’s prompt_tokens_estimate to the consumed total crates/palyra-daemon/src/usage_governance.rs#137.
  3. Compares totals against soft_limit_value (triggers alerts) and hard_limit_value (triggers blocks) crates/palyra-daemon/src/usage_governance.rs#101-103.
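
The steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the BudgetPolicy and BudgetStatus types and the evaluate_budget function are hypothetical stand-ins, only the field names soft_limit_value and hard_limit_value and the prompt_tokens_estimate input come from the description, and the use of a strict "greater than" comparison is a guess rather than confirmed behavior.

```rust
/// Hypothetical outcome of evaluating one policy against one request.
#[derive(Debug, PartialEq, Eq)]
pub enum BudgetStatus {
    WithinBudget,
    SoftLimitExceeded, // would trigger an alert
    HardLimitExceeded, // would trigger a block
}

/// Hypothetical, reduced view of a budget policy; either limit may be unset.
pub struct BudgetPolicy {
    pub soft_limit_value: Option<u64>,
    pub hard_limit_value: Option<u64>,
}

/// Returns the projected value and the resulting status.
/// `consumed` stands in for the total read from the journal for the scope.
pub fn evaluate_budget(
    policy: &BudgetPolicy,
    consumed: u64,
    prompt_tokens_estimate: u64,
) -> (u64, BudgetStatus) {
    // Step 2: projected value = already consumed + estimate for this request.
    let projected_value = consumed.saturating_add(prompt_tokens_estimate);

    // Step 3: check the hard limit first, since it is the stricter outcome.
    let status = if matches!(policy.hard_limit_value, Some(limit) if projected_value > limit) {
        BudgetStatus::HardLimitExceeded
    } else if matches!(policy.soft_limit_value, Some(limit) if projected_value > limit) {
        BudgetStatus::SoftLimitExceeded
    } else {
        BudgetStatus::WithinBudget
    };

    (projected_value, status)
}
```

For example, with a soft limit of 1000 and a hard limit of 2000, a scope that has consumed 900 tokens and submits a request estimated at 150 tokens projects to 1050, which crosses the soft limit but not the hard one.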

Approval Workflow for Overrides

When a user encounters a budget block, they can request an override. This is handled by the request_usage_budget_override function.
  1. Request: The client calls the /console/v1/usage/budget/policy/:id/override endpoint crates/palyra-daemon/src/transport/http/handlers/console/usage.rs#126-129.
  2. Creation: The system creates an ApprovalRecord with a subject ID prefixed by usage-budget: crates/palyra-daemon/src/usage_governance.rs#20.
  3. Human Intervention: An administrator reviews the request in the “Approvals” section of the Web Console.
  4. Resolution: Once approved, the evaluate_budget_policies logic checks for valid ApprovalDecision entries that cover the current time window, allowing the run to proceed despite the limit crates/palyra-daemon/src/usage_governance.rs#15-16.
Sources: crates/palyra-daemon/src/usage_governance.rs#228-230, crates/palyra-daemon/src/transport/http/handlers/console/usage.rs#132-144
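
The resolution step can be pictured as a time-window check over approved overrides. The sketch below is an assumption-heavy illustration: OverrideApproval, its fields, and both functions are hypothetical and do not reflect the actual ApprovalRecord or ApprovalDecision schema. It only shows the idea that a hard-limit block is lifted while an approved override covers the current time.

```rust
/// Hypothetical, reduced view of an approved budget override.
pub struct OverrideApproval {
    pub approved: bool,
    pub valid_from_unix_ms: i64,
    pub valid_until_unix_ms: i64,
}

/// True if at least one approved override covers `now_unix_ms`.
/// The window is treated as half-open: [valid_from, valid_until).
pub fn has_active_override(approvals: &[OverrideApproval], now_unix_ms: i64) -> bool {
    approvals.iter().any(|a| {
        a.approved && a.valid_from_unix_ms <= now_unix_ms && now_unix_ms < a.valid_until_unix_ms
    })
}

/// A run is blocked only when the hard limit is exceeded and no override applies.
pub fn should_block(
    hard_limit_exceeded: bool,
    approvals: &[OverrideApproval],
    now_unix_ms: i64,
) -> bool {
    hard_limit_exceeded && !has_active_override(approvals, now_unix_ms)
}
```

In this model, a denied or expired override has no effect, so the run stays blocked until an administrator approves a new request.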

Console Integration

The Web Console provides a dedicated “Usage” domain for visualizing these controls.
Sources: apps/web/src/console/hooks/useUsageDomain.ts#72-83, apps/web/src/consoleApi.ts#124-130