The Usage Governance subsystem tracks token consumption, estimates financial costs, and enforces budget policies across the Palyra platform. It provides the mechanism for “Smart Routing” based on cost and complexity, and it manages the lifecycle of budget overrides through human-in-the-loop approvals.
Architecture Overview
Usage governance sits between the Gateway orchestration layer and the Model Providers. Every LLM request (Run) triggers a routing plan evaluation that checks the request against the defined UsageBudgetPolicyRecord entries.
Key Components
- Token Tracking: Real-time estimation of prompt tokens and recording of actual completion tokens in the JournalStore.
- Cost Estimation: Calculation of estimated_cost_usd using UsagePricingRecord snapshots (crates/palyra-daemon/src/usage_governance.rs#11-13).
- Budget Evaluation: The UsageBudgetEvaluation struct tracks consumption against soft and hard limits (crates/palyra-daemon/src/usage_governance.rs#88-105).
- Approval Workflow: When a hard limit is hit, the system can transition to a requestUsageBudgetOverride state, requiring an ApprovalRecord to proceed (crates/palyra-daemon/src/usage_governance.rs#228-230).
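The cost estimation step can be illustrated with a minimal sketch. It assumes that a pricing snapshot carries separate per-million-token rates for prompt and completion tokens; the PricingSnapshot struct, its field names, and estimate_cost_usd are hypothetical and are not taken from usage_governance.rs.

```rust
/// Hypothetical pricing snapshot. The real UsagePricingRecord fields may differ.
#[derive(Debug, Clone, Copy)]
pub struct PricingSnapshot {
    /// USD per one million prompt (input) tokens.
    pub input_usd_per_million: f64,
    /// USD per one million completion (output) tokens.
    pub output_usd_per_million: f64,
}

/// Estimate the USD cost of a run from token counts and a pricing snapshot.
pub fn estimate_cost_usd(
    prompt_tokens: u64,
    completion_tokens: u64,
    pricing: &PricingSnapshot,
) -> f64 {
    let input_cost = prompt_tokens as f64 * pricing.input_usd_per_million / 1_000_000.0;
    let output_cost = completion_tokens as f64 * pricing.output_usd_per_million / 1_000_000.0;
    input_cost + output_cost
}
```

Before a run, only the prompt side is known, so a caller would pass the prompt estimate with zero completion tokens and recompute once the actual completion count has been recorded.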
Data Flow: Request Routing and Budgeting
The following diagram illustrates how a message request is processed through the governance layer.
[Diagram: Usage Governance Data Flow]
Sources: crates/palyra-daemon/src/application/route_message/orchestration.rs#33-34, crates/palyra-daemon/src/usage_governance.rs#201-213, crates/palyra-daemon/src/usage_governance.rs#108-126
Smart Routing and Enforcement Modes
Routing behavior is governed by the RoutingMode enum, which determines how strictly budget and model recommendations are applied.
| Mode | Behavior |
|---|---|
| Suggest | Records recommendations but uses the default model. Does not block for budgets (crates/palyra-daemon/src/usage_governance.rs#33). |
| DryRun | Simulates enforcement in logs but does not interrupt the flow (crates/palyra-daemon/src/usage_governance.rs#34). |
| Enforced | Strictly applies model overrides and blocks execution if budgets are exceeded (crates/palyra-daemon/src/usage_governance.rs#35). |
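The three modes in the table can be sketched as a single match over the enum. The variant names come from the table; RoutingOutcome, apply_mode, and the would_block flag are hypothetical names introduced for illustration. The sketch also assumes that DryRun keeps the default model, which the table does not state explicitly.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoutingMode {
    Suggest,
    DryRun,
    Enforced,
}

/// Hypothetical result of applying a routing mode to one request.
#[derive(Debug, PartialEq, Eq)]
pub struct RoutingOutcome {
    /// Model that will actually serve the request.
    pub model: String,
    /// True when the run must not proceed.
    pub blocked: bool,
    /// True when enforcement would have blocked (useful for logs).
    pub would_block: bool,
}

pub fn apply_mode(
    mode: RoutingMode,
    default_model: &str,
    recommended_model: Option<&str>,
    budget_exceeded: bool,
) -> RoutingOutcome {
    match mode {
        // Recommendation is only recorded elsewhere; budgets never block.
        RoutingMode::Suggest => RoutingOutcome {
            model: default_model.to_string(),
            blocked: false,
            would_block: false,
        },
        // Assumption: dry run keeps the default model and only reports the block.
        RoutingMode::DryRun => RoutingOutcome {
            model: default_model.to_string(),
            blocked: false,
            would_block: budget_exceeded,
        },
        // Overrides are applied and an exceeded budget stops the run.
        RoutingMode::Enforced => RoutingOutcome {
            model: recommended_model.unwrap_or(default_model).to_string(),
            blocked: budget_exceeded,
            would_block: budget_exceeded,
        },
    }
}
```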
Implementation Details
Usage Tracking Entities
The system maps high-level usage concepts to specific Rust structs used for persistence and API transport.
[Diagram: Governance Entity Mapping]
Sources: crates/palyra-daemon/src/usage_governance.rs#88-105, crates/palyra-daemon/src/usage_governance.rs#64-73, apps/web/src/consoleApi.ts#245-260
Budget Evaluation Logic
The evaluate_budget_policies function (called via plan_usage_routing) performs the following steps:
- Retrieves consumption totals from the JournalStore for the relevant scope_id (e.g., a specific session or principal).
- Calculates the projected_value by adding the current request’s prompt_tokens_estimate to the consumed total (crates/palyra-daemon/src/usage_governance.rs#137).
- Compares totals against soft_limit_value (triggers alerts) and hard_limit_value (triggers blocks) (crates/palyra-daemon/src/usage_governance.rs#101-103).
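The steps above can be sketched as a small pure function. The field names projected_value, soft_limit_value, and hard_limit_value come from the text; BudgetPolicy, BudgetStatus, BudgetEvaluation, and evaluate_budget are hypothetical. The sketch assumes that limits are optional, that the projected total (not the consumed total) is what gets compared, and that a limit counts as exceeded only when the projection is strictly greater than it.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BudgetStatus {
    Ok,
    /// Soft limit exceeded: raise an alert, keep running.
    SoftLimitExceeded,
    /// Hard limit exceeded: block unless an override applies.
    HardLimitExceeded,
}

/// Hypothetical subset of a budget policy.
#[derive(Debug, Clone, Copy)]
pub struct BudgetPolicy {
    pub soft_limit_value: Option<u64>,
    pub hard_limit_value: Option<u64>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BudgetEvaluation {
    pub consumed_value: u64,
    pub projected_value: u64,
    pub status: BudgetStatus,
}

pub fn evaluate_budget(
    policy: &BudgetPolicy,
    consumed_value: u64,
    prompt_tokens_estimate: u64,
) -> BudgetEvaluation {
    // Projection = what is already consumed plus the estimate for this request.
    let projected_value = consumed_value.saturating_add(prompt_tokens_estimate);

    // The hard limit is checked first so it wins when both limits are exceeded.
    let status = if policy.hard_limit_value.map_or(false, |limit| projected_value > limit) {
        BudgetStatus::HardLimitExceeded
    } else if policy.soft_limit_value.map_or(false, |limit| projected_value > limit) {
        BudgetStatus::SoftLimitExceeded
    } else {
        BudgetStatus::Ok
    };

    BudgetEvaluation { consumed_value, projected_value, status }
}
```

For example, with a soft limit of 800 and a hard limit of 1000, a scope that has consumed 700 tokens stays within budget for a 50-token prompt, triggers an alert for a 200-token prompt, and is blocked for a 400-token prompt.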
Approval Workflow for Overrides
When a user encounters a budget block, they can request an override. This is handled by the request_usage_budget_override function.
- Request: The client calls the /console/v1/usage/budget/policy/:id/override endpoint (crates/palyra-daemon/src/transport/http/handlers/console/usage.rs#126-129).
- Creation: The system creates an ApprovalRecord with a subject ID prefixed by usage-budget: (crates/palyra-daemon/src/usage_governance.rs#20).
- Human Intervention: An administrator reviews the request in the “Approvals” section of the Web Console.
- Resolution: Once approved, the evaluate_budget_policies logic checks for valid ApprovalDecision entries that cover the current time window, allowing the run to proceed despite the limit (crates/palyra-daemon/src/usage_governance.rs#15-16).
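The resolution step can be sketched as a lookup over approval entries. Only the usage-budget: prefix comes from the text; the OverrideApproval struct, the Decision enum, the millisecond timestamps, and both function names are hypothetical. The sketch assumes that the subject ID is the prefix followed by the policy ID and that an approval covers the half-open window from valid_from up to, but not including, valid_until.

```rust
/// Subject ID prefix for budget override approvals (from the text above).
pub const USAGE_BUDGET_SUBJECT_PREFIX: &str = "usage-budget:";

/// Assumption: the subject ID is the prefix followed by the policy ID.
pub fn override_subject_id(policy_id: &str) -> String {
    format!("{}{}", USAGE_BUDGET_SUBJECT_PREFIX, policy_id)
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Approved,
    Denied,
}

/// Hypothetical, flattened view of an approval and its decision.
#[derive(Debug, Clone)]
pub struct OverrideApproval {
    pub subject_id: String,
    pub decision: Decision,
    pub valid_from_unix_ms: i64,
    pub valid_until_unix_ms: i64,
}

/// True when an approved override for this policy covers `now_unix_ms`.
pub fn has_active_override(
    approvals: &[OverrideApproval],
    policy_id: &str,
    now_unix_ms: i64,
) -> bool {
    let subject_id = override_subject_id(policy_id);
    approvals.iter().any(|approval| {
        approval.subject_id == subject_id
            && approval.decision == Decision::Approved
            && approval.valid_from_unix_ms <= now_unix_ms
            && now_unix_ms < approval.valid_until_unix_ms
    })
}
```

A budget evaluation that hits a hard limit would consult a check like this before blocking, so an expired or denied approval leaves the block in place.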
Console Integration
The Web Console provides a dedicated “Usage” domain for visualizing these controls.
- useUsageDomain: A React hook that orchestrates fetching UsageSummaryEnvelope, UsageAgentsEnvelope, and UsageModelsEnvelope (apps/web/src/console/hooks/useUsageDomain.ts#23-35).
- ConsoleApiClient: Implements methods like getUsageSummary and requestUsageBudgetOverride to interface with the daemon’s HTTP handlers (apps/web/src/consoleApi.ts#80-162).
- Timeline Visualization: Data is aggregated into UsageTimelineBucket objects for rendering cost and token charts (apps/web/src/consoleApi.ts#110-122).
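Timeline bucketing of this kind can be sketched as a fixed-width aggregation. The sketch is written in Rust for consistency with the daemon code; where the real aggregation runs is not stated here. UsageEvent, TimelineBucket, their fields, and bucket_usage are hypothetical and do not mirror the actual UsageTimelineBucket shape.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-run usage record.
#[derive(Debug, Clone, Copy)]
pub struct UsageEvent {
    pub timestamp_unix_ms: i64,
    pub total_tokens: u64,
    pub estimated_cost_usd: f64,
}

/// Hypothetical chart bucket: totals for one fixed-width time window.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TimelineBucket {
    pub bucket_start_unix_ms: i64,
    pub total_tokens: u64,
    pub estimated_cost_usd: f64,
}

/// Group events into fixed-width buckets, returned in chronological order.
/// Panics if `bucket_width_ms` is not positive.
pub fn bucket_usage(events: &[UsageEvent], bucket_width_ms: i64) -> Vec<TimelineBucket> {
    assert!(bucket_width_ms > 0, "bucket width must be positive");

    // BTreeMap keeps bucket keys sorted, so the output needs no extra sort.
    let mut totals: BTreeMap<i64, (u64, f64)> = BTreeMap::new();
    for event in events {
        // div_euclid rounds toward negative infinity, so negative timestamps
        // still land in the bucket that starts at or before them.
        let start = event.timestamp_unix_ms.div_euclid(bucket_width_ms) * bucket_width_ms;
        let entry = totals.entry(start).or_insert((0, 0.0));
        entry.0 += event.total_tokens;
        entry.1 += event.estimated_cost_usd;
    }

    totals
        .into_iter()
        .map(|(bucket_start_unix_ms, (total_tokens, estimated_cost_usd))| TimelineBucket {
            bucket_start_unix_ms,
            total_tokens,
            estimated_cost_usd,
        })
        .collect()
}
```

Empty windows produce no bucket in this sketch, so a chart that needs a continuous axis would have to fill the gaps with zero-valued buckets.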