Governance Architecture & Data Flow
The governance system is centered in the palyra-daemon within the usage_governance module. It sits between the Orchestrator and the Model Providers, intercepting requests to evaluate them against active policies.
Usage Data Flow Diagram
“Governance Flow: Request to Execution”
Sources: crates/palyra-daemon/src/application/route_message/orchestration.rs#333-334, crates/palyra-daemon/src/usage_governance.rs#133-146, crates/palyra-daemon/src/usage_governance.rs#216-228
Budget Policies & Enforcement
Policies are defined via UsageBudgetPolicyRecord objects. They specify limits based on metrics (tokens, USD) and intervals (daily, monthly).
Policy Evaluation Logic
The evaluate_budget_policies function determines the status of a request based on:
- Hard Limits: Immediate rejection of the request if the consumed or projected value exceeds the threshold.
- Soft Limits: Triggers notifications or alerts without blocking execution.
- Routing Overrides: Policies can force a RoutingMode (e.g., switching from Enforced to Suggest, or forcing a cheaper model) when thresholds are approached crates/palyra-daemon/src/usage_governance.rs#108-123.
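The hard and soft limit rules above can be sketched as a small classifier. This is a minimal illustration, not the daemon's evaluate_budget_policies: the BudgetPolicySketch and BudgetStatus types, their fields, and the evaluate_policy function are hypothetical names, and the sketch assumes that consumed and projected values share the unit of the limits.

```rust
// Illustrative sketch only: these types are hypothetical stand-ins, not the
// daemon's actual UsageBudgetPolicyRecord or evaluate_budget_policies.

#[derive(Debug, Clone)]
struct BudgetPolicySketch {
    /// Crossing this threshold only raises an alert.
    soft_limit: Option<f64>,
    /// Crossing this threshold rejects the request.
    hard_limit: Option<f64>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum BudgetStatus {
    WithinBudget,
    SoftLimitExceeded,
    HardLimitExceeded,
}

/// Classifies a request from the value already consumed in the interval and
/// the value projected once this request completes (same unit as the limits).
fn evaluate_policy(policy: &BudgetPolicySketch, consumed: f64, projected: f64) -> BudgetStatus {
    // "Consumed or projected": the larger of the two decides the outcome.
    let worst_case = consumed.max(projected);
    if policy.hard_limit.map_or(false, |limit| worst_case > limit) {
        BudgetStatus::HardLimitExceeded
    } else if policy.soft_limit.map_or(false, |limit| worst_case > limit) {
        BudgetStatus::SoftLimitExceeded
    } else {
        BudgetStatus::WithinBudget
    }
}

fn main() {
    let policy = BudgetPolicySketch { soft_limit: Some(8.0), hard_limit: Some(10.0) };
    for (consumed, projected) in [(7.0, 7.5), (7.0, 9.0), (9.5, 10.5)] {
        println!("{} -> {}: {:?}", consumed, projected, evaluate_policy(&policy, consumed, projected));
    }
}
```

Checking the hard limit first means a request that crosses both thresholds is reported as a hard violation rather than a soft one.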
Routing Modes
The system supports three operational modes for governance:
| Mode | Description |
|---|---|
| Suggest | Provides recommendations in logs/UI but does not interfere with the run. |
| DryRun | Calculates all outcomes and potential blocks but allows the run to proceed. |
| Enforced | Actively blocks runs that violate hard budget limits or lack required approvals. |
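To make the difference between the modes concrete, the sketch below shows how a hard-limit violation might be handled under each one. The variant names follow the table; the GateResult type, the apply_mode function, and the note strings are hypothetical and only illustrate the described behavior.

```rust
// Illustrative sketch: the variant names come from the table above, but the
// GateResult type and apply_mode function are hypothetical.

#[derive(Debug, Clone, Copy, PartialEq)]
enum RoutingMode {
    Suggest,
    DryRun,
    Enforced,
}

#[derive(Debug, PartialEq)]
struct GateResult {
    /// Whether the run may start.
    allowed: bool,
    /// Message for logs or the UI, if any.
    note: Option<&'static str>,
}

fn apply_mode(mode: RoutingMode, hard_limit_violated: bool) -> GateResult {
    if !hard_limit_violated {
        return GateResult { allowed: true, note: None };
    }
    match mode {
        // Suggest never interferes; it only surfaces a recommendation.
        RoutingMode::Suggest => GateResult {
            allowed: true,
            note: Some("recommendation: hard budget limit exceeded"),
        },
        // DryRun records the block it would have applied, then proceeds.
        RoutingMode::DryRun => GateResult {
            allowed: true,
            note: Some("would block: hard budget limit exceeded"),
        },
        // Enforced is the only mode that actually stops the run.
        RoutingMode::Enforced => GateResult {
            allowed: false,
            note: Some("blocked: hard budget limit exceeded"),
        },
    }
}

fn main() {
    for mode in [RoutingMode::Suggest, RoutingMode::DryRun, RoutingMode::Enforced] {
        println!("{:?}: {:?}", mode, apply_mode(mode, true));
    }
}
```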
Budget Override & Approval Flows
When a hard limit is reached, users can request an override. This creates an ApprovalRecord in the journal with a specific usage-budget: subject prefix crates/palyra-daemon/src/usage_governance.rs#24-24.
- Request: Triggered via request_usage_budget_override in the daemon crates/palyra-daemon/src/transport/http/handlers/console/usage.rs#126-129.
- Approval: Handled by the standard Human-in-the-loop (HITL) system. Once approved, the ApprovalDecision is checked during the next plan_usage_routing call crates/palyra-daemon/src/usage_governance.rs#129-130.
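The override flow can be sketched as two small pieces: building and recognizing the approval subject, and deciding whether a hard limit still blocks once a decision exists. Only the usage-budget: prefix comes from the text above. The layout of the subject after the prefix, the ApprovalDecisionSketch variants, and all helper names are assumptions made for illustration.

```rust
// Illustrative sketch: the "usage-budget:" prefix is from the text above; the
// subject layout after the prefix, the enum variants, and the helper names
// are assumptions.

const USAGE_BUDGET_SUBJECT_PREFIX: &str = "usage-budget:";

#[derive(Debug, Clone, Copy, PartialEq)]
enum ApprovalDecisionSketch {
    Pending,
    Approved,
    Denied,
}

/// Builds the subject under which an override request would be journaled.
fn override_subject(policy_id: &str) -> String {
    format!("{}{}", USAGE_BUDGET_SUBJECT_PREFIX, policy_id)
}

/// Returns the policy id if the subject belongs to a budget override.
fn policy_id_from_subject(subject: &str) -> Option<&str> {
    subject.strip_prefix(USAGE_BUDGET_SUBJECT_PREFIX)
}

/// A hard limit keeps blocking unless an explicit approval is on record.
fn hard_limit_still_blocks(decision: Option<ApprovalDecisionSketch>) -> bool {
    !matches!(decision, Some(ApprovalDecisionSketch::Approved))
}

fn main() {
    let subject = override_subject("monthly-usd");
    println!("subject: {}", subject);
    println!("policy id: {:?}", policy_id_from_subject(&subject));
    for decision in [
        None,
        Some(ApprovalDecisionSketch::Pending),
        Some(ApprovalDecisionSketch::Denied),
        Some(ApprovalDecisionSketch::Approved),
    ] {
        println!("{:?} -> blocks: {}", decision, hard_limit_still_blocks(decision));
    }
}
```

Treating a missing or pending decision the same as a denial keeps the sketch fail-closed: the run proceeds only after an explicit approval.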
Model CLI Commands
The palyra CLI provides tools to inspect and configure the model registry, which feeds the governance engine.
| Command | Function |
|---|---|
| models status | Displays current provider, default models, and failover status crates/palyra-cli/src/args/models.rs#5-10. |
| models list | Lists all configured and discovered models with their cost/latency tiers crates/palyra-cli/src/args/models.rs#11-16. |
| models explain | Provides a detailed breakdown of why a specific model was selected for a prompt, including budget outcomes crates/palyra-cli/src/args/models.rs#41-52. |
| models test-connection | Validates provider credentials and measures latency crates/palyra-cli/src/args/models.rs#17-28. |
Web Console: Usage Section
The UsageSection in the web console (apps/web/src/console/sections/UsageSection.tsx) serves as the primary observability surface for governance.
Key Components
- Metric Grid: Displays Total tokens, Estimated cost, and Avg latency apps/web/src/console/sections/UsageSection.tsx#95-121.
- Insights: Surfaces “Routing Decisions”, which explain why the daemon chose a particular model (e.g., “Budget allowed premium model” or “Failover to secondary”) apps/web/src/console/sections/UsageSection.tsx#45-52.
- Timeline: Visualizes consumption buckets (Hourly/Daily) apps/web/src/console/sections/UsageSection.tsx#180-188.
Usage Governance Daemon Module
The usage_governance.rs module in the daemon is responsible for cost estimation, alert generation, and smart routing.
- Cost Estimation: estimate_cost_for_model uses UsagePricingRecord to calculate USD values for prompt and completion tokens crates/palyra-daemon/src/transport/http/handlers/console/usage.rs#13-18.
- Alert Generation: build_alert_candidates scans for cost spikes (defaulting to >$0.50) or provider health issues crates/palyra-daemon/src/usage_governance.rs#23-23, crates/palyra-daemon/src/usage_governance.rs#195-205.
- Smart Routing: The SmartRoutingRuntimeConfig determines if the system should automatically route to cheaper models when the complexity score of a prompt is low crates/palyra-daemon/src/usage_governance.rs#53-66.
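The three responsibilities above reduce to simple arithmetic and threshold checks, sketched below. Only the $0.50 default spike threshold comes from the text. The PricingSketch and SmartRoutingSketch types, their fields, the per-1,000-token pricing unit, and the function names are assumptions, and the real UsagePricingRecord and SmartRoutingRuntimeConfig may be shaped differently.

```rust
// Illustrative sketch: the struct fields, the per-1,000-token pricing unit,
// and the function names are assumptions; only the $0.50 default spike
// threshold comes from the text above.

struct PricingSketch {
    prompt_usd_per_1k: f64,
    completion_usd_per_1k: f64,
}

/// Estimates the USD cost of a call from its prompt and completion tokens.
fn estimate_cost_usd(pricing: &PricingSketch, prompt_tokens: u64, completion_tokens: u64) -> f64 {
    let prompt_cost = prompt_tokens as f64 / 1000.0 * pricing.prompt_usd_per_1k;
    let completion_cost = completion_tokens as f64 / 1000.0 * pricing.completion_usd_per_1k;
    prompt_cost + completion_cost
}

const DEFAULT_COST_SPIKE_USD: f64 = 0.50;

/// Flags a cost value that exceeds the default spike threshold.
fn is_cost_spike(cost_usd: f64) -> bool {
    cost_usd > DEFAULT_COST_SPIKE_USD
}

struct SmartRoutingSketch {
    enabled: bool,
    /// Prompts scoring below this value count as low complexity.
    complexity_threshold: f64,
}

/// Decides whether a low-complexity prompt should go to a cheaper model.
fn prefer_cheaper_model(config: &SmartRoutingSketch, complexity_score: f64) -> bool {
    config.enabled && complexity_score < config.complexity_threshold
}

fn main() {
    let pricing = PricingSketch { prompt_usd_per_1k: 0.01, completion_usd_per_1k: 0.03 };
    let small = estimate_cost_usd(&pricing, 2_000, 1_000);
    let large = estimate_cost_usd(&pricing, 40_000, 10_000);
    println!("small: ${:.2}, spike: {}", small, is_cost_spike(small));
    println!("large: ${:.2}, spike: {}", large, is_cost_spike(large));

    let routing = SmartRoutingSketch { enabled: true, complexity_threshold: 0.3 };
    println!("cheap route for 0.1: {}", prefer_cheaper_model(&routing, 0.1));
    println!("cheap route for 0.8: {}", prefer_cheaper_model(&routing, 0.8));
}
```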