1. Usage Governance and Budgeting
Palyra implements a multi-layered governance system to track and limit LLM usage costs. This system operates by intercepting orchestration requests and evaluating them against defined budget policies before dispatching them to model providers.
1.1. Core Governance Entities
The governance logic is primarily implemented in crates/palyra-daemon/src/usage_governance.rs. Key structures include:
- RoutingMode: Defines how governance is enforced. Options include Suggest (passive), DryRun (logs the outcome but allows execution), and Enforced (blocks execution if limits are exceeded) crates/palyra-daemon/src/usage_governance.rs#24-28.
- UsageBudgetEvaluation: Represents the result of checking a request against a specific policy, including consumed vs. projected values and soft/hard limits crates/palyra-daemon/src/usage_governance.rs#88-105.
- RoutingDecision: The final output of the governance engine, determining whether a run is blocked or requires manual approval crates/palyra-daemon/src/usage_governance.rs#108-126.
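The relationship between the three entities can be sketched as follows. This is a minimal illustration, not the code in usage_governance.rs: BudgetCheck, Decision, and decide are hypothetical names, amounts are assumed to be tracked in micro-dollars, "projected" is assumed to mean spend so far plus the estimated cost of the run, and mapping a soft-limit breach to manual approval is also an assumption.

```rust
/// Enforcement modes named in the text above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoutingMode {
    Suggest,
    DryRun,
    Enforced,
}

/// Hypothetical, simplified stand-in for UsageBudgetEvaluation.
/// Amounts are in micro-dollars to avoid floating-point rounding.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BudgetCheck {
    pub consumed_micros: u64,
    /// Assumed meaning: consumed so far plus the estimated cost of this run.
    pub projected_micros: u64,
    pub soft_limit_micros: Option<u64>,
    pub hard_limit_micros: Option<u64>,
}

/// Hypothetical, simplified stand-in for RoutingDecision.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Decision {
    pub blocked: bool,
    pub approval_required: bool,
}

/// A missing limit never counts as exceeded.
fn exceeds(projected: u64, limit: Option<u64>) -> bool {
    limit.map_or(false, |l| projected > l)
}

/// Combine all policy checks into one decision.
/// Only Enforced mode can block or demand approval in this sketch;
/// Suggest and DryRun always let the run proceed.
pub fn decide(mode: RoutingMode, checks: &[BudgetCheck]) -> Decision {
    let over_hard = checks
        .iter()
        .any(|c| exceeds(c.projected_micros, c.hard_limit_micros));
    let over_soft = checks
        .iter()
        .any(|c| exceeds(c.projected_micros, c.soft_limit_micros));
    match mode {
        RoutingMode::Enforced => Decision {
            blocked: over_hard,
            approval_required: !over_hard && over_soft,
        },
        RoutingMode::Suggest | RoutingMode::DryRun => Decision {
            blocked: false,
            approval_required: false,
        },
    }
}
```

Under these assumptions, a run that would cross a hard limit is blocked only in Enforced mode, while the same evaluation in DryRun mode would be logged and allowed through.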
1.2. Smart Routing and Cost Estimation
Before a Run begins, the system calculates a PricingEstimate based on the estimated token count of the prompt crates/palyra-daemon/src/usage_governance.rs#65-73.
The plan_usage_routing function is the central entry point for this logic. It is called during the run stream initialization to determine if the requested model and parameters align with the effective SmartRoutingRuntimeConfig crates/palyra-daemon/src/application/run_stream/orchestration.rs#31-32 crates/palyra-daemon/src/usage_governance.rs#50-54.
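A pre-run cost estimate of this kind can be sketched as below. The four-characters-per-token heuristic, the per-1,000-token price parameter, and the names CostEstimate, estimate_tokens, and estimate_cost are all assumptions made for illustration; the source does not describe how Palyra counts tokens or stores prices.

```rust
/// Hypothetical, simplified stand-in for PricingEstimate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CostEstimate {
    pub estimated_input_tokens: u64,
    pub estimated_cost_micros: u64,
}

/// Rough token estimate: about four characters per token, rounded up.
/// This heuristic is an assumption, not Palyra's actual tokenizer.
pub fn estimate_tokens(prompt: &str) -> u64 {
    let chars = prompt.chars().count() as u64;
    (chars + 3) / 4
}

/// Estimate the input cost of a prompt in micro-dollars.
/// `micros_per_1k_input_tokens` is an assumed pricing unit.
pub fn estimate_cost(prompt: &str, micros_per_1k_input_tokens: u64) -> CostEstimate {
    let tokens = estimate_tokens(prompt);
    // Round up so the estimate never understates the projected spend.
    let cost = tokens
        .saturating_mul(micros_per_1k_input_tokens)
        .saturating_add(999)
        / 1_000;
    CostEstimate {
        estimated_input_tokens: tokens,
        estimated_cost_micros: cost,
    }
}
```

Rounding up at both steps keeps the estimate conservative, which suits a check that runs before any tokens have actually been billed.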
1.3. Usage Governance Flow
The following diagram illustrates how a message route request is governed before reaching the LLM.
Diagram: Governance and Routing Flow
Sources: crates/palyra-daemon/src/usage_governance.rs#201-213, crates/palyra-daemon/src/application/run_stream/orchestration.rs#146-182
2. Diagnostics and System Doctor
Palyra provides deep introspection into the daemon’s state through gRPC and HTTP diagnostics endpoints, complemented by a CLI-based “Doctor” for environment repair.
2.1. Console Diagnostics
The /console/v1/diagnostics handler aggregates snapshots from every major subsystem. It collects:
- Model Provider Status: Connectivity and circuit breaker state crates/palyra-daemon/src/transport/http/handlers/console/diagnostics.rs#11-15.
- Auth Profiles: Status of identity and credential providers crates/palyra-daemon/src/transport/http/handlers/console/diagnostics.rs#16-20.
- Memory Maintenance: Statistics on vector DB usage and TTL vacuuming schedules crates/palyra-daemon/src/transport/http/handlers/console/diagnostics.rs#54-56.
- Observability: Aggregated recent failures across connectors and browsers crates/palyra-daemon/src/transport/http/handlers/console/diagnostics.rs#62-63.
2.2. CLI Doctor and Recovery
The palyra doctor command is the primary tool for troubleshooting and repairing the local installation. It is implemented in crates/palyra-cli/src/commands/doctor/recovery.rs.
The doctor operates in several modes: Diagnostics, RepairPreview, and RepairApply crates/palyra-cli/src/commands/doctor/recovery.rs#68-74. It can perform automated fixes such as:
- Reinitializing missing configuration files crates/palyra-cli/src/commands/doctor/recovery.rs#179-181.
- Normalizing corrupted auth registries crates/palyra-cli/src/commands/doctor/recovery.rs#195-198.
- Backfilling missing access registry entries crates/palyra-cli/src/commands/doctor/recovery.rs#234-236.
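One way to picture how the modes differ is the sketch below. The mode names come from the text; everything else is hypothetical, including the assumed semantics (Diagnostics only reports, RepairPreview lists fixes without running them, RepairApply executes them) and the RepairStep, StepOutcome, and run_doctor names.

```rust
/// Doctor modes named in the text above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DoctorMode {
    Diagnostics,
    RepairPreview,
    RepairApply,
}

/// Hypothetical per-step result.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StepOutcome {
    /// Problem reported only (assumed Diagnostics behaviour).
    Reported,
    /// Fix listed but not executed (assumed RepairPreview behaviour).
    Planned,
    /// Fix executed successfully.
    Applied,
    /// Fix was attempted and failed with this message.
    Failed(String),
}

/// One hypothetical repair step: a label plus the action that performs it.
pub struct RepairStep {
    pub label: &'static str,
    pub apply: fn() -> Result<(), String>,
}

/// Walk every step; only RepairApply actually invokes the repair action.
pub fn run_doctor(mode: DoctorMode, steps: &[RepairStep]) -> Vec<(&'static str, StepOutcome)> {
    steps
        .iter()
        .map(|step| {
            let outcome = match mode {
                DoctorMode::Diagnostics => StepOutcome::Reported,
                DoctorMode::RepairPreview => StepOutcome::Planned,
                DoctorMode::RepairApply => match (step.apply)() {
                    Ok(()) => StepOutcome::Applied,
                    Err(message) => StepOutcome::Failed(message),
                },
            };
            (step.label, outcome)
        })
        .collect()
}
```

Keeping the preview and apply paths on the same step list means a preview shows exactly the steps an apply would run, which is the usual reason for offering a separate preview mode.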
3. Self-Healing and Background Loops
The self-healing system ensures that long-running background tasks and orchestration runs are monitored for “stalling” or silent failures.
3.1. Work Heartbeats
The daemon tracks active work via the WorkHeartbeat mechanism. Any significant background operation must record heartbeats to avoid being flagged as an incident crates/palyra-daemon/src/self_healing.rs#1-10.
- WorkHeartbeatKind: Categorizes the work (e.g., Run, BackgroundTask, CronJob) crates/palyra-daemon/src/self_healing.rs#12-18.
- Heartbeat Recording: Components call record_self_healing_heartbeat periodically during execution crates/palyra-daemon/src/background_queue.rs#107-111.
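The bookkeeping behind heartbeat recording can be sketched as an in-memory map from work item to last-seen timestamp. HeartbeatRegistry and its methods are hypothetical and are not the signature of record_self_healing_heartbeat; the WorkKind variants mirror the examples given for WorkHeartbeatKind, and timestamps are assumed to be Unix milliseconds.

```rust
use std::collections::HashMap;

/// Work categories mirroring the examples listed for WorkHeartbeatKind.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum WorkKind {
    Run,
    BackgroundTask,
    CronJob,
}

/// Hypothetical in-memory registry of the latest heartbeat per work item.
#[derive(Debug, Default)]
pub struct HeartbeatRegistry {
    last_seen_unix_ms: HashMap<(WorkKind, String), u64>,
}

impl HeartbeatRegistry {
    /// Record (or refresh) a heartbeat for one work item.
    pub fn record(&mut self, kind: WorkKind, id: &str, now_unix_ms: u64) {
        self.last_seen_unix_ms
            .insert((kind, id.to_string()), now_unix_ms);
    }

    /// Remove a heartbeat once the work finishes; returns true if one existed.
    pub fn clear(&mut self, kind: WorkKind, id: &str) -> bool {
        self.last_seen_unix_ms
            .remove(&(kind, id.to_string()))
            .is_some()
    }

    /// Timestamp of the most recent heartbeat, if any was recorded.
    pub fn last_seen(&self, kind: WorkKind, id: &str) -> Option<u64> {
        self.last_seen_unix_ms
            .get(&(kind, id.to_string()))
            .copied()
    }
}
```

Keying on both kind and identifier keeps a run and a background task with the same ID from overwriting each other's heartbeat.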
3.2. Background Queue Supervision
The spawn_background_queue_loop in crates/palyra-daemon/src/background_queue.rs manages the lifecycle of asynchronous tasks. It monitors for:
- Expirations: Tasks that failed to start before their expires_at_unix_ms crates/palyra-daemon/src/background_queue.rs#114-116.
- Cancellations: Propagating parent run cancellations to child background tasks crates/palyra-daemon/src/background_queue.rs#145-147.
- Terminal State Finalization: Moving tasks to completed, failed, or cancelled based on the outcome of their target runs crates/palyra-daemon/src/background_queue.rs#157-158.
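The three checks above can be read as one state-transition function evaluated on each pass of the loop. The sketch below is illustrative only: TaskSnapshot, RunOutcome, and next_state are hypothetical, the separate Expired state is an assumption (the source does not say which state expired tasks move to), and so is the order in which the checks are applied.

```rust
/// Hypothetical task states; Expired is an assumed state for missed deadlines.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskState {
    Queued,
    Running,
    Completed,
    Failed,
    Cancelled,
    Expired,
}

/// Hypothetical outcome of the run a task targets.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunOutcome {
    InProgress,
    Succeeded,
    Failed,
    Cancelled,
}

/// Hypothetical view of one task as seen by the supervision loop.
#[derive(Debug, Clone)]
pub struct TaskSnapshot {
    pub state: TaskState,
    pub expires_at_unix_ms: Option<u64>,
    pub parent_run_cancelled: bool,
    pub target_run: Option<RunOutcome>,
}

/// Decide the state a task should hold after one supervision pass.
pub fn next_state(task: &TaskSnapshot, now_unix_ms: u64) -> TaskState {
    // Tasks already in a terminal state are never touched again.
    match task.state {
        TaskState::Queued | TaskState::Running => {}
        terminal => return terminal,
    }
    // Cancellation of the parent run propagates to the child task.
    if task.parent_run_cancelled {
        return TaskState::Cancelled;
    }
    // Only tasks that have not started yet can expire.
    if task.state == TaskState::Queued {
        if let Some(deadline) = task.expires_at_unix_ms {
            if now_unix_ms >= deadline {
                return TaskState::Expired;
            }
        }
    }
    // Otherwise mirror the outcome of the target run, if it has finished.
    match task.target_run {
        Some(RunOutcome::Succeeded) => TaskState::Completed,
        Some(RunOutcome::Failed) => TaskState::Failed,
        Some(RunOutcome::Cancelled) => TaskState::Cancelled,
        Some(RunOutcome::InProgress) | None => task.state,
    }
}
```

Writing the pass as a pure function of a snapshot and the current time makes each rule testable without a running queue or a real clock.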
3.3. Self-Healing Incident Remediation
When no heartbeat has been recorded for longer than a defined threshold, the self-healing logic can trigger remediation:
- Run Cancellation: If an orchestration run hangs, the self-healing loop transitions it to Cancelled and clears its heartbeat crates/palyra-daemon/src/application/run_stream/cancellation.rs#16-33.
- Task Re-dispatch: Background tasks that stall in a running state without heartbeats may be marked for retry or failure crates/palyra-daemon/src/background_queue.rs#185-190.
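The remediation choice can be sketched as a function from heartbeat snapshots to planned actions. All names here (Heartbeat, Remediation, plan_remediation) are hypothetical, and the rule that a stalled task is retried until an attempt cap is reached and then failed is an assumption; the source only says tasks "may be marked for retry or failure".

```rust
/// Subset of work kinds relevant to the two remediation paths described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkKind {
    Run,
    BackgroundTask,
}

/// Hypothetical snapshot of one tracked work item.
#[derive(Debug, Clone)]
pub struct Heartbeat {
    pub kind: WorkKind,
    pub id: String,
    pub last_seen_unix_ms: u64,
    /// Attempts already made; only meaningful for background tasks.
    pub attempts: u32,
}

/// Hypothetical remediation actions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Remediation {
    CancelRun { run_id: String },
    RetryTask { task_id: String },
    FailTask { task_id: String },
}

/// Pick an action for every item whose last heartbeat is older than the threshold.
pub fn plan_remediation(
    beats: &[Heartbeat],
    now_unix_ms: u64,
    stale_after_ms: u64,
    max_attempts: u32,
) -> Vec<Remediation> {
    beats
        .iter()
        // saturating_sub guards against a heartbeat stamped slightly in the future.
        .filter(|b| now_unix_ms.saturating_sub(b.last_seen_unix_ms) > stale_after_ms)
        .map(|b| match b.kind {
            WorkKind::Run => Remediation::CancelRun { run_id: b.id.clone() },
            // Assumed policy: retry until the attempt cap is reached, then fail.
            WorkKind::BackgroundTask if b.attempts < max_attempts => {
                Remediation::RetryTask { task_id: b.id.clone() }
            }
            WorkKind::BackgroundTask => Remediation::FailTask { task_id: b.id.clone() },
        })
        .collect()
}
```

Separating the planning step from its execution lets the same decision logic feed both an automatic loop and a dashboard that only displays pending actions.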
4. Web Console Integration
The Web Console provides a dedicated Operations Section for monitoring these systems.
- Diagnostics View: Displays the model provider state, auth profile status, and browser service health apps/web/src/console/sections/OperationsSection.tsx#112-129.
- Self-Healing Dashboard: Visualizes active incidents, recent remediation attempts, and heartbeat status apps/web/src/console/sections/OperationsSection.tsx#143-147.
- Usage Insights: Summarizes total spending, model mix, and active usage alerts apps/web/src/console/sections/OperationsSection.tsx#137-141.