Worker Fleet Management provides a fail-closed system for coordinating distributed execution across networked worker nodes. It ensures that any worker participating in the fleet is verified through cryptographic attestation, bound by strict egress policies, and managed through a state-controlled lease lifecycle.

WorkerFleetManager Ledger

The WorkerFleetManager serves as the in-memory ledger within the daemon. It tracks the availability, health, and current assignments of all registered workers.
  • Registration: Workers must present a WorkerAttestation to join the fleet. The manager validates this against the WorkerFleetPolicy crates/palyra-workerd/src/lib.rs#3-6.
  • State Tracking: It manages the WorkerLifecycleState, transitioning workers between states such as Available, Busy, Quarantined, or Orphaned crates/palyra-workerd/src/lib.rs#10-11.
  • Capacity Management: It enforces limits defined in the policy, such as the maximum number of concurrent workers or leases per worker.
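
The ledger responsibilities above can be sketched as a small in-memory map. This is a minimal illustration, not the code in crates/palyra-workerd: `FleetLedger`, `FleetPolicy`, `WorkerEntry`, `RegisterError`, and the `attestation_ok` flag are hypothetical names, and the state enum lists only the four states named in the text.

```rust
use std::collections::HashMap;

/// The four states named in the text; the real enum may have more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkerLifecycleState {
    Available,
    Busy,
    Quarantined,
    Orphaned,
}

/// Hypothetical policy shape covering only the limits the text mentions.
pub struct FleetPolicy {
    pub max_workers: usize,
    pub max_leases_per_worker: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RegisterError {
    AttestationRejected,
    DuplicateWorker,
    FleetFull,
}

/// Hypothetical ledger entry for one worker.
pub struct WorkerEntry {
    pub state: WorkerLifecycleState,
    pub active_leases: usize,
}

pub struct FleetLedger {
    policy: FleetPolicy,
    workers: HashMap<String, WorkerEntry>,
}

impl FleetLedger {
    pub fn new(policy: FleetPolicy) -> Self {
        Self { policy, workers: HashMap::new() }
    }

    /// Registers a worker. `attestation_ok` stands in for the result of
    /// attestation validation (an assumption made to keep the sketch short).
    pub fn register(&mut self, worker_id: &str, attestation_ok: bool) -> Result<(), RegisterError> {
        if !attestation_ok {
            return Err(RegisterError::AttestationRejected); // fail closed
        }
        if self.workers.contains_key(worker_id) {
            return Err(RegisterError::DuplicateWorker);
        }
        if self.workers.len() >= self.policy.max_workers {
            return Err(RegisterError::FleetFull);
        }
        let entry = WorkerEntry { state: WorkerLifecycleState::Available, active_leases: 0 };
        self.workers.insert(worker_id.to_string(), entry);
        Ok(())
    }

    pub fn state_of(&self, worker_id: &str) -> Option<WorkerLifecycleState> {
        self.workers.get(worker_id).map(|w| w.state)
    }

    /// True only for a known, Available worker that is below its lease limit.
    pub fn can_accept_lease(&self, worker_id: &str) -> bool {
        self.workers.get(worker_id).map_or(false, |w| {
            w.state == WorkerLifecycleState::Available
                && w.active_leases < self.policy.max_leases_per_worker
        })
    }
}
```

Every rejection path returns an error before the map is touched, so a failed registration leaves the ledger unchanged.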

Worker Lifecycle State Machine

The following diagram illustrates the transitions managed by the WorkerFleetManager.
[Diagram: Worker Lifecycle State Machine]
Sources: crates/palyra-workerd/src/lib.rs#1-11, crates/palyra-workerd/src/lib.rs#73-91
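
As a rough textual counterpart to the state machine, the sketch below encodes transitions as a pure function. The transition set is an assumption inferred from the prose on this page (assignment, crash or policy violation, missed heartbeats, operator intervention), not a copy of the real transition table. `State`, `Event`, and `next_state` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Available,
    Busy,
    Quarantined,
    Orphaned,
}

/// Hypothetical events; the real manager may model these differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    LeaseAssigned,
    LeaseReleased,
    CrashOrViolation,
    HeartbeatTimeout,
    OperatorCleared,
}

/// Returns the next state, or None when the transition is not allowed.
/// Anything not listed explicitly is rejected, which keeps the sketch fail-closed.
pub fn next_state(current: State, event: Event) -> Option<State> {
    use Event::*;
    use State::*;
    match (current, event) {
        (Available, LeaseAssigned) => Some(Busy),
        (Busy, LeaseReleased) => Some(Available),
        (Available | Busy, CrashOrViolation) => Some(Quarantined),
        (Available | Busy, HeartbeatTimeout) => Some(Orphaned),
        // Assumption: operator intervention returns the worker to Available.
        (Quarantined, OperatorCleared) => Some(Available),
        _ => None,
    }
}
```

In this sketch Orphaned has no outgoing transitions, because the text does not say how an orphaned worker is recovered.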

Attestation and Verification

Attestation is the process by which a worker proves its identity and integrity to the daemon. The WorkerAttestation struct contains claims about the worker’s environment and build crates/palyra-workerd/src/lib.rs#35-62.

Key Attestation Fields

| Field | Description |
| --- | --- |
| worker_id | Unique identifier (ULID) for the worker crates/palyra-workerd/src/lib.rs#37. |
| image_digest_sha256 | SHA-256 hash of the container/VM image crates/palyra-workerd/src/lib.rs#38. |
| egress_proxy_attested | Boolean indicating if the worker is bound to a verified egress proxy crates/palyra-workerd/src/lib.rs#42. |
| wit_abi_version | The Wasm Interface Type ABI version supported by the worker crates/palyra-workerd/src/lib.rs#52-54. |

Verification is performed by WorkerAttestation::validate against a WorkerAttestationExpectation crates/palyra-workerd/src/lib.rs#113-117. The system fails closed if any digest (image, build, or artifact) mismatches or if the egress proxy binding is missing when required crates/palyra-workerd/src/lib.rs#127-151.
Sources: crates/palyra-workerd/src/lib.rs#35-153, crates/palyra-workerd/tests/critical_attack_scenarios.rs#73-91
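
A fail-closed check of this kind might look like the sketch below. It covers only the image digest and the egress proxy binding; the build and artifact digests would presumably follow the same pattern. `AttestationSketch`, `ExpectationSketch`, `AttestationError`, and the `require_egress_proxy` field are hypothetical and do not reflect the real struct layouts or the real signature of WorkerAttestation::validate.

```rust
/// Hypothetical subset of the attestation fields listed in the table.
pub struct AttestationSketch {
    pub worker_id: String,
    pub image_digest_sha256: String,
    pub egress_proxy_attested: bool,
}

/// Hypothetical expectation; the field names are assumptions.
pub struct ExpectationSketch {
    pub image_digest_sha256: String,
    pub require_egress_proxy: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AttestationError {
    ImageDigestMismatch,
    EgressProxyNotAttested,
}

impl AttestationSketch {
    /// Returns Ok only when every check passes; any mismatch rejects the worker.
    pub fn validate(&self, expected: &ExpectationSketch) -> Result<(), AttestationError> {
        if self.image_digest_sha256 != expected.image_digest_sha256 {
            return Err(AttestationError::ImageDigestMismatch);
        }
        if expected.require_egress_proxy && !self.egress_proxy_attested {
            return Err(AttestationError::EgressProxyNotAttested);
        }
        Ok(())
    }
}
```

Returning a `Result` with no default "accept" branch means a caller cannot admit a worker without handling the error case.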

Node RPC and mTLS Communication

Networked workers (and other remote nodes) communicate with the daemon via the NodeService gRPC interface crates/palyra-daemon/src/node_rpc.rs#1-4.

mTLS Enforcement

The NodeRpcServiceImpl enforces Mutual TLS (mTLS) to ensure only paired devices can communicate.
  1. Fingerprint Extraction: The daemon extracts the SHA-256 fingerprint of the client certificate from the TlsConnectInfo crates/palyra-daemon/src/node_rpc.rs#80-109.
  2. Revocation Check: The fingerprint is checked against the IdentityManager to ensure it hasn’t been revoked crates/palyra-daemon/src/node_rpc.rs#115-119.
  3. Device Binding: The daemon ensures the device_id in the request matches the ID bound to that specific certificate fingerprint crates/palyra-daemon/src/node_rpc.rs#125-155.
[Diagram: Node RPC Authentication Flow]
Sources: crates/palyra-daemon/src/node_rpc.rs#80-155, crates/palyra-daemon/src/node_runtime.rs#161-170, crates/palyra-daemon/tests/node_rpc_mtls.rs#76-96
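
The three checks above can be sketched as a single function over an already-extracted fingerprint. Computing the SHA-256 fingerprint from the TLS connection is left out, and `IdentityIndex`, `AuthError`, and `authorize` are hypothetical stand-ins rather than the IdentityManager or NodeRpcServiceImpl APIs.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq, Eq)]
pub enum AuthError {
    MissingClientCertificate,
    CertificateRevoked,
    UnknownCertificate,
    DeviceMismatch,
}

/// Hypothetical view of the identity data the daemon consults.
pub struct IdentityIndex {
    pub revoked_fingerprints: HashSet<String>,
    /// Maps a certificate fingerprint to the device_id it was paired with.
    pub device_by_fingerprint: HashMap<String, String>,
}

/// Applies the checks in the order described: certificate present,
/// not revoked, and bound to the device_id claimed in the request.
pub fn authorize(
    identity: &IdentityIndex,
    fingerprint: Option<&str>,
    claimed_device_id: &str,
) -> Result<(), AuthError> {
    let fp = fingerprint.ok_or(AuthError::MissingClientCertificate)?;
    if identity.revoked_fingerprints.contains(fp) {
        return Err(AuthError::CertificateRevoked);
    }
    match identity.device_by_fingerprint.get(fp) {
        Some(bound) if bound.as_str() == claimed_device_id => Ok(()),
        Some(_) => Err(AuthError::DeviceMismatch),
        None => Err(AuthError::UnknownCertificate),
    }
}
```

Checking revocation before the device binding means a revoked certificate is rejected even if its pairing record still exists.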

Lease Lifecycle and Quarantine

When an agent needs to run a tool on a networked worker, it requests a lease from the WorkerFleetManager.
  1. Grant Authorization: An approval grant (WorkerApprovalGrant) must exist for the specific run_id crates/palyra-workerd/src/lib.rs#173-177.
  2. Assignment: The manager selects an Available worker and transitions it to Busy.
  3. Workspace Scoping: The lease includes a WorkerWorkspaceScope defining the root directory and allowed paths for the worker crates/palyra-workerd/src/lib.rs#157-162.
  4. Quarantine/Orphan Handling:
    • Quarantine: If a worker crashes or violates security policy (e.g., unauthorized network access detected by the egress proxy), it is placed in Quarantined state and cannot be reassigned until an operator intervenes crates/palyra-workerd/src/lib.rs#5-6.
    • Orphan: If a worker stops sending heartbeats, it is marked Orphaned.
Sources: crates/palyra-workerd/src/lib.rs#1-15, crates/palyra-workerd/src/lib.rs#157-177
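
The grant, assignment, and scoping steps can be sketched together as follows. All names here (`LeaseDesk`, `Lease`, `WorkspaceScope`, `LeaseWorkerState`, `LeaseError`) are hypothetical, the approval grant is reduced to a set of approved run IDs, and treating allowed paths as relative to the root is an assumption about WorkerWorkspaceScope, not a documented fact.

```rust
use std::collections::{BTreeMap, HashSet};
use std::path::{Component, Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeaseWorkerState {
    Available,
    Busy,
}

/// Hypothetical scope: a root plus allowed paths relative to it (assumption).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WorkspaceScope {
    pub root: PathBuf,
    pub allowed_paths: Vec<PathBuf>,
}

impl WorkspaceScope {
    /// Lexical check only: rejects any `..` component, then requires the
    /// candidate to sit under one of the allowed paths. Symlinks are not resolved.
    pub fn permits(&self, candidate: &Path) -> bool {
        let has_parent_dir = candidate.components().any(|c| c == Component::ParentDir);
        !has_parent_dir
            && self.allowed_paths.iter().any(|rel| candidate.starts_with(self.root.join(rel)))
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum LeaseError {
    NoApprovalGrant,
    NoAvailableWorker,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Lease {
    pub run_id: String,
    pub worker_id: String,
    pub scope: WorkspaceScope,
}

pub struct LeaseDesk {
    /// Stand-in for approval grants: run IDs that have been approved.
    pub approved_runs: HashSet<String>,
    pub workers: BTreeMap<String, LeaseWorkerState>,
}

impl LeaseDesk {
    pub fn request_lease(&mut self, run_id: &str, scope: WorkspaceScope) -> Result<Lease, LeaseError> {
        // Step 1: without a grant for this run_id, nothing is assigned.
        if !self.approved_runs.contains(run_id) {
            return Err(LeaseError::NoApprovalGrant);
        }
        // Step 2: pick the first Available worker and mark it Busy.
        let worker_id = self
            .workers
            .iter()
            .find(|(_, state)| **state == LeaseWorkerState::Available)
            .map(|(id, _)| id.clone())
            .ok_or(LeaseError::NoAvailableWorker)?;
        self.workers.insert(worker_id.clone(), LeaseWorkerState::Busy);
        // Step 3: the lease carries the workspace scope for this run.
        Ok(Lease { run_id: run_id.to_string(), worker_id, scope })
    }
}
```

A `BTreeMap` is used so that worker selection is deterministic in the sketch; the real selection strategy is not described in the text.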

Egress Proxy Integration

Networked workers must boot bound to an attested egress proxy. The EgressProxyPolicyService provides the logic used to validate outbound requests crates/palyra-egress-proxy/src/lib.rs#3-7.
Sources: crates/palyra-egress-proxy/src/lib.rs#1-48, crates/palyra-egress-proxy/src/lib.rs#127-150

Node Runtime and Pairing

The NodeRuntimeState manages the persistence of node metadata and pairing requests in node-runtime.v1.json crates/palyra-daemon/src/node_runtime.rs#4-9.

Code Entities: Node Management

| System Concept | Code Entity | File |
| --- | --- | --- |
| Node Ledger | PersistedNodeRuntimeState | crates/palyra-daemon/src/node_runtime.rs#171-181 |
| Pairing Request | DevicePairingRequestRecord | crates/palyra-daemon/src/node_runtime.rs#134-149 |
| mTLS Implementation | NodeRpcServiceImpl | crates/palyra-daemon/src/node_rpc.rs#56-61 |
| Capability Dispatch | CapabilityRequestRecord | crates/palyra-daemon/src/node_runtime.rs#180 |
| Console Handler | console_nodes_list_handler | crates/palyra-daemon/src/transport/http/handlers/console/nodes.rs#55-58 |

Sources: crates/palyra-daemon/src/node_runtime.rs#1-181, crates/palyra-daemon/src/transport/http/handlers/console/nodes.rs#55-63