The palyra-safety crate defines the security primitives used to inspect, classify, and transform content at the boundaries of the Palyra ecosystem. It provides a fail-closed architecture for detecting prompt injection, identifying secret leaks, and enforcing trust-based redaction before data reaches a Model Provider or is exported to a user.
Safety Phases and Data Flow
Content inspection occurs at three primary lifecycle stages, defined by the SafetyPhase enum:
- PrePrompt: Content is scanned before being assembled into an LLM prompt. This phase focuses on detecting prompt injection and indirect secret references in untrusted data.
- PreExecution: Content (typically tool outputs) is scanned before the agent processes it.
- Export: Content is scanned before being returned to the user or saved to logs. This phase emphasizes the redaction of actual credentials and internal system paths.
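As a rough illustration of the three stages listed above, the enum could be modelled as follows. Only the name SafetyPhase and its variant names come from this page; the derives and the focus helper are hypothetical and are not taken from the crate.

```rust
/// Lifecycle stage at which content is inspected.
/// Variant names follow the list above; the derives are an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SafetyPhase {
    PrePrompt,
    PreExecution,
    Export,
}

impl SafetyPhase {
    /// Hypothetical helper: a short summary of what each phase looks at.
    pub fn focus(self) -> &'static str {
        match self {
            SafetyPhase::PrePrompt => "prompt injection and indirect secret references",
            SafetyPhase::PreExecution => "tool outputs before the agent processes them",
            SafetyPhase::Export => "credentials and internal system paths",
        }
    }
}
```

Modelling the phase as a plain Copy enum keeps it cheap to pass alongside every piece of inspected content.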
System Data Flow and Safety Interception
The following diagram illustrates how the palyra-safety primitives intercept data moving between the agent loop, the filesystem, and external LLM providers.
Title: Safety Boundary Interception Points
Sources: crates/palyra-safety/src/lib.rs#1-9, crates/palyra-daemon/src/application/tool_runtime/workspace_file.rs#1-14, crates/palyra-daemon/src/application/tool_runtime/workspace_patch.rs#1-12
Content Inspection and Detection
The core of the safety system is inspect_text, which classifies content into SafetyFindingCategory buckets and recommends a SafetyAction.
Prompt Injection Detection
Palyra uses a set of PatternRule definitions to identify common injection techniques. These are matched against normalized (whitespace-collapsed, lowercase) text.
| Rule Category | Examples | Severity |
|---|---|---|
| Instruction Level | "ignore previous instructions", "you are now [role]" | High / Warning |
| Exfiltration | "reveal the system prompt", "print secret" | Critical |
| Spoofing | <system>, [system] | High / Warning |
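The normalize-then-match approach described above can be sketched roughly as follows. The PatternRule fields, the Severity enum, the rule list, and the helper names are assumptions made for illustration; the crate's actual rule set and matching logic may differ (for example, a rule such as "you are now [role]" would need more than a substring check).

```rust
/// Hypothetical severity scale, ordered from least to most severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Warning,
    High,
    Critical,
}

/// Hypothetical shape of a rule: a literal needle plus a severity.
pub struct PatternRule {
    pub needle: &'static str,
    pub severity: Severity,
}

/// Illustrative rules drawn from the examples in the table above.
const RULES: &[PatternRule] = &[
    PatternRule { needle: "ignore previous instructions", severity: Severity::High },
    PatternRule { needle: "reveal the system prompt", severity: Severity::Critical },
    PatternRule { needle: "<system>", severity: Severity::High },
];

/// Collapse all runs of whitespace to a single space and lowercase the text.
pub fn normalize(input: &str) -> String {
    input
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase()
}

/// Return the most severe rule that matches the normalized text, if any.
pub fn highest_severity(input: &str) -> Option<Severity> {
    let text = normalize(input);
    RULES
        .iter()
        .filter(|rule| text.contains(rule.needle))
        .map(|rule| rule.severity)
        .max()
}
```

Normalizing first means that an attacker cannot evade a rule simply by inserting newlines, tabs, or mixed casing between the words of a known phrase.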
Secret Leak Scanning
The system identifies two types of sensitive material:
- Credential Material: Actual tokens, keys, or passwords.
- Credential References: Indirect markers like vault_ref or *_file paths.
The is_sensitive_key function in palyra-common provides the heuristic backbone for identifying these keys (e.g., access_token, api_key, client_secret).
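A key-name heuristic of this kind might look like the sketch below. It is not the palyra-common implementation: looks_like_sensitive_key, classify_key, SecretKind, and the marker list are hypothetical names chosen for illustration, and the rule for *_file keys is one possible reading of the description above.

```rust
/// Hypothetical stand-in for a key-name heuristic; the marker list is assumed.
pub fn looks_like_sensitive_key(key: &str) -> bool {
    const MARKERS: [&str; 4] = ["token", "api_key", "secret", "password"];
    // Treat `API-Key` and `api_key` alike.
    let normalized = key.to_ascii_lowercase().replace('-', "_");
    MARKERS.iter().any(|marker| normalized.contains(marker))
}

/// The two kinds of sensitive material described above.
#[derive(Debug, PartialEq, Eq)]
pub enum SecretKind {
    CredentialMaterial,
    CredentialReference,
}

/// Classify a key as a direct credential, an indirect reference, or neither.
pub fn classify_key(key: &str) -> Option<SecretKind> {
    let normalized = key.to_ascii_lowercase();
    // Assumption: `<sensitive>_file` keys point at a secret rather than hold one.
    let is_file_reference = normalized
        .strip_suffix("_file")
        .map_or(false, looks_like_sensitive_key);
    if normalized == "vault_ref" || is_file_reference {
        Some(SecretKind::CredentialReference)
    } else if looks_like_sensitive_key(key) {
        Some(SecretKind::CredentialMaterial)
    } else {
        None
    }
}
```

Separating references from material lets a caller treat the two differently, for instance flagging a reference in untrusted input while redacting actual material on export.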
Sources: crates/palyra-common/src/redaction.rs#15-39, crates/palyra-safety/src/lib.rs#190-205
Trust Labels and Safety Actions
Every piece of content is associated with a TrustLabel, which dictates how strictly it is inspected:
- Trusted: System-generated content or verified local files.
- Untrusted: Data from external URLs, user input, or unverified tool outputs.
Based on the findings and the SafetyPhase, the system returns a SafetyAction:
| Action | Description |
|---|---|
| Allow | Content is safe to pass through. |
| Annotate | Content is allowed but marked with a warning finding. |
| Redact | Sensitive portions are replaced with <redacted>. |
| Block | Content is rejected entirely (fail-closed). |
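One way to combine several findings into a single action is to order the actions by strictness and keep the strictest one. The sketch below assumes that approach; the Finding enum and the mapping in action_for are invented for illustration and do not reflect the crate's actual policy. Only the TrustLabel and SafetyAction names and variants come from this page.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrustLabel {
    Trusted,
    Untrusted,
}

/// Declared from least to most strict so that `max` picks the strictest action.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SafetyAction {
    Allow,
    Annotate,
    Redact,
    Block,
}

/// Hypothetical finding kinds, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Finding {
    InjectionWarning,
    InjectionCritical,
    CredentialMaterial,
}

/// Assumed policy: critical injection in untrusted content is blocked,
/// credentials are redacted, everything else is annotated.
fn action_for(finding: Finding, trust: TrustLabel) -> SafetyAction {
    match (finding, trust) {
        (Finding::CredentialMaterial, _) => SafetyAction::Redact,
        (Finding::InjectionCritical, TrustLabel::Untrusted) => SafetyAction::Block,
        (Finding::InjectionCritical, TrustLabel::Trusted) => SafetyAction::Annotate,
        (Finding::InjectionWarning, _) => SafetyAction::Annotate,
    }
}

/// The strictest action across all findings wins; no findings means Allow.
pub fn decide(findings: &[Finding], trust: TrustLabel) -> SafetyAction {
    findings
        .iter()
        .map(|finding| action_for(*finding, trust))
        .max()
        .unwrap_or(SafetyAction::Allow)
}
```

Taking the maximum keeps the decision monotonic: adding a finding can only make the outcome stricter, which fits the fail-closed stance described for Block.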
Redaction Utilities
Redaction is implemented across several layers to ensure consistency.
Text Redaction
The redact_text_for_export function applies the Redact action by substituting sensitive values with the REDACTED constant ("<redacted>"). This includes:
- Header Redaction: Masking sensitive HTTP headers like Authorization or Set-Cookie (crates/palyra-common/src/redaction.rs#131-143).
- URL Redaction: Removing credentials from userinfo and sensitive keys from query parameters (crates/palyra-common/src/redaction.rs#156-179).
- Internal Path Redaction: Masking internal storage paths like .palyra/sessions using REDACTED_INTERNAL_RUNTIME_PATH (crates/palyra-common/src/redaction.rs#203-211).
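The header and query-parameter cases can be illustrated with a small standard-library-only sketch. The function names, the header list, and the marker list are assumptions, not the palyra-common API; userinfo stripping, URL fragments, and internal path masking are left out to keep the example short.

```rust
/// Placeholder value, matching the REDACTED constant quoted above.
pub const REDACTED: &str = "<redacted>";

/// Mask the value of headers that commonly carry credentials.
/// The header list here is illustrative.
pub fn redact_header(name: &str, value: &str) -> String {
    const SENSITIVE_HEADERS: [&str; 3] = ["authorization", "cookie", "set-cookie"];
    if SENSITIVE_HEADERS.iter().any(|h| name.eq_ignore_ascii_case(h)) {
        REDACTED.to_string()
    } else {
        value.to_string()
    }
}

/// Assumed key heuristic for query parameters.
fn is_sensitive_param(key: &str) -> bool {
    let key = key.to_ascii_lowercase();
    ["token", "secret", "password", "api_key"]
        .iter()
        .any(|marker| key.contains(marker))
}

/// Replace the values of sensitive query parameters, leaving the rest intact.
pub fn redact_query_params(url: &str) -> String {
    let (base, query) = match url.split_once('?') {
        Some(parts) => parts,
        None => return url.to_string(), // no query string, nothing to redact
    };
    let pairs: Vec<String> = query
        .split('&')
        .map(|pair| match pair.split_once('=') {
            Some((key, _)) if is_sensitive_param(key) => format!("{}={}", key, REDACTED),
            _ => pair.to_string(),
        })
        .collect();
    format!("{}?{}", base, pairs.join("&"))
}
```

Keeping the parameter name and replacing only its value preserves enough context for debugging while removing the secret itself.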
Filesystem Redaction (Workspace Tools)
Workspace tools like palyra.fs.read_file and palyra.fs.search integrate safety scanning directly into their output pipeline.
Title: Workspace Tool Redaction Flow
If a file read is redacted, the WorkspaceReadFileOutput sets redacted: true and text_authoritative: false to prevent the agent from accidentally overwriting the file with the redacted placeholder (crates/palyra-daemon/src/application/tool_runtime/workspace_file.rs#118-149).
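The relationship between the two flags can be sketched as below. Only the struct name and the redacted and text_authoritative fields come from this page; the text field, the read_output and can_write_back helpers, and the literal-secret matching are hypothetical simplifications of whatever scanning the tool actually performs.

```rust
/// Sketch of the read result. The `text` field is an assumption.
#[derive(Debug, Clone)]
pub struct WorkspaceReadFileOutput {
    pub text: String,
    pub redacted: bool,
    pub text_authoritative: bool,
}

/// Hypothetical helper: replace known secret values and record that the
/// returned text no longer matches the bytes on disk.
pub fn read_output(raw: &str, secrets: &[&str]) -> WorkspaceReadFileOutput {
    let mut text = raw.to_string();
    let mut redacted = false;
    for &secret in secrets {
        if !secret.is_empty() && text.contains(secret) {
            text = text.replace(secret, "<redacted>");
            redacted = true;
        }
    }
    WorkspaceReadFileOutput {
        text,
        redacted,
        // Redacted text must never be treated as the file's true content.
        text_authoritative: !redacted,
    }
}

/// Hypothetical guard a writer could consult before saving text back to disk.
pub fn can_write_back(output: &WorkspaceReadFileOutput) -> bool {
    output.text_authoritative
}
```

A caller that checks the authoritative flag before writing cannot silently replace a real secret on disk with the placeholder string.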
Sources: crates/palyra-daemon/src/application/tool_runtime/workspace_file.rs#8-14, crates/palyra-common/src/redaction.rs#11-13
Fail-Closed Design in Patching
The palyra.fs.apply_patch tool implements a strict fail-closed policy. Before any mutation occurs:
- Dry Run: The patch is planned and validated (crates/palyra-daemon/src/application/tool_runtime/workspace_patch.rs#6-8).
- Secret File Protection: Patches are rejected if they attempt to insert redaction placeholders (like [redacted]) into known secret-bearing files (e.g., .env, id_rsa) (crates/palyra-common/src/workspace_patch.rs#164-167).
- Containment: All paths are canonicalized and checked against allowed workspace roots. If a path escapes the root, the entire patch fails (crates/palyra-common/src/workspace_patch.rs#150-151).
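The checks above can be approximated by a validation pass that runs over every operation before anything is written. In this sketch PatchOp, validate_patch, normalize_lexically, and the secret file list are hypothetical. The text describes canonicalization, which resolves symlinks on the real filesystem; the sketch substitutes purely lexical normalization so that it stays self-contained, and it assumes the root is already an absolute, normalized path.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve `.` and `..` lexically, without touching the filesystem.
/// (A stand-in for real canonicalization, which also resolves symlinks.)
fn normalize_lexically(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Illustrative list of secret-bearing file names and the placeholder to reject.
const SECRET_FILE_NAMES: [&str; 2] = [".env", "id_rsa"];
const PLACEHOLDER: &str = "[redacted]";

/// Hypothetical patch operation: a target path and its new content.
pub struct PatchOp<'a> {
    pub path: &'a str,
    pub new_text: &'a str,
}

/// Validate every operation first; a single failure rejects the whole patch.
pub fn validate_patch(root: &Path, ops: &[PatchOp<'_>]) -> Result<Vec<PathBuf>, String> {
    let mut resolved = Vec::with_capacity(ops.len());
    for op in ops {
        let target = normalize_lexically(&root.join(op.path));
        if !target.starts_with(root) {
            return Err(format!("path escapes workspace root: {}", op.path));
        }
        let is_secret_file = target
            .file_name()
            .and_then(|name| name.to_str())
            .map_or(false, |name| SECRET_FILE_NAMES.iter().any(|s| *s == name));
        if is_secret_file && op.new_text.contains(PLACEHOLDER) {
            return Err(format!("placeholder written to secret file: {}", op.path));
        }
        resolved.push(target);
    }
    Ok(resolved)
}
```

Because the function returns an error on the first violation and produces resolved paths only when every operation passes, a caller that writes only after receiving Ok never applies a partial patch.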