The palyra-safety crate defines the security primitives used to inspect, classify, and transform content at the boundaries of the Palyra ecosystem. It provides a fail-closed architecture for detecting prompt injection, identifying secret leaks, and enforcing trust-based redaction before data reaches a Model Provider or is exported to a user.

Safety Phases and Data Flow

Content inspection occurs at three primary lifecycle stages, defined by the SafetyPhase enum:
  1. PrePrompt: Content is scanned before being assembled into an LLM prompt. This phase focuses on detecting prompt injection and indirect secret references in untrusted data.
  2. PreExecution: Content (typically tool outputs) is scanned before the agent processes it.
  3. Export: Content is scanned before being returned to the user or saved to logs. This phase emphasizes the redaction of actual credentials and internal system paths.
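
As a rough illustration of the three stages above, the enum might look like the following. Only the variant names come from the text; the derive list and the `primary_focus` helper are hypothetical and not taken from the crate.

```rust
/// Hypothetical mirror of the three lifecycle stages described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SafetyPhase {
    PrePrompt,
    PreExecution,
    Export,
}

impl SafetyPhase {
    /// Assumed helper (not part of the crate): summarizes what each phase
    /// is described as scanning for.
    pub fn primary_focus(self) -> &'static str {
        match self {
            SafetyPhase::PrePrompt => "prompt injection and indirect secret references",
            SafetyPhase::PreExecution => "tool outputs before the agent processes them",
            SafetyPhase::Export => "credentials and internal system paths",
        }
    }
}
```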

System Data Flow and Safety Interception

The following diagram illustrates how the palyra-safety primitives intercept data moving between the agent loop, the filesystem, and external LLM providers.
Title: Safety Boundary Interception Points
Sources: crates/palyra-safety/src/lib.rs#1-9, crates/palyra-daemon/src/application/tool_runtime/workspace_file.rs#1-14, crates/palyra-daemon/src/application/tool_runtime/workspace_patch.rs#1-12

Content Inspection and Detection

The core of the safety system is inspect_text, which classifies content into SafetyFindingCategory buckets and recommends a SafetyAction.

Prompt Injection Detection

Palyra uses a set of PatternRule definitions to identify common injection techniques. These are matched against normalized (whitespace-collapsed, lowercase) text.
| Rule Category | Examples | Severity |
|---|---|---|
| Instruction Level | "ignore previous instructions", "you are now [role]" | High / Warning |
| Exfiltration | "reveal the system prompt", "print secret" | Critical |
| Spoofing | <system>, [system] | High / Warning |
Sources: crates/palyra-safety/src/lib.rs#16-188
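
The matching step can be sketched as follows. This is a minimal illustration, assuming plain substring matching on the normalized text. The `PatternRule` fields, the `Severity` enum, the `RULES` table, and the `normalize` and `match_rules` helpers are all hypothetical, and the severities chosen for each rule are placeholders rather than the crate's actual values.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Warning,
    High,
    Critical,
}

/// Hypothetical rule shape: a category label, a lowercase needle, a severity.
#[derive(Debug)]
pub struct PatternRule {
    pub category: &'static str,
    pub needle: &'static str,
    pub severity: Severity,
}

/// Illustrative rules built from the examples in the table above.
pub const RULES: &[PatternRule] = &[
    PatternRule { category: "instruction", needle: "ignore previous instructions", severity: Severity::High },
    PatternRule { category: "exfiltration", needle: "reveal the system prompt", severity: Severity::Critical },
    PatternRule { category: "spoofing", needle: "<system>", severity: Severity::High },
];

/// Collapse every run of whitespace to a single space, then lowercase.
pub fn normalize(input: &str) -> String {
    input
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase()
}

/// Return every rule whose needle occurs in the normalized input.
pub fn match_rules(input: &str) -> Vec<&'static PatternRule> {
    let normalized = normalize(input);
    RULES
        .iter()
        .filter(|rule| normalized.contains(rule.needle))
        .collect()
}
```

Normalizing first means that inputs such as "IGNORE   previous\ninstructions" still match a single lowercase needle, so rules do not need case or spacing variants.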

Secret Leak Scanning

The system identifies two types of sensitive material:
  1. Credential Material: Actual tokens, keys, or passwords.
  2. Credential References: Indirect markers like vault_ref or *_file paths.
The is_sensitive_key function in palyra-common provides the heuristic backbone for identifying these keys (e.g., access_token, api_key, client_secret).
Sources: crates/palyra-common/src/redaction.rs#15-39, crates/palyra-safety/src/lib.rs#190-205
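
A key-name heuristic of this kind could be sketched as below. This is not the palyra-common implementation: the function names `looks_sensitive_key` and `looks_like_credential_reference`, the marker list, and the hyphen-to-underscore normalization are assumptions made for illustration, using only the example keys named above.

```rust
/// Marker substrings taken from the examples in the text; the real list is
/// presumably longer.
const SENSITIVE_MARKERS: &[&str] = &["access_token", "api_key", "client_secret"];

/// Lowercase the key and treat '-' like '_' so "API-Key" and "api_key" agree.
/// (This normalization is an assumption, not documented crate behavior.)
fn normalize_key(key: &str) -> String {
    key.trim().to_ascii_lowercase().replace('-', "_")
}

/// Hypothetical stand-in for a credential-material key check.
pub fn looks_sensitive_key(key: &str) -> bool {
    let k = normalize_key(key);
    SENSITIVE_MARKERS.iter().any(|marker| k.contains(*marker))
}

/// Hypothetical check for indirect references such as vault_ref or *_file.
pub fn looks_like_credential_reference(key: &str) -> bool {
    let k = normalize_key(key);
    k == "vault_ref" || k.ends_with("_file")
}
```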

Trust Labels and Safety Actions

Every piece of content is associated with a TrustLabel, which dictates how strictly it is inspected:
  • Trusted: System-generated content or verified local files.
  • Untrusted: Data from external URLs, user input, or unverified tool outputs.
Based on the inspection findings and the SafetyPhase, the system returns a SafetyAction:
| Action | Description |
|---|---|
| Allow | Content is safe to pass through. |
| Annotate | Content is allowed but marked with a warning finding. |
| Redact | Sensitive portions are replaced with <redacted>. |
| Block | Content is rejected entirely (fail-closed). |
Sources: crates/palyra-safety/src/lib.rs#32-33, crates/palyra-daemon/src/domain/workspace.rs#7-9
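
One way to picture how findings, trust, and phase combine into a single action is to rank the actions and keep the strictest one. The sketch below does that. The `Finding` enum, the `decide` function, and the specific policy (block critical injection from untrusted sources, redact credentials at export, annotate everything else) are assumptions for illustration, not the crate's actual rules.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrustLabel { Trusted, Untrusted }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SafetyPhase { PrePrompt, PreExecution, Export }

/// Variants are declared from least to most strict so that the derived
/// ordering lets `max` pick the strictest action.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SafetyAction { Allow, Annotate, Redact, Block }

/// Hypothetical finding kinds, simplified for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Finding { InjectionWarning, CriticalInjection, CredentialMaterial }

/// Assumed policy for a single finding.
fn action_for(finding: Finding, trust: TrustLabel, phase: SafetyPhase) -> SafetyAction {
    match (finding, trust, phase) {
        (Finding::CriticalInjection, TrustLabel::Untrusted, _) => SafetyAction::Block,
        (Finding::CredentialMaterial, _, SafetyPhase::Export) => SafetyAction::Redact,
        _ => SafetyAction::Annotate,
    }
}

/// The overall action is the strictest action over all findings;
/// no findings means Allow.
pub fn decide(findings: &[Finding], trust: TrustLabel, phase: SafetyPhase) -> SafetyAction {
    findings
        .iter()
        .map(|f| action_for(*f, trust, phase))
        .max()
        .unwrap_or(SafetyAction::Allow)
}
```

Taking the maximum keeps the combination rule monotonic: adding a finding can only make the outcome stricter, never more permissive.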

Redaction Utilities

Redaction is implemented across several layers to ensure consistency.

Text Redaction

The redact_text_for_export function applies the Redact action by substituting sensitive values with the REDACTED constant ("<redacted>").
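
A minimal sketch of value substitution, assuming space-separated `key=value` tokens: it is not the body of redact_text_for_export, whose actual matching logic is not shown here. The `redact_key_value_pairs` function and the inline `is_sensitive` helper are hypothetical; only the "<redacted>" placeholder and the example key names come from the text.

```rust
pub const REDACTED: &str = "<redacted>";

/// Simplified key check using the example key names from the text.
fn is_sensitive(key: &str) -> bool {
    let k = key.to_ascii_lowercase();
    ["access_token", "api_key", "client_secret"]
        .iter()
        .any(|marker| k.contains(*marker))
}

/// Replace the value of every sensitive `key=value` token with the
/// placeholder. Returns the new text and whether anything was redacted.
pub fn redact_key_value_pairs(text: &str) -> (String, bool) {
    let mut redacted_any = false;
    let tokens: Vec<String> = text
        .split(' ')
        .map(|token| match token.split_once('=') {
            Some((key, value)) if is_sensitive(key) && !value.is_empty() => {
                redacted_any = true;
                format!("{}={}", key, REDACTED)
            }
            _ => token.to_string(),
        })
        .collect();
    (tokens.join(" "), redacted_any)
}
```

Returning the boolean alongside the text lets a caller record that the output is no longer a faithful copy of the input.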

Filesystem Redaction (Workspace Tools)

Workspace tools like palyra.fs.read_file and palyra.fs.search integrate safety scanning directly into their output pipeline.
Title: Workspace Tool Redaction Flow
If a file read is redacted, the WorkspaceReadFileOutput sets redacted: true and text_authoritative: false to prevent the agent from accidentally overwriting the file with the redacted placeholder crates/palyra-daemon/src/application/tool_runtime/workspace_file.rs#118-149.
Sources: crates/palyra-daemon/src/application/tool_runtime/workspace_file.rs#8-14, crates/palyra-common/src/redaction.rs#11-13
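
The relationship between the two flags can be sketched as below. The field names redacted and text_authoritative come from the description above; the `text` field, the `finish_read` constructor, and the `may_write_back` guard are hypothetical and only illustrate how a consumer might honor the flags.

```rust
/// Simplified stand-in for the tool output; the real struct likely carries
/// more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WorkspaceReadFileOutput {
    pub text: String,
    pub redacted: bool,
    pub text_authoritative: bool,
}

/// Build the output after scanning. Redacted text is never authoritative,
/// because it no longer matches the bytes on disk.
pub fn finish_read(text: String, was_redacted: bool) -> WorkspaceReadFileOutput {
    WorkspaceReadFileOutput {
        text,
        redacted: was_redacted,
        text_authoritative: !was_redacted,
    }
}

/// Hypothetical guard a caller could apply before writing text back to disk.
pub fn may_write_back(output: &WorkspaceReadFileOutput) -> bool {
    output.text_authoritative
}
```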

Fail-Closed Design in Patching

The palyra.fs.apply_patch tool implements a strict fail-closed policy. Before any mutation occurs:
  1. Dry Run: The patch is planned and validated crates/palyra-daemon/src/application/tool_runtime/workspace_patch.rs#6-8.
  2. Secret File Protection: Patches are rejected if they attempt to insert redaction placeholders (like [redacted]) into known secret-bearing files (e.g., .env, id_rsa) crates/palyra-common/src/workspace_patch.rs#164-167.
  3. Containment: All paths are canonicalized and checked against allowed workspace roots. If a path escapes the root, the entire patch fails crates/palyra-common/src/workspace_patch.rs#150-151.
Sources: crates/palyra-common/src/workspace_patch.rs#1-7, crates/palyra-daemon/src/application/tool_runtime/workspace_patch.rs#10-12
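
The containment and secret-file checks above can be sketched as a single validation step that runs before any write. This is an illustration under stated assumptions: it resolves paths lexically instead of canonicalizing against the filesystem (so it does not follow symlinks), it handles one target path rather than a whole patch, and the `PatchRejection` enum, the `validate_patch` function, the secret file list, and the placeholder strings checked are all hypothetical.

```rust
use std::path::{Component, Path, PathBuf};

#[derive(Debug, PartialEq, Eq)]
pub enum PatchRejection {
    EscapesWorkspace,
    PlaceholderInSecretFile,
}

/// Lexically join `relative` onto `root`, refusing absolute paths and any
/// `..` that would climb above the root.
fn resolve_within(root: &Path, relative: &str) -> Option<PathBuf> {
    let mut resolved = root.to_path_buf();
    for component in Path::new(relative).components() {
        match component {
            Component::Normal(part) => resolved.push(part),
            Component::CurDir => {}
            Component::ParentDir => {
                if resolved.as_path() == root {
                    return None; // would escape the workspace root
                }
                resolved.pop();
            }
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(resolved)
}

/// Illustrative list of secret-bearing file names from the text.
fn is_secret_bearing(path: &Path) -> bool {
    matches!(
        path.file_name().and_then(|name| name.to_str()),
        Some(".env") | Some("id_rsa")
    )
}

/// Fail closed: return an error (and perform no write) on any violation.
pub fn validate_patch(root: &Path, target: &str, new_text: &str) -> Result<PathBuf, PatchRejection> {
    let path = resolve_within(root, target).ok_or(PatchRejection::EscapesWorkspace)?;
    // Assumption: both bracket styles of placeholder are treated as suspect.
    let has_placeholder = new_text.contains("[redacted]") || new_text.contains("<redacted>");
    if is_secret_bearing(&path) && has_placeholder {
        return Err(PatchRejection::PlaceholderInSecretFile);
    }
    Ok(path)
}
```

Because validation returns a `Result` before any mutation, a caller applying several file changes can validate all of them first and abort the whole patch on the first error.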