# Safety Boundary and Content Inspection

<details>
  <summary>Relevant source files</summary>

  The following files were used as context for generating this wiki page:

  * crates/palyra-common/src/redaction.rs
  * crates/palyra-common/src/workspace\_patch.rs
  * crates/palyra-daemon/src/application/tool\_runtime/os\_file.rs
  * crates/palyra-daemon/src/application/tool\_runtime/workspace\_file.rs
  * crates/palyra-daemon/src/application/tool\_runtime/workspace\_patch.rs
  * crates/palyra-daemon/src/application/tool\_runtime/workspace\_scope.rs
  * crates/palyra-daemon/src/domain/workspace.rs
  * crates/palyra-safety/src/lib.rs
</details>

The `palyra-safety` crate defines the security primitives used to inspect, classify, and transform content at the boundaries of the Palyra ecosystem. It provides a fail-closed architecture for detecting prompt injection, identifying secret leaks, and enforcing trust-based redaction before data reaches a Model Provider or is exported to a user.

## Safety Phases and Data Flow

Content inspection occurs at three primary lifecycle stages, defined by the `SafetyPhase` enum:

1. **PrePrompt**: Content is scanned before being assembled into an LLM prompt. This phase focuses on detecting prompt injection and indirect secret references in untrusted data.
2. **PreExecution**: Content (typically tool outputs) is scanned before the agent processes it.
3. **Export**: Content is scanned before being returned to the user or saved to logs. This phase emphasizes the redaction of actual credentials and internal system paths.
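
As a rough illustration, the three phases could be modelled as a plain enum. The sketch below is an assumption about shape only: the derives and the `focus` helper are hypothetical and are not taken from `palyra-safety`.

```rust
/// Hypothetical mirror of the three phases listed above. The real
/// `SafetyPhase` in `palyra-safety` may carry other derives or variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SafetyPhase {
    PrePrompt,
    PreExecution,
    Export,
}

impl SafetyPhase {
    /// Illustrative helper (not part of the documented API): a short
    /// summary of what each phase emphasizes, paraphrasing the list above.
    pub fn focus(self) -> &'static str {
        match self {
            SafetyPhase::PrePrompt => "prompt injection and indirect secret references",
            SafetyPhase::PreExecution => "tool output scanned before the agent processes it",
            SafetyPhase::Export => "redaction of credentials and internal paths",
        }
    }
}
```

Keeping the phase as a `Copy` enum lets callers pass it alongside the content being inspected without any ownership concerns.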

### System Data Flow and Safety Interception

The following diagram illustrates how the `palyra-safety` primitives intercept data moving between the agent loop, the filesystem, and external LLM providers.

**Title: Safety Boundary Interception Points**

```mermaid
graph TD
    subgraph "Natural Language Space"
        UserMsg["User Message"]
        ModelResp["Model Response"]
    end

    subgraph "Code Entity Space (palyra-daemon)"
        Orchestrator["AgentRunLoopState"]
        ToolRuntime["ToolRuntimeExecutionContext"]
        WorkspaceTool["workspace_file.rs"]
        PatchTool["workspace_patch.rs"]
    end

    subgraph "Safety Boundary (palyra-safety)"
        InspectText["inspect_text()"]
        RedactExport["redact_text_for_export()"]
        TransformPrompt["transform_text_for_prompt()"]
    end

    subgraph "External / Disk"
        LLM["Model Provider"]
        FS["Filesystem (Workspace)"]
    end

    UserMsg --> Orchestrator
    Orchestrator --> TransformPrompt
    TransformPrompt --> LLM
    LLM --> Orchestrator
    Orchestrator --> ToolRuntime
    ToolRuntime --> WorkspaceTool
    WorkspaceTool --> FS
    FS --> WorkspaceTool
    WorkspaceTool --> InspectText
    InspectText --> RedactExport
    RedactExport --> Orchestrator
    Orchestrator --> ModelResp
```

**Sources:** [crates/palyra-safety/src/lib.rs#1-9](http://crates/palyra-safety/src/lib.rs#1-9), [crates/palyra-daemon/src/application/tool\_runtime/workspace\_file.rs#1-14](http://crates/palyra-daemon/src/application/tool_runtime/workspace_file.rs#1-14), [crates/palyra-daemon/src/application/tool\_runtime/workspace\_patch.rs#1-12](http://crates/palyra-daemon/src/application/tool_runtime/workspace_patch.rs#1-12)

## Content Inspection and Detection

The core of the safety system is `inspect_text`, which classifies content into `SafetyFindingCategory` buckets and recommends a `SafetyAction`.

### Prompt Injection Detection

Palyra uses a set of `PatternRule` definitions to identify common injection techniques. These are matched against normalized (whitespace-collapsed, lowercase) text.

| Rule Category         | Examples                                              | Severity       |
| :-------------------- | :---------------------------------------------------- | :------------- |
| **Instruction Level** | "ignore previous instructions", "you are now \[role]" | High / Warning |
| **Exfiltration**      | "reveal the system prompt", "print secret"            | Critical       |
| **Spoofing**          | `<system>`, `[system]`                                | High / Warning |

**Sources:** [crates/palyra-safety/src/lib.rs#16-188](http://crates/palyra-safety/src/lib.rs#16-188)
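
The normalize-then-match approach described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the `PatternRule` fields, the `Severity` enum, the `RULES` table, and the severities assigned to each phrase are all assumptions, and the real rules may use richer matching than substring search.

```rust
/// Hypothetical severity levels, named after the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Warning,
    High,
    Critical,
}

/// Hypothetical rule shape; the real `PatternRule` may differ.
pub struct PatternRule {
    pub needle: &'static str,
    pub severity: Severity,
}

/// Example rules built from phrases in the table. Severities are assumed.
pub const RULES: &[PatternRule] = &[
    PatternRule { needle: "ignore previous instructions", severity: Severity::High },
    PatternRule { needle: "reveal the system prompt", severity: Severity::Critical },
    PatternRule { needle: "<system>", severity: Severity::High },
];

/// Collapse every whitespace run to a single space and lowercase the result.
pub fn normalize(input: &str) -> String {
    input
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase()
}

/// Return every rule whose needle occurs in the normalized input.
pub fn match_rules(input: &str) -> Vec<&'static PatternRule> {
    let normalized = normalize(input);
    RULES
        .iter()
        .filter(|rule| normalized.contains(rule.needle))
        .collect()
}
```

Normalizing first means that an attacker cannot evade a rule simply by inserting newlines, tabs, or mixed case between the words of a known phrase.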

### Secret Leak Scanning

The system identifies two types of sensitive material:

1. **Credential Material**: Actual tokens, keys, or passwords.
2. **Credential References**: Indirect markers like `vault_ref` or `*_file` paths.

The `is_sensitive_key` function in `palyra-common` supplies the key-name heuristic used to flag this material. It matches names such as `access_token`, `api_key`, and `client_secret`.

**Sources:** [crates/palyra-common/src/redaction.rs#15-39](http://crates/palyra-common/src/redaction.rs#15-39), [crates/palyra-safety/src/lib.rs#190-205](http://crates/palyra-safety/src/lib.rs#190-205)
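
A key-name heuristic of this kind might look like the sketch below. The function names carry a `_sketch` suffix to make clear they are hypothetical stand-ins: the marker list contains only the three names mentioned above, and the hyphen-to-underscore normalization is an assumption rather than documented behaviour of `is_sensitive_key`.

```rust
/// Marker list limited to the key names mentioned in this section.
/// The real list in `palyra-common` is likely longer.
const SENSITIVE_MARKERS: &[&str] = &["access_token", "api_key", "client_secret"];

/// Hypothetical stand-in for `is_sensitive_key`: case-insensitive
/// substring match after mapping `-` to `_` (an assumed normalization).
pub fn is_sensitive_key_sketch(key: &str) -> bool {
    let normalized = key.trim().to_ascii_lowercase().replace('-', "_");
    SENSITIVE_MARKERS
        .iter()
        .any(|marker| normalized.contains(*marker))
}

/// Hypothetical check for credential references: the `vault_ref` marker
/// or any key ending in `_file`, as described in the list above.
pub fn is_credential_reference_sketch(key: &str) -> bool {
    let normalized = key.trim().to_ascii_lowercase();
    normalized == "vault_ref" || normalized.ends_with("_file")
}
```

Separating the two checks mirrors the distinction between credential material and credential references, which may warrant different actions.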

## Trust Labels and Safety Actions

Every piece of content is associated with a `TrustLabel`, which dictates how strictly it is inspected:

* **Trusted**: System-generated content or verified local files.
* **Untrusted**: Data from external URLs, user input, or unverified tool outputs.

Based on the inspection findings and the `SafetyPhase`, the system returns a `SafetyAction`:

| Action       | Description                                           |
| :----------- | :---------------------------------------------------- |
| **Allow**    | Content is safe to pass through.                      |
| **Annotate** | Content is allowed but marked with a warning finding. |
| **Redact**   | Sensitive portions are replaced with `<redacted>`.    |
| **Block**    | Content is rejected entirely (fail-closed).           |

**Sources:** [crates/palyra-safety/src/lib.rs#32-33](http://crates/palyra-safety/src/lib.rs#32-33), [crates/palyra-daemon/src/domain/workspace.rs#7-9](http://crates/palyra-daemon/src/domain/workspace.rs#7-9)
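
To make the relationship between trust, phase, findings, and action concrete, here is one possible decision function. The policy encoded in `action_for` is entirely an assumption chosen for illustration; the source does not document the actual mapping. The `Finding` enum is likewise a simplified, hypothetical stand-in for `SafetyFindingCategory`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrustLabel { Trusted, Untrusted }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SafetyPhase { PrePrompt, PreExecution, Export }

/// Simplified, hypothetical stand-in for `SafetyFindingCategory`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Finding { PromptInjection, CredentialMaterial, CredentialReference }

/// Variants are declared from least to most restrictive so that the
/// derived `Ord` can pick the strictest action.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SafetyAction { Allow, Annotate, Redact, Block }

/// Assumed policy for a single finding; not taken from the source.
fn action_for(trust: TrustLabel, phase: SafetyPhase, finding: Finding) -> SafetyAction {
    match (finding, trust, phase) {
        (Finding::PromptInjection, TrustLabel::Untrusted, SafetyPhase::PrePrompt) => {
            SafetyAction::Block
        }
        (Finding::PromptInjection, _, _) => SafetyAction::Annotate,
        (Finding::CredentialMaterial, _, _) => SafetyAction::Redact,
        (Finding::CredentialReference, _, _) => SafetyAction::Annotate,
    }
}

/// The strictest action across all findings wins; no findings means `Allow`.
pub fn decide(trust: TrustLabel, phase: SafetyPhase, findings: &[Finding]) -> SafetyAction {
    findings
        .iter()
        .map(|finding| action_for(trust, phase, *finding))
        .max()
        .unwrap_or(SafetyAction::Allow)
}
```

Taking the maximum over an ordered enum is one simple way to guarantee that a single severe finding cannot be diluted by several mild ones.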

## Redaction Utilities

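Redaction is applied at several layers, from the shared text helpers in `palyra-common` to the workspace tools in `palyra-daemon`, so that the same placeholder conventions hold wherever content leaves the system.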

### Text Redaction

The `redact_text_for_export` function applies the `Redact` action by substituting sensitive values with the `REDACTED` constant (`"<redacted>"`). This includes:

* **Header Redaction**: Masking sensitive HTTP headers like `Authorization` or `Set-Cookie` [crates/palyra-common/src/redaction.rs#131-143](http://crates/palyra-common/src/redaction.rs#131-143).
* **URL Redaction**: Removing credentials from userinfo and sensitive keys from query parameters [crates/palyra-common/src/redaction.rs#156-179](http://crates/palyra-common/src/redaction.rs#156-179).
* **Internal Path Redaction**: Masking internal storage paths like `.palyra/sessions` using `REDACTED_INTERNAL_RUNTIME_PATH` [crates/palyra-common/src/redaction.rs#203-211](http://crates/palyra-common/src/redaction.rs#203-211).
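
The header and URL cases above can be sketched with the standard library alone. Everything here is hypothetical: the function names, the two-entry header list, the three-entry key list, and the choice to strip userinfo while masking query values are assumptions, and the sketch ignores URL fragments and percent-encoding.

```rust
const REDACTED: &str = "<redacted>";

/// Headers named in this section; the real list is likely longer.
const SENSITIVE_HEADERS: &[&str] = &["authorization", "set-cookie"];

/// Key names mentioned earlier in this document.
const SENSITIVE_KEYS: &[&str] = &["access_token", "api_key", "client_secret"];

/// Mask the value of a sensitive header; pass other values through.
pub fn redact_header_sketch(name: &str, value: &str) -> String {
    if SENSITIVE_HEADERS.iter().any(|h| name.eq_ignore_ascii_case(h)) {
        REDACTED.to_string()
    } else {
        value.to_string()
    }
}

fn is_sensitive(key: &str) -> bool {
    let lowered = key.to_ascii_lowercase();
    SENSITIVE_KEYS.iter().any(|k| lowered.contains(*k))
}

/// Strip `user:password@` from the authority and mask sensitive query values.
pub fn redact_url_sketch(url: &str) -> String {
    let (base, query) = match url.split_once('?') {
        Some((b, q)) => (b, Some(q)),
        None => (url, None),
    };
    let base = match base.split_once("://") {
        Some((scheme, rest)) => {
            // Only look for '@' inside the authority, before the first '/'.
            let authority_end = rest.find('/').unwrap_or(rest.len());
            match rest[..authority_end].rfind('@') {
                Some(at) => format!("{}://{}", scheme, &rest[at + 1..]),
                None => base.to_string(),
            }
        }
        None => base.to_string(),
    };
    match query {
        Some(q) => {
            let pairs: Vec<String> = q
                .split('&')
                .map(|pair| match pair.split_once('=') {
                    Some((k, _)) if is_sensitive(k) => format!("{}={}", k, REDACTED),
                    _ => pair.to_string(),
                })
                .collect();
            format!("{}?{}", base, pairs.join("&"))
        }
        None => base,
    }
}
```

Under these assumptions, `redact_url_sketch("https://user:pw@example.com/path?api_key=abc&x=1")` yields `https://example.com/path?api_key=<redacted>&x=1`, leaving non-sensitive parameters untouched.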

### Filesystem Redaction (Workspace Tools)

Workspace tools like `palyra.fs.read_file` and `palyra.fs.search` integrate safety scanning directly into their output pipeline.

**Title: Workspace Tool Redaction Flow**

```mermaid
graph LR
    subgraph "palyra.fs.read_file (workspace_file.rs)"
        RawRead["fs::File::read()"]
        DetectBinary["Binary Check"]
        SafetyScan["palyra_safety::inspect_text()"]
        Redactor["redact_text_for_export()"]
        Output["WorkspaceReadFileOutput"]
    end

    RawRead --> DetectBinary
    DetectBinary -- "Text" --> SafetyScan
    SafetyScan -- "Findings Found" --> Redactor
    Redactor --> Output
    Output -- "redacted: true" --> Agent["Agent Loop"]
```

If a file read is redacted, the `WorkspaceReadFileOutput` sets `redacted: true` and `text_authoritative: false` to prevent the agent from accidentally overwriting the file with the redacted placeholder [crates/palyra-daemon/src/application/tool\_runtime/workspace\_file.rs#118-149](http://crates/palyra-daemon/src/application/tool_runtime/workspace_file.rs#118-149).

**Sources:** [crates/palyra-daemon/src/application/tool\_runtime/workspace\_file.rs#8-14](http://crates/palyra-daemon/src/application/tool_runtime/workspace_file.rs#8-14), [crates/palyra-common/src/redaction.rs#11-13](http://crates/palyra-common/src/redaction.rs#11-13)
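
The interplay between `redacted` and `text_authoritative` can be illustrated with a reduced model. The struct below keeps only the two flags named above plus a `text` field; the real `WorkspaceReadFileOutput` certainly has more fields, and `build_read_output` and `can_write_back` are hypothetical helpers that stand in for the actual scan and write guard.

```rust
const REDACTED: &str = "<redacted>";

/// Reduced, hypothetical view of the read output: only the flags
/// discussed above and the returned text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WorkspaceReadFileOutput {
    pub text: String,
    pub redacted: bool,
    pub text_authoritative: bool,
}

/// Replace each known secret with the placeholder and record whether
/// anything changed. A real scan would use `inspect_text` findings
/// instead of a caller-supplied list of secrets.
pub fn build_read_output(raw: &str, secrets: &[&str]) -> WorkspaceReadFileOutput {
    let mut text = raw.to_string();
    let mut redacted = false;
    for secret in secrets {
        if !secret.is_empty() && text.contains(*secret) {
            text = text.replace(*secret, REDACTED);
            redacted = true;
        }
    }
    WorkspaceReadFileOutput {
        text,
        redacted,
        // Redacted text no longer matches the bytes on disk.
        text_authoritative: !redacted,
    }
}

/// Hypothetical guard: only authoritative text may be written back.
pub fn can_write_back(output: &WorkspaceReadFileOutput) -> bool {
    output.text_authoritative
}
```

A caller that respects `text_authoritative` will refuse to round-trip redacted text, which is what keeps a placeholder from replacing a real secret on disk.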

## Fail-Closed Design in Patching

The `palyra.fs.apply_patch` tool implements a strict fail-closed policy. Before any mutation occurs:

1. **Dry Run**: The patch is planned and validated [crates/palyra-daemon/src/application/tool\_runtime/workspace\_patch.rs#6-8](http://crates/palyra-daemon/src/application/tool_runtime/workspace_patch.rs#6-8).
2. **Secret File Protection**: Patches are rejected if they attempt to insert redaction placeholders (like `[redacted]`) into known secret-bearing files (e.g., `.env`, `id_rsa`) [crates/palyra-common/src/workspace\_patch.rs#164-167](http://crates/palyra-common/src/workspace_patch.rs#164-167).
3. **Containment**: All paths are canonicalized and checked against allowed workspace roots. If a path escapes the root, the entire patch fails [crates/palyra-common/src/workspace\_patch.rs#150-151](http://crates/palyra-common/src/workspace_patch.rs#150-151).

**Sources:** [crates/palyra-common/src/workspace\_patch.rs#1-7](http://crates/palyra-common/src/workspace_patch.rs#1-7), [crates/palyra-daemon/src/application/tool\_runtime/workspace\_patch.rs#10-12](http://crates/palyra-daemon/src/application/tool_runtime/workspace_patch.rs#10-12)
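
The containment and secret-file checks above can be sketched as a single preflight function. This is an illustration under stated assumptions: `preflight`, `PatchRejection`, and the placeholder and file-name lists are hypothetical, and the sketch resolves `..` lexically so that it runs without touching the filesystem, whereas the text above describes real canonicalization.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical rejection reasons for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum PatchRejection {
    EscapesRoot,
    PlaceholderInSecretFile,
}

/// Placeholder spellings that appear in this document.
const PLACEHOLDERS: &[&str] = &["<redacted>", "[redacted]"];

/// Resolve `.` and `..` without filesystem access. Returns `None` when
/// `..` would climb above the top of the path.
fn normalize_lexically(path: &Path) -> Option<PathBuf> {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::ParentDir => {
                if !out.pop() {
                    return None;
                }
            }
            Component::CurDir => {}
            other => out.push(other.as_os_str()),
        }
    }
    Some(out)
}

/// Validate one patch target before any write happens (fail-closed).
pub fn preflight(root: &Path, relative: &str, new_content: &str) -> Result<PathBuf, PatchRejection> {
    let resolved = normalize_lexically(&root.join(relative)).ok_or(PatchRejection::EscapesRoot)?;
    if !resolved.starts_with(root) {
        return Err(PatchRejection::EscapesRoot);
    }
    let name = resolved.file_name().and_then(|n| n.to_str()).unwrap_or("");
    // Only the two example file names from the list above are checked here.
    let secret_file = matches!(name, ".env" | "id_rsa");
    let has_placeholder = PLACEHOLDERS.iter().any(|p| new_content.contains(*p));
    if secret_file && has_placeholder {
        return Err(PatchRejection::PlaceholderInSecretFile);
    }
    Ok(resolved)
}
```

Because every check returns an error before any file is touched, a multi-file patch can run `preflight` on all targets first and abort as a whole if any one of them fails.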
