# Retrieval-Augmented Generation (RAG) and Recall

<details>
  <summary>Relevant source files</summary>

  The following files were used as context for generating this wiki page:

  * crates/palyra-cli/src/commands/memory\_external\_index.rs
  * crates/palyra-common/src/process\_runner\_input.rs
  * crates/palyra-daemon/src/application/instruction\_compiler.rs
  * crates/palyra-daemon/src/application/memory.rs
  * crates/palyra-daemon/src/application/mod.rs
  * crates/palyra-daemon/src/application/recall.rs
  * crates/palyra-daemon/src/application/service\_authorization.rs
  * crates/palyra-daemon/src/application/tool\_registry/builtin.rs
  * crates/palyra-daemon/src/application/tool\_registry/tests.rs
  * crates/palyra-daemon/src/application/tool\_runtime/memory.rs
  * crates/palyra-daemon/src/gateway/runtime.rs
  * crates/palyra-daemon/src/gateway/runtime/external\_retrieval.rs
  * crates/palyra-daemon/src/gateway/tests.rs
  * crates/palyra-daemon/src/journal.rs
  * crates/palyra-daemon/src/journal/retrieval\_index\_status.rs
  * crates/palyra-daemon/src/lib.rs
  * crates/palyra-daemon/src/provider\_leases.rs
  * crates/palyra-daemon/src/retrieval.rs
  * crates/palyra-daemon/src/retrieval/external\_index.rs
  * crates/palyra-daemon/src/sandbox\_runner.rs
  * crates/palyra-daemon/src/transport/grpc/services/memory/service.rs
  * crates/palyra-daemon/src/transport/http/handlers/console/diagnostics.rs
  * crates/palyra-daemon/src/transport/http/handlers/console/memory\_external\_index.rs
  * crates/palyra-daemon/src/transport/http/handlers/console/usage.rs
  * crates/palyra-daemon/src/transport/http/router.rs
  * crates/palyra-daemon/src/usage\_governance.rs
  * crates/palyra-daemon/tests/golden/current\_state\_inventory.json
  * schemas/proto/palyra/v1/memory.proto
</details>

The Palyra Retrieval-Augmented Generation (RAG) system gives agents access to durable principal memory, session-specific context, and workspace-scoped documents. It uses a hybrid retrieval strategy that combines lexical search, vector embeddings, and temporal recency to produce a ranked "Recall Plan". The `InstructionCompiler` then compiles this plan into the model's instruction set or injects it into the turn context.

## Retrieval Architecture and Hybrid Search

Palyra uses a multi-stage retrieval pipeline to resolve relevant context from the `JournalStore`. Hybrid search merges results from the SQLite FTS5 (lexical) index and the vector-similarity (semantic) index [crates/palyra-daemon/src/journal.rs#135-143](http://crates/palyra-daemon/src/journal.rs#135-143).
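The merge step can be pictured with a minimal sketch. It assumes each index returns `(item_id, score)` pairs and that candidates are joined by item id. `Candidate` and `merge_candidates` are hypothetical names for illustration, not identifiers from the Palyra source.

```rust
use std::collections::BTreeMap;

/// Hypothetical merged candidate; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub item_id: String,
    /// Score from the lexical index, if the item matched there.
    pub lexical: Option<f64>,
    /// Score from the vector index, if the item matched there.
    pub vector: Option<f64>,
}

/// Merge lexical and vector hits keyed by item id.
/// An item found by only one index keeps `None` for the other score.
/// The result is ordered by item id because a `BTreeMap` is used.
pub fn merge_candidates(
    lexical_hits: &[(String, f64)],
    vector_hits: &[(String, f64)],
) -> Vec<Candidate> {
    let mut merged: BTreeMap<String, Candidate> = BTreeMap::new();
    for (id, score) in lexical_hits {
        merged
            .entry(id.clone())
            .or_insert_with(|| Candidate {
                item_id: id.clone(),
                lexical: None,
                vector: None,
            })
            .lexical = Some(*score);
    }
    for (id, score) in vector_hits {
        merged
            .entry(id.clone())
            .or_insert_with(|| Candidate {
                item_id: id.clone(),
                lexical: None,
                vector: None,
            })
            .vector = Some(*score);
    }
    merged.into_values().collect()
}
```

Keeping both scores as `Option` lets a later scoring stage distinguish "not matched by this index" from "matched with a score of zero".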

### Scoring Components

The final relevance score for a retrieval candidate is calculated using `score_with_profile` [crates/palyra-daemon/src/gateway/runtime.rs#107-110](http://crates/palyra-daemon/src/gateway/runtime.rs#107-110), which aggregates:

1. **Lexical Score**: BM25-based matching via SQLite FTS5 virtual tables [crates/palyra-daemon/src/journal.rs#138-140](http://crates/palyra-daemon/src/journal.rs#138-140).
2. **Vector Score**: Cosine similarity against embeddings produced by a `MemoryEmbeddingProvider` [crates/palyra-daemon/src/journal.rs#168-175](http://crates/palyra-daemon/src/journal.rs#168-175).
3. **Recency Score**: A decay function that favors newer information based on `created_at_unix_ms` [crates/palyra-daemon/src/retrieval.rs#106-107](http://crates/palyra-daemon/src/retrieval.rs#106-107).
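As a rough illustration of how three such components could be aggregated, the sketch below uses a weighted sum with an exponential half-life decay for recency. Everything here is an assumption: `ScoringProfile`, `recency_score`, `combined_score`, the weights, and the decay shape are hypothetical and may differ from what `score_with_profile` actually does. The lexical and vector inputs are assumed to be normalized to the range 0 to 1 already; raw FTS5 `bm25()` values rank better matches as numerically lower, so they would need converting first.

```rust
/// Hypothetical scoring profile; the weights Palyra uses are not shown on this page.
#[derive(Debug, Clone, Copy)]
pub struct ScoringProfile {
    pub lexical_weight: f64,
    pub vector_weight: f64,
    pub recency_weight: f64,
    /// Age, in milliseconds, at which the recency score drops to 0.5.
    pub recency_half_life_ms: f64,
}

/// Exponential decay in (0, 1]: 1.0 for a brand-new item, 0.5 at one half-life.
/// Items dated in the future are treated as having age zero.
pub fn recency_score(created_at_unix_ms: i64, now_unix_ms: i64, half_life_ms: f64) -> f64 {
    let age_ms = now_unix_ms.saturating_sub(created_at_unix_ms).max(0) as f64;
    0.5_f64.powf(age_ms / half_life_ms)
}

/// Weighted sum of the three components.
/// Lexical and vector scores are clamped to [0, 1] before weighting.
pub fn combined_score(
    profile: &ScoringProfile,
    lexical: f64,
    vector: f64,
    created_at_unix_ms: i64,
    now_unix_ms: i64,
) -> f64 {
    let recency = recency_score(created_at_unix_ms, now_unix_ms, profile.recency_half_life_ms);
    profile.lexical_weight * lexical.clamp(0.0, 1.0)
        + profile.vector_weight * vector.clamp(0.0, 1.0)
        + profile.recency_weight * recency
}
```

With weights that sum to 1, the combined score also stays between 0 and 1, which makes a `min_score` threshold easy to reason about.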

### Retrieval Flow Diagram

This diagram illustrates the flow from a natural language query to the assembly of a `RecallPlan` within the `GatewayRuntimeState`.

**Natural Language to Retrieval Plan Flow**

```mermaid theme={null}
graph TD
    subgraph "Natural Language Space"
        UserQuery["User Query / Agent Intent"]
    end

    subgraph "Code Entity Space: GatewayRuntimeState"
        direction TB
        RECALL["execute_memory_recall_tool"]
        RP["RecallRequest"]
        JS["JournalStore"]
        FTS["FTS5 Lexical Search"]
        VEC["Vector Similarity Scan"]
        SCORE["score_with_profile"]
    end

    UserQuery --> RECALL
    %% RECALL --> RP: crates/palyra-daemon/src/application/tool_runtime/memory.rs:48-48
    RECALL --> RP
    RP --> JS
    %% JS --> FTS: crates/palyra-daemon/src/journal.rs:138-140
    JS --> FTS
    %% JS --> VEC: crates/palyra-daemon/src/journal.rs:137-137
    JS --> VEC
    FTS --> SCORE
    %% VEC --> SCORE: crates/palyra-daemon/src/gateway/runtime.rs:107-110
    VEC --> SCORE
    SCORE --> PLAN["RecallPlan / ToolExecutionOutcome"]
```

Sources: [crates/palyra-daemon/src/gateway/runtime.rs#105-110](http://crates/palyra-daemon/src/gateway/runtime.rs#105-110), [crates/palyra-daemon/src/journal.rs#135-143](http://crates/palyra-daemon/src/journal.rs#135-143), [crates/palyra-daemon/src/application/tool\_runtime/memory.rs#1-10](http://crates/palyra-daemon/src/application/tool_runtime/memory.rs#1-10)

## Recall Plan and External Indexing

The `RecallPlan` is the structured output of the retrieval process. It includes `RetrievalBranchDiagnostics` to provide transparency into why specific items were selected or filtered [crates/palyra-daemon/src/journal.rs#81-82](http://crates/palyra-daemon/src/journal.rs#81-82).

### External Retrieval Index

For large-scale workspace documents, Palyra supports an external retrieval index. This is managed via the `ExternalRetrievalRuntime` (crates/palyra-daemon/src/retrieval/external\_index.rs).

* **Chunking**: Documents are split into segments of roughly `WORKSPACE_CHUNK_TARGET_BYTES` (default 1024), with consecutive segments overlapping by `WORKSPACE_CHUNK_OVERLAP_BYTES` (default 160) [crates/palyra-daemon/src/journal.rs#156-157](http://crates/palyra-daemon/src/journal.rs#156-157).
* **Status Tracking**: The `WorkspaceRetrievalIndexStatus` tracks the health and backfill progress of the vector index [crates/palyra-daemon/src/journal/retrieval\_index\_status.rs#73-73](http://crates/palyra-daemon/src/journal/retrieval_index_status.rs#73-73).
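The chunking parameters above can be illustrated with a minimal sketch of overlapping byte-window chunking. The two constants use the defaults quoted on this page; `chunk_text` and its exact windowing behaviour (backing off to UTF-8 character boundaries, no sentence awareness) are assumptions for illustration and may not match Palyra's chunker.

```rust
/// Defaults as quoted on this page.
pub const WORKSPACE_CHUNK_TARGET_BYTES: usize = 1024;
pub const WORKSPACE_CHUNK_OVERLAP_BYTES: usize = 160;

/// Hypothetical chunker: split `text` into windows of at most `target` bytes,
/// where each window starts up to `overlap` bytes before the previous one ended.
/// Window edges are moved so they never fall inside a UTF-8 character.
pub fn chunk_text(text: &str, target: usize, overlap: usize) -> Vec<&str> {
    assert!(target > 0 && overlap < target, "overlap must be smaller than target");
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < text.len() {
        // Tentative end of the window, backed off to a character boundary.
        let mut end = (start + target).min(text.len());
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        if end == start {
            // `target` is smaller than the next character: take that character whole.
            end = start + 1;
            while !text.is_char_boundary(end) {
                end += 1;
            }
        }
        chunks.push(&text[start..end]);
        if end == text.len() {
            break;
        }
        // Step back by the overlap, then forward to the nearest character boundary.
        let mut next = end.saturating_sub(overlap);
        while !text.is_char_boundary(next) {
            next += 1;
        }
        // Always make progress, even if the overlap would not advance the window.
        start = if next > start { next } else { end };
    }
    chunks
}
```

Calling `chunk_text(doc, WORKSPACE_CHUNK_TARGET_BYTES, WORKSPACE_CHUNK_OVERLAP_BYTES)` on ASCII input yields windows that repeat the last 160 bytes of the previous chunk, so a sentence cut at a chunk edge still appears intact in one of the two neighbours.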

### Memory Scopes

Retrieval is strictly partitioned by `MemoryLifecycleScope` [crates/palyra-daemon/src/application/memory.rs#42-42](http://crates/palyra-daemon/src/application/memory.rs#42-42):

* `principal`: Global user-level facts and preferences.
* `session`: Context specific to the current conversation.
* `channel`: Platform-specific context (e.g., Discord server info).
* `workspace`/`project`: Filesystem-backed knowledge [crates/palyra-daemon/src/application/tool\_runtime/memory.rs#19-21](http://crates/palyra-daemon/src/application/tool_runtime/memory.rs#19-21).

Sources: [crates/palyra-daemon/src/journal.rs#153-157](http://crates/palyra-daemon/src/journal.rs#153-157), [crates/palyra-daemon/src/application/tool\_runtime/memory.rs#18-22](http://crates/palyra-daemon/src/application/tool_runtime/memory.rs#18-22), [crates/palyra-daemon/src/gateway/runtime.rs#80-82](http://crates/palyra-daemon/src/gateway/runtime.rs#80-82)

## Instruction Compilation and Context Assembly

Once retrieval is complete, the `InstructionCompiler` assembles the final prompt context. This process ensures that retrieved information is presented to the model with appropriate "Trust Labels" and "Claim Boundaries" [crates/palyra-daemon/src/application/instruction\_compiler.rs#1-10](http://crates/palyra-daemon/src/application/instruction_compiler.rs#1-10).

### Instruction Selection Logic

The `InstructionCompiler` generates `CompiledInstructions` based on:

1. **Tool Catalog**: Available tools for the turn [crates/palyra-daemon/src/application/instruction\_compiler.rs#61-61](http://crates/palyra-daemon/src/application/instruction_compiler.rs#61-61).
2. **Trust Summary**: Aggregated safety posture of the retrieved context blocks [crates/palyra-daemon/src/application/instruction\_compiler.rs#31-38](http://crates/palyra-daemon/src/application/instruction_compiler.rs#31-38).
3. **Temporal Evidence**: Current UTC/Unix timestamps, so the model does not hallucinate dates [crates/palyra-daemon/src/application/instruction\_compiler.rs#149-150](http://crates/palyra-daemon/src/application/instruction_compiler.rs#149-150).
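To make the idea of an aggregated trust summary concrete, here is a small sketch that reduces per-block trust labels to a single posture. The label set (`Untrusted`, `Unverified`, `Trusted`), the `TrustSummary` fields, and `summarize_trust` are all hypothetical; the page does not show the actual types used by the `InstructionCompiler`.

```rust
/// Hypothetical trust labels, ordered from least to most trusted.
/// Deriving `Ord` makes `Untrusted < Unverified < Trusted`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TrustLabel {
    Untrusted,
    Unverified,
    Trusted,
}

/// Hypothetical aggregate over all retrieved context blocks.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TrustSummary {
    /// The least trusted label seen, or `None` when there are no blocks.
    pub weakest: Option<TrustLabel>,
    pub untrusted_blocks: usize,
    pub total_blocks: usize,
}

/// Summarize the labels of the context blocks selected for a turn.
pub fn summarize_trust(labels: &[TrustLabel]) -> TrustSummary {
    TrustSummary {
        weakest: labels.iter().copied().min(),
        untrusted_blocks: labels
            .iter()
            .filter(|label| **label == TrustLabel::Untrusted)
            .count(),
        total_blocks: labels.len(),
    }
}
```

Reporting the weakest label rather than an average is a conservative choice: one untrusted block is enough to change how the whole context should be framed to the model.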

### Context Assembly Diagram

This diagram shows how retrieved memory items are transformed into provider messages.

**Recall to Instruction Compiler Pipeline**

```mermaid theme={null}
graph LR
    subgraph "Data Store"
        MEM["MemoryItemRecord"]
    end

    subgraph "Context Engine"
        direction TB
        RAG["RecallPlan"]
        IC["InstructionCompiler"]
        INPUT["InstructionCompilerInput"]
        SEG["CompiledInstructionSegment"]
    end

    subgraph "Model Provider Space"
        PM["ProviderMessage"]
    end

    MEM --> RAG
    %% RAG --> INPUT: crates/palyra-daemon/src/application/instruction_compiler.rs:57-64
    RAG --> INPUT
    INPUT --> IC
    %% IC --> SEG: crates/palyra-daemon/src/application/instruction_compiler.rs:69-74
    IC --> SEG
    %% SEG --> PM: crates/palyra-daemon/src/application/instruction_compiler.rs:92-104
    SEG --> PM
```

Sources: [crates/palyra-daemon/src/application/instruction\_compiler.rs#1-26](http://crates/palyra-daemon/src/application/instruction_compiler.rs#1-26), [crates/palyra-daemon/src/application/instruction\_compiler.rs#89-104](http://crates/palyra-daemon/src/application/instruction_compiler.rs#89-104), [crates/palyra-daemon/src/application/tool\_runtime/memory.rs#106-121](http://crates/palyra-daemon/src/application/tool_runtime/memory.rs#106-121)

## Key Implementation Details

### Claim Boundaries

To prevent the model from making false assertions about missing information, Palyra uses "Claim Boundaries" [crates/palyra-daemon/src/application/tool\_runtime/memory.rs#12-16](http://crates/palyra-daemon/src/application/tool_runtime/memory.rs#12-16). For example:

* `MEMORY_HITS_ABSENT_CLAIM_BOUNDARY`: "no memory hits were returned; do not invent stored preferences or prior facts" [crates/palyra-daemon/src/application/tool\_runtime/memory.rs#85-86](http://crates/palyra-daemon/src/application/tool_runtime/memory.rs#85-86).
* `SESSION_SEARCH_HITS_PRESENT_CLAIM_BOUNDARY`: Instructs the model to cite hits as session recall, not durable memory [crates/palyra-daemon/src/application/tool\_runtime/memory.rs#98-98](http://crates/palyra-daemon/src/application/tool_runtime/memory.rs#98-98).
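A minimal sketch of how a claim boundary might be attached to tool output follows. The text of `MEMORY_HITS_ABSENT_CLAIM_BOUNDARY` is the string quoted above; `EXAMPLE_HITS_PRESENT_CLAIM_BOUNDARY`, `claim_boundary_for`, `render_recall_output`, and the plain-text output layout are hypothetical and only illustrate the pattern.

```rust
/// Text quoted on this page for the "no hits" case.
pub const MEMORY_HITS_ABSENT_CLAIM_BOUNDARY: &str =
    "no memory hits were returned; do not invent stored preferences or prior facts";

/// Hypothetical wording for the "hits present" case; not taken from Palyra.
pub const EXAMPLE_HITS_PRESENT_CLAIM_BOUNDARY: &str =
    "answer only from the hits listed above; do not extrapolate beyond them";

/// Pick the boundary that matches the retrieval outcome.
pub fn claim_boundary_for(hit_count: usize) -> &'static str {
    if hit_count == 0 {
        MEMORY_HITS_ABSENT_CLAIM_BOUNDARY
    } else {
        EXAMPLE_HITS_PRESENT_CLAIM_BOUNDARY
    }
}

/// Render hits followed by the claim boundary, so the model always sees
/// an explicit statement of what it may and may not assert.
pub fn render_recall_output(hits: &[&str]) -> String {
    let mut out = String::new();
    for (index, hit) in hits.iter().enumerate() {
        out.push_str(&format!("hit[{}]: {}\n", index, hit));
    }
    out.push_str(&format!("claim_boundary: {}", claim_boundary_for(hits.len())));
    out
}
```

The point of the pattern is that an empty result is never silent: the model receives an explicit instruction instead of an empty list it might fill in on its own.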

### Memory Maintenance

The system performs background maintenance via `spawn_scheduler_loop` [crates/palyra-daemon/src/lib.rs#143-143](http://crates/palyra-daemon/src/lib.rs#143-143). This includes:

* **Embedding Backfill**: Processing items missing vectors [crates/palyra-daemon/src/journal.rs#77-78](http://crates/palyra-daemon/src/journal.rs#77-78).
* **Retention Enforcement**: Purging expired memory items based on `MEMORY_RETENTION_DAY_MS` [crates/palyra-daemon/src/journal.rs#150-150](http://crates/palyra-daemon/src/journal.rs#150-150).
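Retention enforcement can be sketched as a cutoff computed from a day count. This sketch assumes `MEMORY_RETENTION_DAY_MS` is the number of milliseconds in one day, which the name suggests but this page does not confirm. `MemoryItem` and `purge_expired` are hypothetical, and a real implementation would delete rows from the journal rather than filter an in-memory vector.

```rust
/// Assumed value: milliseconds in one day.
pub const MEMORY_RETENTION_DAY_MS: i64 = 24 * 60 * 60 * 1000;

/// Hypothetical in-memory stand-in for a stored memory item.
#[derive(Debug, Clone, PartialEq)]
pub struct MemoryItem {
    pub id: String,
    pub created_at_unix_ms: i64,
}

/// Remove items older than `retention_days` and return how many were purged.
/// Items created exactly at the cutoff are kept.
pub fn purge_expired(items: &mut Vec<MemoryItem>, retention_days: i64, now_unix_ms: i64) -> usize {
    let window_ms = retention_days.saturating_mul(MEMORY_RETENTION_DAY_MS);
    let cutoff = now_unix_ms.saturating_sub(window_ms);
    let before = items.len();
    items.retain(|item| item.created_at_unix_ms >= cutoff);
    before - items.len()
}
```

Returning the purge count gives a scheduler loop something cheap to log or export as a metric after each maintenance pass.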

### Tool Registry Integration

The `palyra.memory.search` and `palyra.memory.recall` tools are defined in the builtin tool registry with specific schemas for `top_k`, `min_score`, and `scope` [crates/palyra-daemon/src/application/tool\_registry/builtin.rs#105-124](http://crates/palyra-daemon/src/application/tool_registry/builtin.rs#105-124).

| Tool Name              | Purpose                                         | Parallelism Policy |
| :--------------------- | :---------------------------------------------- | :----------------- |
| `palyra.memory.search` | Low-level search across lifecycle and workspace | `ReadOnly`         |
| `palyra.memory.recall` | High-level RAG with automatic context injection | `ReadOnly`         |
| `palyra.memory.retain` | Durable storage of new facts/preferences        | `Idempotent`       |
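The effect of the `top_k`, `min_score`, and `scope` parameters can be shown with a short sketch. `apply_limits`, `is_known_scope`, and `KNOWN_SCOPES` are hypothetical helpers; the scope names are the ones listed under Memory Scopes on this page, and whether Palyra filters before or after ranking is an assumption.

```rust
/// Scope names listed on this page; the real schema may accept a different set.
pub const KNOWN_SCOPES: [&str; 5] = ["principal", "session", "channel", "workspace", "project"];

/// Check a requested scope against the known names.
pub fn is_known_scope(scope: &str) -> bool {
    KNOWN_SCOPES.contains(&scope)
}

/// Drop hits scoring below `min_score`, rank the rest from highest to lowest,
/// and keep at most `top_k` of them.
pub fn apply_limits(
    mut hits: Vec<(String, f64)>,
    top_k: usize,
    min_score: f64,
) -> Vec<(String, f64)> {
    hits.retain(|(_, score)| *score >= min_score);
    // `total_cmp` gives a total order over f64, so sorting cannot panic on NaN.
    hits.sort_by(|a, b| b.1.total_cmp(&a.1));
    hits.truncate(top_k);
    hits
}
```

Because both helpers only read or reshape data they are handed, they match the `ReadOnly` policy listed for the search and recall tools.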

Sources: [crates/palyra-daemon/src/application/tool\_registry/builtin.rs#105-135](http://crates/palyra-daemon/src/application/tool_registry/builtin.rs#105-135), [crates/palyra-daemon/src/application/tool\_runtime/memory.rs#1-22](http://crates/palyra-daemon/src/application/tool_runtime/memory.rs#1-22), [crates/palyra-daemon/src/journal.rs#147-152](http://crates/palyra-daemon/src/journal.rs#147-152)
