Memory & RAG System - Palyra.com

The Memory and Retrieval-Augmented Generation (RAG) subsystem provides the daemon with long-term persistence and context retrieval capabilities. It manages “Memory Items” (vector-embedded snippets), “Workspace Documents” (curated knowledge), and “Learning Reflections” (automated background processing of session history into structured facts).

System Architecture

The memory system is integrated into the JournalStore and orchestrated by the palyrad gateway. It bridges the gap between raw session events and high-level agent knowledge.

Memory Data Flow

The following diagram illustrates how natural language input from a session is transformed into stored memory and subsequently retrieved for RAG.

Diagram: Memory Ingestion and Retrieval Pipeline

Sources: crates/palyra-daemon/src/journal.rs#64-100, crates/palyra-daemon/src/gateway.rs#116-121, crates/palyra-common/src/daemon_config_schema.rs#193-196

Memory Items & Embedding Providers

Memory items are the atomic units of the RAG system. Each item consists of text content, a vector embedding, and metadata (source, tags, and TTL).

Embedding Implementation

The system supports pluggable embedding providers via the MemoryEmbeddingProvider trait crates/palyra-daemon/src/journal.rs#64-68.

- Hash Provider: A deterministic HashMemoryEmbeddingProvider is used by default for local-only, low-latency scenarios, producing 64-dimensional vectors crates/palyra-daemon/src/journal.rs#71-80.
- External Providers: The system can be configured to use LLM-backed embeddings (e.g., OpenAI text-embedding-3-small) via the FileModelProviderConfig crates/palyra-common/src/daemon_config_schema.rs#112-112.
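
To make the idea of a deterministic, local-only provider concrete, here is a minimal sketch of a hash-based embedder that produces 64-dimensional vectors. It is not the actual HashMemoryEmbeddingProvider: the trait shape (EmbeddingProvider), the type HashEmbedder, the FNV-1a hash, the signed bucketing, and the L2 normalisation are all assumptions chosen for illustration.

```rust
/// Hypothetical stand-in for the MemoryEmbeddingProvider trait; the real
/// trait's methods are not shown in this document.
pub trait EmbeddingProvider {
    fn dimensions(&self) -> usize;
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Illustrative deterministic provider (not the real HashMemoryEmbeddingProvider).
pub struct HashEmbedder;

/// Vector width, matching the 64 dimensions mentioned in the text.
const DIMS: usize = 64;

/// 64-bit FNV-1a: a small, stable hash, so output does not depend on the
/// standard library's hasher implementation.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

impl EmbeddingProvider for HashEmbedder {
    fn dimensions(&self) -> usize {
        DIMS
    }

    fn embed(&self, text: &str) -> Vec<f32> {
        let mut v = vec![0.0f32; DIMS];
        for token in text.split_whitespace() {
            let h = fnv1a(token.to_lowercase().as_bytes());
            let idx = (h % DIMS as u64) as usize;
            // The top hash bit picks a sign so bucket collisions can partly cancel.
            let sign = if (h >> 63) & 1 == 0 { 1.0 } else { -1.0 };
            v[idx] += sign;
        }
        // L2-normalise so that a dot product equals cosine similarity.
        let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            for x in &mut v {
                *x /= norm;
            }
        }
        v
    }
}
```

Because the vector depends only on the lowercased tokens, the same text always maps to the same embedding with no network call. The trade-off is that this kind of embedding captures token overlap rather than meaning, which is why an LLM-backed provider may be preferable when retrieval quality matters.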

Constraints & Constants

Constant	Value	Description
`MAX_MEMORY_ITEM_BYTES`	16 KB	Maximum size of a single memory text snippet crates/palyra-daemon/src/gateway.rs#118-118.
`MAX_MEMORY_ITEM_TOKENS`	2,048	Token limit for embedding generation crates/palyra-daemon/src/gateway.rs#119-119.
`MAX_MEMORY_SEARCH_TOP_K`	64	Maximum number of hits returned per search crates/palyra-daemon/src/gateway.rs#117-117.

Sources: crates/palyra-daemon/src/gateway.rs#116-121, crates/palyra-daemon/src/journal.rs#50-55
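
The limits in the table can be read as guards applied before an item is stored or a search is run. The sketch below shows one way such checks might look. The constant names and values come from the table, but everything else is an assumption: 16 KB is taken to mean 16 × 1024 bytes, whitespace splitting stands in for the real tokenizer, and validate_memory_text, clamp_top_k, and LimitError are hypothetical names. Whether the daemon rejects, truncates, or clamps is not stated in the source.

```rust
// Values from the table above; "16 KB" is assumed to mean 16 * 1024 bytes.
pub const MAX_MEMORY_ITEM_BYTES: usize = 16 * 1024;
pub const MAX_MEMORY_ITEM_TOKENS: usize = 2_048;
pub const MAX_MEMORY_SEARCH_TOP_K: usize = 64;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum LimitError {
    TooManyBytes { actual: usize },
    TooManyTokens { actual: usize },
}

/// Hypothetical validator: rejects text that exceeds either limit.
/// Whitespace-separated words are a crude stand-in for real tokens.
pub fn validate_memory_text(text: &str) -> Result<(), LimitError> {
    let bytes = text.len(); // str::len counts UTF-8 bytes, not characters
    if bytes > MAX_MEMORY_ITEM_BYTES {
        return Err(LimitError::TooManyBytes { actual: bytes });
    }
    let tokens = text.split_whitespace().count();
    if tokens > MAX_MEMORY_ITEM_TOKENS {
        return Err(LimitError::TooManyTokens { actual: tokens });
    }
    Ok(())
}

/// Hypothetical helper: keeps a requested top-k within 1..=MAX_MEMORY_SEARCH_TOP_K.
pub fn clamp_top_k(requested: usize) -> usize {
    requested.clamp(1, MAX_MEMORY_SEARCH_TOP_K)
}
```

Under these assumptions, a request for 500 hits would be served as 64, and a snippet of 16,385 bytes would be refused before any embedding work is done.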

Auto-Inject & Recall Preview

The “Auto-Inject” mechanism automatically searches memory during the orchestrator’s run loop to provide relevant context to the agent without explicit user intervention.

- Recall Preview: Before a message is sent to the LLM, the useRecallPreview hook in the web console fetches a preview of what the memory system would inject based on the current composer state apps/web/src/chat/useRecallPreview.ts#1-10.
- Injection Logic: Controlled by FileMemoryAutoInjectConfig, the daemon performs a vector search and prepends the top results to the system prompt crates/palyra-common/src/daemon_config_schema.rs#193-196.

Sources: apps/web/src/chat/useRecallPreview.ts#1-10, crates/palyra-common/src/daemon_config_schema.rs#193-196
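
The injection step described above (vector search, then prepending the top results to the system prompt) can be sketched roughly as follows. This is an illustration under stated assumptions, not the daemon's code: StoredItem, cosine, top_hits, and inject are hypothetical names, the top_k and min_score parameters stand in for whatever fields FileMemoryAutoInjectConfig actually exposes, and the "Relevant memory:" header format is invented for the example.

```rust
/// Hypothetical stored item: text plus its embedding.
pub struct StoredItem {
    pub text: String,
    pub embedding: Vec<f32>,
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Brute-force search: score every item, drop weak matches, keep the best `top_k`.
/// `top_k` and `min_score` are assumed configuration knobs.
pub fn top_hits<'a>(
    query: &[f32],
    items: &'a [StoredItem],
    top_k: usize,
    min_score: f32,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = items
        .iter()
        .map(|item| (item.text.as_str(), cosine(query, &item.embedding)))
        .filter(|(_, score)| *score >= min_score)
        .collect();
    // Highest score first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(top_k);
    scored
}

/// Prepends the hits to the system prompt; leaves it unchanged when there are none.
pub fn inject(system_prompt: &str, hits: &[(&str, f32)]) -> String {
    if hits.is_empty() {
        return system_prompt.to_string();
    }
    let mut out = String::from("Relevant memory:\n");
    for (text, _) in hits {
        out.push_str("- ");
        out.push_str(text);
        out.push('\n');
    }
    out.push('\n');
    out.push_str(system_prompt);
    out
}
```

A recall preview in this model would simply call top_hits with the embedding of the current composer text and display the result without calling inject, so the user sees what would be added before sending.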

Retention & Maintenance

The memory system enforces strict TTL (Time-To-Live) and storage quotas to prevent unbounded database growth.

- Retention Policy: Configured via MemoryRetentionPolicy, allowing limits on total entries, total bytes, or age in days crates/palyra-daemon/src/gateway.rs#65-65.
- Maintenance Loop: The MemoryMaintenanceRequest triggers background tasks including:
  - Vector Backfill: Generating embeddings for items added without vectors crates/palyra-daemon/src/gateway.rs#64-64.
  - TTL Enforcement: Deleting expired items based on MEMORY_RETENTION_DAY_MS crates/palyra-daemon/src/journal.rs#56-56.
  - Vacuuming: Reclaiming SQLite disk space crates/palyra-common/src/daemon_config_schema.rs#204-204.

Sources: crates/palyra-daemon/src/journal.rs#50-57, crates/palyra-daemon/src/gateway.rs#64-66, crates/palyra-common/src/daemon_config_schema.rs#200-205
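
As a rough illustration of how the three retention limits (entries, bytes, age in days) could combine, the sketch below plans which items to delete in one maintenance pass. It is a simplified in-memory model, not the daemon's SQLite-backed implementation. RetentionPolicy, Item, and plan_evictions are hypothetical names, and DAY_MS assumes that MEMORY_RETENTION_DAY_MS is the number of milliseconds in a day, which the source does not confirm.

```rust
/// Assumed value of MEMORY_RETENTION_DAY_MS: milliseconds in one day.
pub const DAY_MS: i64 = 24 * 60 * 60 * 1000;

/// Hypothetical model of MemoryRetentionPolicy; `None` means "no limit".
#[derive(Debug, Clone, Default)]
pub struct RetentionPolicy {
    pub max_entries: Option<usize>,
    pub max_bytes: Option<usize>,
    pub ttl_days: Option<i64>,
}

/// Hypothetical minimal view of a stored memory item.
#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub id: u32,
    pub bytes: usize,
    pub created_at_ms: i64,
}

/// Walks items from newest to oldest and keeps an item only if it is
/// unexpired and still fits in the entry and byte budgets.
/// Returns the ids to delete, in newest-to-oldest order.
pub fn plan_evictions(items: &[Item], policy: &RetentionPolicy, now_ms: i64) -> Vec<u32> {
    let mut sorted: Vec<&Item> = items.iter().collect();
    sorted.sort_by_key(|item| std::cmp::Reverse(item.created_at_ms));

    let mut evict = Vec::new();
    let mut kept_entries = 0usize;
    let mut kept_bytes = 0usize;

    for item in sorted {
        let expired = policy
            .ttl_days
            .map_or(false, |days| now_ms - item.created_at_ms > days * DAY_MS);
        let over_entries = policy.max_entries.map_or(false, |max| kept_entries + 1 > max);
        let over_bytes = policy
            .max_bytes
            .map_or(false, |max| kept_bytes + item.bytes > max);

        if expired || over_entries || over_bytes {
            evict.push(item.id);
        } else {
            kept_entries += 1;
            kept_bytes += item.bytes;
        }
    }
    evict
}
```

Favouring the newest items is a design choice made for this sketch; the actual eviction order is not described in the source. After rows are deleted, a SQLite database does not shrink on disk until it is vacuumed, which is why vacuuming appears as a separate maintenance task.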

Memory CLI & Web Interface

Users can manage the memory state through both the palyra CLI and the Web Console.

CLI Commands

The memory command family allows for manual searching and maintenance:

- palyra memory search <query>: Performs a vector search.
- palyra memory purge: Clears items based on filters (session ID, channel, or “all”).

Web Console (Memory Section)

The MemorySection in the React app provides a visual management surface for:

- Workspace Documents: Editing curated knowledge files apps/web/src/console/sections/MemorySection.tsx#115-122.
- Learning Queue: Reviewing “Reflections” generated by the background learning runtime apps/web/src/console/sections/MemorySection.tsx#158-166.
- Recall Testing: A “Search all sources” tool to debug retrieval performance apps/web/src/console/sections/MemorySection.tsx#89-91.

Diagram: Memory Management Components

Sources: apps/web/src/console/sections/MemorySection.tsx#27-94, crates/palyra-daemon/src/journal.rs#63-71, apps/web/src/console/useConsoleAppState.tsx#165-165
