System Architecture
The memory system is integrated into the JournalStore and orchestrated by the palyrad gateway. It bridges the gap between raw session events and high-level agent knowledge.
Memory Data Flow
The following diagram illustrates how natural language input from a session is transformed into stored memory and subsequently retrieved for RAG.
Diagram: Memory Ingestion and Retrieval Pipeline
Sources: crates/palyra-daemon/src/journal.rs#64-100, crates/palyra-daemon/src/gateway.rs#116-121, crates/palyra-common/src/daemon_config_schema.rs#193-196
Memory Items & Embedding Providers
Memory items are the atomic units of the RAG system. Each item consists of text content, a vector embedding, and metadata (source, tags, and TTL).
Embedding Implementation
The system supports pluggable embedding providers via the MemoryEmbeddingProvider trait crates/palyra-daemon/src/journal.rs#64-68.
- Hash Provider: A deterministic HashMemoryEmbeddingProvider is used by default for local-only, low-latency scenarios, producing 64-dimensional vectors crates/palyra-daemon/src/journal.rs#71-80.
- External Providers: The system can be configured to use LLM-backed embeddings (e.g., OpenAI text-embedding-3-small) via the FileModelProviderConfig crates/palyra-common/src/daemon_config_schema.rs#112-112.
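To make the provider abstraction concrete, here is a minimal sketch of a pluggable embedding trait with a deterministic, hash-based implementation. The names EmbeddingProvider and HashEmbedder are hypothetical stand-ins, and the feature-hashing scheme (FNV-1a over whitespace tokens, then L2 normalization) is an assumption chosen for illustration; the actual MemoryEmbeddingProvider signature and the hashing used by HashMemoryEmbeddingProvider may differ.

```rust
/// Hypothetical stand-in for the provider trait; the real signature may differ.
pub trait EmbeddingProvider {
    /// Number of dimensions in every vector this provider returns.
    fn dimensions(&self) -> usize;
    /// Map text to a fixed-length vector.
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Illustrative deterministic provider based on feature hashing.
pub struct HashEmbedder {
    dims: usize,
}

impl HashEmbedder {
    pub fn new(dims: usize) -> Self {
        assert!(dims > 0, "dims must be non-zero");
        Self { dims }
    }
}

/// 64-bit FNV-1a hash; fixed constants keep the output stable across runs.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

impl EmbeddingProvider for HashEmbedder {
    fn dimensions(&self) -> usize {
        self.dims
    }

    fn embed(&self, text: &str) -> Vec<f32> {
        let mut v = vec![0.0f32; self.dims];
        for token in text.split_whitespace() {
            let h = fnv1a(token.to_lowercase().as_bytes());
            // The hash picks the bucket; its top bit picks the sign.
            let idx = (h % self.dims as u64) as usize;
            let sign = if h >> 63 == 0 { 1.0 } else { -1.0 };
            v[idx] += sign;
        }
        // L2-normalize so that a dot product equals cosine similarity.
        let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            for x in &mut v {
                *x /= norm;
            }
        }
        v
    }
}
```

Calling HashEmbedder::new(64).embed("some text") yields a 64-element vector that is identical on every call for the same input, which is the property that makes a hash provider usable without any network access. An external, LLM-backed provider would implement the same trait and delegate embed to a remote API.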
Constraints & Constants
| Constant | Value | Description |
|---|---|---|
| MAX_MEMORY_ITEM_BYTES | 16 KB | Maximum size of a single memory text snippet crates/palyra-daemon/src/gateway.rs#118-118. |
| MAX_MEMORY_ITEM_TOKENS | 2,048 | Token limit for embedding generation crates/palyra-daemon/src/gateway.rs#119-119. |
| MAX_MEMORY_SEARCH_TOP_K | 64 | Maximum number of hits returned per search crates/palyra-daemon/src/gateway.rs#117-117. |
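The limits in the table could be enforced along the following lines. This is a sketch under several assumptions: the constant types, the reading of 16 KB as 16 * 1024 bytes, the whitespace-based token count, and the helper names validate_item, clamp_top_k, and LimitError are all illustrative, and whether the gateway rejects or truncates oversized input is not stated in the source.

```rust
// Constant names and values come from the table above; the `usize` type and
// the interpretation of 16 KB as 16 * 1024 bytes are assumptions.
pub const MAX_MEMORY_ITEM_BYTES: usize = 16 * 1024;
pub const MAX_MEMORY_ITEM_TOKENS: usize = 2_048;
pub const MAX_MEMORY_SEARCH_TOP_K: usize = 64;

/// Hypothetical error type for items that exceed a limit.
#[derive(Debug, PartialEq, Eq)]
pub enum LimitError {
    TooManyBytes { actual: usize },
    TooManyTokens { actual: usize },
}

/// Reject text that exceeds the byte or token limit.
/// Tokens are approximated by whitespace splitting; a real tokenizer
/// would give different counts.
pub fn validate_item(text: &str) -> Result<(), LimitError> {
    let bytes = text.len();
    if bytes > MAX_MEMORY_ITEM_BYTES {
        return Err(LimitError::TooManyBytes { actual: bytes });
    }
    let tokens = text.split_whitespace().count();
    if tokens > MAX_MEMORY_ITEM_TOKENS {
        return Err(LimitError::TooManyTokens { actual: tokens });
    }
    Ok(())
}

/// Keep a requested top-k within 1..=MAX_MEMORY_SEARCH_TOP_K.
pub fn clamp_top_k(requested: usize) -> usize {
    requested.clamp(1, MAX_MEMORY_SEARCH_TOP_K)
}
```

With these helpers, a search request asking for 500 hits would be served with at most 64, and a 17 KB snippet would be rejected before any embedding work is done.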
Auto-Inject & Recall Preview
The “Auto-Inject” mechanism automatically searches memory during the orchestrator’s run loop to provide relevant context to the agent without explicit user intervention.
- Recall Preview: Before a message is sent to the LLM, the useRecallPreview hook in the web console fetches a preview of what the memory system would inject based on the current composer state apps/web/src/chat/useRecallPreview.ts#1-10.
- Injection Logic: Controlled by FileMemoryAutoInjectConfig, the daemon performs a vector search and prepends the top results to the system prompt crates/palyra-common/src/daemon_config_schema.rs#193-196.
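The injection step described above can be sketched as a cosine-similarity ranking followed by prompt assembly. Everything here is illustrative: the function names top_k and inject, the in-memory item list, and the "Relevant memory:" header are assumptions, and the daemon's actual scoring and prompt layout are not specified in the source.

```rust
/// Cosine similarity; returns 0.0 when either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Rank stored (text, embedding) pairs against the query embedding and
/// return the texts of the `k` best matches, best first.
pub fn top_k<'a>(query: &[f32], items: &'a [(String, Vec<f32>)], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(f32, &'a str)> = items
        .iter()
        .map(|(text, vector)| (cosine(query, vector), text.as_str()))
        .collect();
    // Sort by descending score; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().take(k).map(|(_, text)| text).collect()
}

/// Prepend recalled snippets to the system prompt.
/// The header wording is a placeholder, not the daemon's actual format.
pub fn inject(system_prompt: &str, recalled: &[&str]) -> String {
    if recalled.is_empty() {
        return system_prompt.to_string();
    }
    let mut out = String::from("Relevant memory:\n");
    for snippet in recalled {
        out.push_str("- ");
        out.push_str(snippet);
        out.push('\n');
    }
    out.push('\n');
    out.push_str(system_prompt);
    out
}
```

A recall preview can reuse the same ranking step and simply display the hits instead of calling inject, which keeps the preview consistent with what the model would eventually receive.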
Retention & Maintenance
The memory system enforces strict TTL (Time-To-Live) and storage quotas to prevent unbounded database growth.
- Retention Policy: Configured via MemoryRetentionPolicy, allowing limits on total entries, total bytes, or age in days crates/palyra-daemon/src/gateway.rs#65-65.
- Maintenance Loop: The MemoryMaintenanceRequest triggers background tasks including:
  - Vector Backfill: Generating embeddings for items added without vectors crates/palyra-daemon/src/gateway.rs#64-64.
  - TTL Enforcement: Deleting expired items based on MEMORY_RETENTION_DAY_MS crates/palyra-daemon/src/journal.rs#56-56.
  - Vacuuming: Reclaiming SQLite disk space crates/palyra-common/src/daemon_config_schema.rs#204-204.
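As a rough illustration of how the three retention limits might combine, the sketch below applies an age cutoff first, then an entry cap, then a byte budget, always keeping the newest items. The RetentionPolicy and Item types, their fields, the order of the checks, and the value given to MEMORY_RETENTION_DAY_MS (milliseconds per day) are assumptions; the real MemoryRetentionPolicy may be shaped and applied differently.

```rust
// Assumed to be the number of milliseconds in one day; the source names the
// constant but does not state its value.
pub const MEMORY_RETENTION_DAY_MS: i64 = 24 * 60 * 60 * 1000;

/// Hypothetical policy: each limit is optional, `None` means "no limit".
#[derive(Default)]
pub struct RetentionPolicy {
    pub max_entries: Option<usize>,
    pub max_bytes: Option<usize>,
    pub max_age_days: Option<i64>,
}

/// Hypothetical minimal view of a stored memory item.
#[derive(Clone, Debug, PartialEq)]
pub struct Item {
    pub id: u32,
    pub created_at_ms: i64,
    pub bytes: usize,
}

/// Return the items that survive the policy, newest first.
pub fn apply_retention(mut items: Vec<Item>, policy: &RetentionPolicy, now_ms: i64) -> Vec<Item> {
    // 1. Age: drop anything created before the cutoff.
    if let Some(days) = policy.max_age_days {
        let cutoff = now_ms - days * MEMORY_RETENTION_DAY_MS;
        items.retain(|item| item.created_at_ms >= cutoff);
    }
    // Newest first, so the caps below evict the oldest items.
    items.sort_by(|a, b| b.created_at_ms.cmp(&a.created_at_ms));
    // 2. Entry count.
    if let Some(max) = policy.max_entries {
        items.truncate(max);
    }
    // 3. Byte budget: keep the longest newest-first prefix that fits.
    if let Some(max_bytes) = policy.max_bytes {
        let mut total = 0usize;
        let mut keep = 0usize;
        for item in &items {
            if total + item.bytes > max_bytes {
                break;
            }
            total += item.bytes;
            keep += 1;
        }
        items.truncate(keep);
    }
    items
}
```

In a SQLite-backed store the same policy would more likely be expressed as DELETE statements, with a VACUUM afterwards to return the freed pages to the filesystem; the in-memory version here only shows the selection logic.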
Memory CLI & Web Interface
Users can manage the memory state through both the palyra CLI and the Web Console.
CLI Commands
The memory command family allows for manual searching and maintenance:
- palyra memory search <query>: Performs a vector search.
- palyra memory purge: Clears items based on filters (session ID, channel, or “all”).
Web Console (Memory Section)
The MemorySection in the React app provides a visual management surface for:
- Workspace Documents: Editing curated knowledge files apps/web/src/console/sections/MemorySection.tsx#115-122.
- Learning Queue: Reviewing “Reflections” generated by the background learning runtime apps/web/src/console/sections/MemorySection.tsx#158-166.
- Recall Testing: A “Search all sources” tool to debug retrieval performance apps/web/src/console/sections/MemorySection.tsx#89-91.