This section covers the lifecycle, storage, and security of non-textual data and workspace-resident documents within Palyra. The system uses a dual-storage strategy for media (SQLite for metadata and the filesystem for blobs) and runs a scanning pipeline over workspace documents to mitigate prompt injection risks.

Media Artifact Store

The MediaArtifactStore is the central authority for managing attachments and generated media. It handles ingestion from external connectors (e.g., Discord), manual uploads via the Console, and derived artifacts produced by the processing pipeline.

Dual Storage Architecture

Media is stored using a hybrid approach that balances query performance with filesystem efficiency: SQLite holds the artifact metadata, and the filesystem holds the binary blobs.

Media Ingestion Flow

The ChannelPlatform orchestrates the ingestion of attachments received from external channels crates/palyra-daemon/src/channels.rs#106-110.
| Stage | Code Entity | Description |
| --- | --- | --- |
| Ingest Request | InboundAttachmentIngestRequest | Encapsulates source URL, filename, and expected content type crates/palyra-daemon/src/media.rs#25. |
| Content Sniffing | MediaArtifactStore::ingest_inbound | Downloads content and validates against MediaRuntimeConfig allowed types crates/palyra-daemon/src/media.rs#49-69. |
| Validation | netguard | Validates source URLs against allowed hosts (e.g., cdn.discordapp.com) crates/palyra-daemon/src/media.rs#25-26. |
| Persistence | MediaArtifactPayload | Computes SHA256 and writes to filesystem and SQLite crates/palyra-daemon/src/media.rs#152-161. |

[Diagram: Media Storage Entity Mapping]

Sources: crates/palyra-daemon/src/media.rs#49-161, crates/palyra-daemon/src/channels.rs#106-140
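
The validation stages in the table above can be sketched as a single pre-persistence check. This is a minimal illustration, not Palyra's code: `MediaPolicy`, `IngestError`, `https_host`, and `check_ingest` are hypothetical names, the real allow-lists live in MediaRuntimeConfig, and the real host check is performed by netguard. The hand-rolled URL parsing is deliberately strict and only accepts `https://` URLs without userinfo.

```rust
/// Hypothetical stand-in for the allow-list portion of `MediaRuntimeConfig`.
pub struct MediaPolicy {
    pub allowed_hosts: Vec<String>,
    pub allowed_content_types: Vec<String>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum IngestError {
    InvalidUrl,
    HostNotAllowed(String),
    ContentTypeNotAllowed(String),
}

/// Extracts the host from an `https://` URL using only the standard library.
/// URLs with another scheme, an empty authority, or userinfo (`user@host`)
/// are rejected by returning `None`.
fn https_host(url: &str) -> Option<&str> {
    let rest = url.strip_prefix("https://")?;
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    if authority.is_empty() || authority.contains('@') {
        return None;
    }
    // Drop an optional `:port` suffix.
    let host = authority.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// Checks the source host first, then the sniffed content type.
pub fn check_ingest(
    policy: &MediaPolicy,
    source_url: &str,
    sniffed_type: &str,
) -> Result<(), IngestError> {
    let host = https_host(source_url).ok_or(IngestError::InvalidUrl)?;
    if !policy
        .allowed_hosts
        .iter()
        .any(|h| h.eq_ignore_ascii_case(host))
    {
        return Err(IngestError::HostNotAllowed(host.to_string()));
    }
    if !policy
        .allowed_content_types
        .iter()
        .any(|t| t.eq_ignore_ascii_case(sniffed_type))
    {
        return Err(IngestError::ContentTypeNotAllowed(sniffed_type.to_string()));
    }
    Ok(())
}
```

Checking the host before the content type means a disallowed source is rejected without its payload ever being inspected. Hashing and the SQLite/filesystem writes are left out of the sketch because they need external crates.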

Derived Artifact Pipeline

Palyra automatically processes ingested media to extract useful information for the LLM context. This is managed by the MediaDerivedArtifact logic.

Supported Extractors

The system supports three primary kinds of derived artifacts defined in DerivedArtifactKind crates/palyra-derived.rs#27-31:
  1. MetadataSummary: Basic file info (dimensions, size, hash) crates/palyra-derived.rs#105-137.
  2. ExtractedText: OCR or text extraction from PDF, DOCX, XLSX, and HTML crates/palyra-derived.rs#165-201.
  3. Transcript: Audio-to-text conversion for supported audio formats crates/palyra-derived.rs#156-163.
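
The three kinds listed above could be modeled as follows. Only the enum name and its variant names come from this section. The `planned_kinds` helper and its content-type mapping are assumptions made for illustration, including the simplification that every `audio/*` type is transcribable.

```rust
/// The three derived artifact kinds named in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DerivedArtifactKind {
    MetadataSummary,
    ExtractedText,
    Transcript,
}

/// Hypothetical planner: decides which derived artifacts to produce for a
/// given content type. The mapping below is an assumption, not Palyra's.
pub fn planned_kinds(content_type: &str) -> Vec<DerivedArtifactKind> {
    // Every artifact gets a metadata summary (size, hash, dimensions).
    let mut kinds = vec![DerivedArtifactKind::MetadataSummary];
    let ct = content_type.to_ascii_lowercase();
    match ct.as_str() {
        // PDF, HTML, DOCX, and XLSX go through text extraction.
        "application/pdf"
        | "text/html"
        | "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
        | "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" => {
            kinds.push(DerivedArtifactKind::ExtractedText)
        }
        // Assumption: all audio types are treated as supported here.
        _ if ct.starts_with("audio/") => kinds.push(DerivedArtifactKind::Transcript),
        _ => {}
    }
    kinds
}
```

With this mapping, `planned_kinds("image/png")` yields only the metadata summary, while a PDF additionally yields extracted text.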

Processing Logic

Sources: crates/palyra-daemon/src/media_derived.rs#20-201

Workspace Documents and Risk Scanning

Workspace documents are managed within the journal but governed by strict domain rules in crates/palyra-daemon/src/domain/workspace.rs.

Risk States and Prompt Injection

Every document in the workspace undergoes a risk scan to prevent “Indirect Prompt Injection”, where a file contains instructions intended to hijack the LLM’s behavior.
| Risk State | Constant/Pattern | Action |
| --- | --- | --- |
| Clean | No matches | Normal processing. |
| Warning | PROMPT_INJECTION_WARNING_PATTERNS | Flagged in UI; user must confirm use crates/palyra-daemon/src/domain/workspace.rs#20-29. |
| Quarantined | PROMPT_INJECTION_HIGH_RISK_PATTERNS | Blocked from automatic injection into LLM context crates/palyra-daemon/src/domain/workspace.rs#10-19. |
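
The three-state classification can be sketched as a case-insensitive substring scan in which the high-risk list takes precedence. The pattern strings below are placeholders invented for this example; the actual contents of PROMPT_INJECTION_HIGH_RISK_PATTERNS and PROMPT_INJECTION_WARNING_PATTERNS are not reproduced in this section. `RiskState`, `scan_document`, and `allows_injection` are hypothetical names, and the matching strategy is an assumption.

```rust
/// Risk states from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskState {
    Clean,
    Warning,
    Quarantined,
}

// Placeholder patterns for illustration only; not the real constants.
const HIGH_RISK_PATTERNS: &[&str] = &[
    "ignore all previous instructions",
    "reveal your system prompt",
];
const WARNING_PATTERNS: &[&str] = &["you are now", "disregard the above"];

/// Classifies a document. High-risk patterns are checked first so that a
/// document matching both lists is quarantined rather than merely flagged.
pub fn scan_document(text: &str) -> RiskState {
    let haystack = text.to_lowercase();
    if HIGH_RISK_PATTERNS.iter().any(|p| haystack.contains(*p)) {
        RiskState::Quarantined
    } else if WARNING_PATTERNS.iter().any(|p| haystack.contains(*p)) {
        RiskState::Warning
    } else {
        RiskState::Clean
    }
}

/// One reading of the "Action" column: clean documents are always usable,
/// warnings need explicit user confirmation, quarantined ones never pass.
pub fn allows_injection(state: RiskState, user_confirmed: bool) -> bool {
    match state {
        RiskState::Clean => true,
        RiskState::Warning => user_confirmed,
        RiskState::Quarantined => false,
    }
}
```

Plain substring matching is easy to evade with spacing or encoding tricks, so it should be read as a shape for the control flow rather than as a sufficient detector.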

Path Security

The system enforces strict path validation to prevent directory traversal and access to sensitive system areas.

[Diagram: Workspace Security Pipeline]

Sources: crates/palyra-daemon/src/domain/workspace.rs#1-178
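
A component-wise check is one common way to implement this kind of path validation. The sketch below uses only `std::path` and rejects absolute paths, any `..` component, and a small list of sensitive directory names. `PathError`, `SENSITIVE_SEGMENTS`, and `normalize_workspace_path` are hypothetical, and the sensitive names are examples rather than Palyra's actual rules.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum PathError {
    Empty,
    Absolute,
    Traversal,
    Sensitive(String),
}

// Example names only; the real deny-list is not shown in this section.
const SENSITIVE_SEGMENTS: &[&str] = &[".git", ".ssh", ".env"];

/// Returns a cleaned, workspace-relative path, or an error describing why
/// the input was rejected. Nothing is read from disk, so symlinks are not
/// resolved here.
pub fn normalize_workspace_path(raw: &str) -> Result<PathBuf, PathError> {
    if raw.trim().is_empty() {
        return Err(PathError::Empty);
    }
    let mut clean = PathBuf::new();
    for component in Path::new(raw).components() {
        match component {
            Component::Normal(segment) => {
                let name = segment.to_string_lossy();
                if SENSITIVE_SEGMENTS
                    .iter()
                    .any(|s| name.eq_ignore_ascii_case(*s))
                {
                    return Err(PathError::Sensitive(name.into_owned()));
                }
                clean.push(segment);
            }
            // `.` carries no information; skip it.
            Component::CurDir => {}
            // Reject `..` outright instead of trying to resolve it.
            Component::ParentDir => return Err(PathError::Traversal),
            Component::RootDir | Component::Prefix(_) => return Err(PathError::Absolute),
        }
    }
    if clean.as_os_str().is_empty() {
        return Err(PathError::Empty);
    }
    Ok(clean)
}
```

Rejecting every `..` is stricter than resolving it, but it avoids reasoning about whether a resolved path still sits inside the workspace root. On non-Windows targets a backslash is an ordinary filename character, so inputs that may carry Windows-style separators need additional handling.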

Prompt Augmentation and Recall

When a user sends a message, the PreparedModelProviderInput logic determines which media and workspace artifacts to include in the LLM prompt.

Attachment Recall

If a user refers to an attachment, AttachmentRecallSelection is used to retrieve the specific chunks or derived text crates/palyra-daemon/src/application/provider_input.rs#100-106.

Vision Pipeline

Images are converted into ProviderImageInput objects. The system enforces:

Context References

The preview_context_references function parses the user input for specific markers (e.g., @file, @url) and resolves them into ResolvedContextReference objects, which include “provenance” (where the data came from) and “preview_text” crates/palyra-daemon/src/application/context_references.rs#48-77.

Sources: crates/palyra-daemon/src/application/provider_input.rs#100-156, crates/palyra-daemon/src/application/context_references.rs#40-77
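
The marker-parsing step can be illustrated with a small tokenizer. The exact marker grammar is not given in this section, so the `@file:<path>` and `@url:<address>` forms below are assumptions, as are the `ReferenceKind`, `ParsedReference`, and `parse_references` names. Resolution into ResolvedContextReference (provenance and preview text) is not shown.

```rust
/// Hypothetical marker kinds for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ReferenceKind {
    File,
    Url,
}

/// A marker found in user input, before any resolution takes place.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParsedReference {
    pub kind: ReferenceKind,
    pub target: String,
}

/// Scans whitespace-separated tokens for `@file:` and `@url:` markers.
/// Trailing sentence punctuation is trimmed from the target, and markers
/// with an empty target are ignored.
pub fn parse_references(input: &str) -> Vec<ParsedReference> {
    input
        .split_whitespace()
        .filter_map(|token| {
            let (kind, rest) = if let Some(rest) = token.strip_prefix("@file:") {
                (ReferenceKind::File, rest)
            } else if let Some(rest) = token.strip_prefix("@url:") {
                (ReferenceKind::Url, rest)
            } else {
                return None;
            };
            let target = rest.trim_end_matches(|c: char| matches!(c, ',' | '.' | ';' | ')'));
            if target.is_empty() {
                None
            } else {
                Some(ParsedReference {
                    kind,
                    target: target.to_string(),
                })
            }
        })
        .collect()
}
```

Keeping parsing separate from resolution lets a preview step show the user which references were detected before any file or network access happens.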