AI Architecture¶
This page describes the technical architecture of the AI Assistant (REQ-031). The implementation follows the adapter pattern from REQ-011 and integrates into the existing 5-layer architecture.
System Architecture¶
graph TB
subgraph "Frontend (React/MUI)"
TC[TipCardsPanel]
CD[AiChatDrawer]
DM[DiagnosisModePanel]
end
subgraph "API Layer (FastAPI)"
AR["/api/v1/t/slug/ai/tips"]
AC["/api/v1/t/slug/ai/chat"]
AD["/api/v1/t/slug/ai/diagnose"]
APC["/api/v1/t/slug/ai/providers"]
end
subgraph "Business Logic"
AAS[AiAssistantService]
CB[ContextBuilder]
RR[RagRetriever]
PA[PromptAssembler]
TG[TipGeneratorService]
end
subgraph "Provider Adapters"
OA[OllamaAdapter]
OAI[OpenAiAdapter]
ANT[AnthropicAdapter]
LC[LlamaCppAdapter]
OC[OpenAiCompatibleAdapter]
FB[RuleBasedFallback]
end
subgraph "Data Layer"
ARD[(ArangoDB<br/>Master Data + Context)]
TS[(TimescaleDB<br/>pgvector)]
RD[(Redis<br/>Cache 4h TTL)]
end
subgraph "Background Tasks (Celery)"
GDT[generate_daily_tips]
RVC[reindex_vector_chunks]
end
TC --> AR
CD --> AC
DM --> AD
AR --> AAS
AC --> AAS
AD --> AAS
AAS --> CB
AAS --> RR
AAS --> PA
CB --> ARD
RR --> TS
PA --> OA
PA --> OAI
PA --> ANT
PA --> LC
PA --> OC
PA --> FB
AAS --> RD
GDT --> AAS
RVC --> TS
RVC --> ARD
IAiProvider — Adapter Interface¶
All AI providers implement the IAiProvider interface. New providers can be added without changing existing code (Open/Closed Principle, analogous to ExternalSourceAdapter in REQ-011).
# app/domain/interfaces/ai_provider.py
from abc import ABC, abstractmethod
from collections.abc import AsyncIterator

# ChatMessage and AiResponse are domain DTOs defined elsewhere in app.domain.

class IAiProvider(ABC):
    """Abstract interface for AI provider adapters.

    Implementations: OllamaAdapter, OpenAiAdapter,
    AnthropicAdapter, LlamaCppAdapter, OpenAiCompatibleAdapter,
    RuleBasedFallback.
    """

    @abstractmethod
    async def chat(
        self,
        messages: list[ChatMessage],
        *,
        max_tokens: int = 1024,
        temperature: float = 0.3,
    ) -> AiResponse:
        """Full response (for tip cards)."""
        ...

    @abstractmethod
    async def chat_stream(
        self,
        messages: list[ChatMessage],
        *,
        max_tokens: int = 1024,
        temperature: float = 0.3,
    ) -> AsyncIterator[str]:
        """Token-by-token streaming (for chat, SSE)."""
        ...

    @abstractmethod
    async def health_check(self) -> bool:
        """Check reachability and functionality."""
        ...
Provider Registry¶
Providers are resolved via a registry, analogous to the AdapterRegistry pattern in REQ-011:
# app/data_access/ai_providers/registry.py
class AiProviderRegistry:
    _providers: dict[str, type[IAiProvider]] = {}

    @classmethod
    def register(cls, provider_type: str):
        """Decorator for provider registration."""
        def decorator(klass: type[IAiProvider]) -> type[IAiProvider]:
            cls._providers[provider_type] = klass
            return klass
        return decorator

    @classmethod
    def resolve(cls, config: AiProviderConfig) -> IAiProvider:
        """Returns an initialized provider instance."""
        klass = cls._providers.get(config.provider_type)
        if klass is None:
            raise ValueError(f"Unknown provider type: {config.provider_type}")
        return klass(config)
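Registration and lookup then read like this; the "ollama" key and the AiProviderConfig fields shown are illustrative:

```python
# Hypothetical registration of the adapter sketched above.
@AiProviderRegistry.register("ollama")
class OllamaAdapter(IAiProvider):
    ...

config = AiProviderConfig(provider_type="ollama", base_url="http://ollama:11434")
provider = AiProviderRegistry.resolve(config)   # -> OllamaAdapter(config)
```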
RAG Pipeline¶
Embedding Model¶
- Model: sentence-transformers/all-MiniLM-L6-v2
- Dimensions: 384
- Model size: ~23 MB
- Operation: local, no API key, no external service
The embedding model runs as a Python process in the backend container (see the sketch below). It generates vectors for:

- New or updated master data documents (Celery task reindex_vector_chunks)
- Incoming user queries for similarity search
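A minimal sketch of the in-process embedding step, using the published sentence-transformers API for this model:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
vectors = model.encode(
    ["Nitrogen deficiency: lower leaves yellow ..."],
    normalize_embeddings=True,  # unit-length vectors: cosine similarity == dot product
)
assert vectors.shape == (1, 384)  # matches vector(384) in ai_vector_chunks
```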
Vector Store (pgvector on TimescaleDB)¶
CREATE TABLE ai_vector_chunks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
source_type VARCHAR(64) NOT NULL,
-- 'species' | 'cultivar' | 'growth_phase' | 'care_rule' | 'pest' | 'disease'
source_key VARCHAR(128) NOT NULL,
chunk_index INT NOT NULL DEFAULT 0,
chunk_text TEXT NOT NULL,
embedding vector(384) NOT NULL,
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- IVFFlat index for cosine similarity search
CREATE INDEX idx_ai_vector_chunks_embedding
ON ai_vector_chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
TimescaleDB was chosen as the vector store because it is already in the stack (no additional service like Qdrant or Chroma needed).
Chunk Configuration¶
| Parameter | Value | Rationale |
|---|---|---|
| Chunk size | 512 tokens | Optimal balance of precision and context |
| Chunk overlap | 64 tokens | Prevents information loss at boundaries |
| Top-K retrieval | 5 chunks | Balance of context and prompt length |
| Similarity threshold | 0.65 | Cosine similarity; filters irrelevant chunks |
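A sliding-window chunker for these parameters could look like the sketch below; in practice the token boundaries would come from the embedding model's tokenizer, which is assumed here:

```python
def chunk_tokens(tokens: list[str], size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Split a token sequence into overlapping windows (512/64 per the table above)."""
    step = size - overlap  # each chunk advances by 448 tokens
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```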
Retrieval Strategy¶
# app/domain/engines/rag_retriever.py
class RagRetriever:
    async def retrieve(
        self,
        query: str,
        *,
        top_k: int = 5,
        source_type_filter: list[str] | None = None,
        metadata_filter: dict | None = None,
    ) -> list[RagChunk]:
        """Cosine similarity search on ai_vector_chunks.

        Args:
            query: User query or context description.
            top_k: Number of chunks to return.
            source_type_filter: Optional restriction to specific
                source types (e.g. ['pest', 'disease'] for diagnosis).
            metadata_filter: Optional JSONB filter (e.g. phase).
        """
        query_embedding = self._embed(query)
        # pgvector cosine distance operator: <=>
        # keep rows where (1 - cosine_distance) >= similarity_threshold
        ...
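The underlying SQL might look roughly like this, assuming asyncpg and the pgvector <=> cosine-distance operator; the embedding is passed as a pgvector text literal here for simplicity:

```python
import asyncpg

SIMILARITY_SQL = """
SELECT source_type, source_key, chunk_text,
       1 - (embedding <=> $1::vector) AS similarity
FROM ai_vector_chunks
WHERE 1 - (embedding <=> $1::vector) >= 0.65  -- threshold from the table above
ORDER BY embedding <=> $1::vector             -- ascending distance = descending similarity
LIMIT $2;
"""

async def similarity_search(dsn: str, embedding_literal: str, top_k: int = 5):
    # embedding_literal: pgvector text form, e.g. "[0.12, -0.08, ...]" (384 values)
    conn = await asyncpg.connect(dsn)
    try:
        return await conn.fetch(SIMILARITY_SQL, embedding_literal, top_k)
    finally:
        await conn.close()
```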
Re-Ranking Stage (Cross-Encoder)¶
Optional component
Re-ranking is optional. Without a configured RERANKER_URL, the pipeline operates unchanged in hybrid-search-only mode. See ADR-007 for the rationale.
Position in the pipeline¶
The re-ranker sits between retrieval and LLM generation. Hybrid Search deliberately retrieves more chunks than are ultimately passed to the LLM (over-retrieval strategy): 20 candidates are retrieved, re-ordered by semantic relevance using the cross-encoder, and only the best 5 reach the LLM context window.
Why a cross-encoder?¶
The Bi-Encoder (E5-base) and BM25 rank independently. Keyword-rich chunks receive a high BM25 score even when they are semantically unrelated to the query. The cross-encoder evaluates each query–chunk pair jointly and produces more precise relevance scores. This reduces the dominant error class GENERATION_MISS (LLM hallucination caused by irrelevant context).
Separate microservice (ONNX Runtime)¶
The re-ranker runs as a standalone reranker-service — analogous to the embedding service:
- No PyTorch in the container — only ONNX Runtime and the Hugging Face tokenizer
- Multi-stage Dockerfile: model download and ONNX export in a cached build stage; the runtime image remains lean
- Port 8081, FastAPI with two endpoints: /rerank (POST) and /health (GET)
- Model: BAAI/bge-reranker-v2-m3 — multilingual (DE/EN), 568M parameters, Apache-2.0 licence
sequenceDiagram
participant KS as Knowledge Service
participant RE as Reranker Service<br/>(Port 8081)
KS->>KS: Hybrid Search → 20 candidates
KS->>RE: POST /rerank<br/>{query, documents[20], top_k: 5}
RE->>RE: Cross-Encoder inference<br/>ONNX Runtime, ~500ms
RE-->>KS: {results: [{index, score, text}×5]}
KS->>KS: Sort chunks by score
KS->>KS: Build context for LLM
Graceful degradation¶
When RERANKER_URL is empty or not set, RerankerEngine.available returns False. In that case the original chunk list is truncated to top_k entries and passed directly to the LLM context. A timeout or HTTP error from the reranker service also triggers this fallback — with a WARNING log entry (reranker_fallback).
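Sketch of the client-side call with this fallback behaviour; the payload shape follows the sequence diagram above, the helper name is illustrative:

```python
import logging
import httpx

logger = logging.getLogger(__name__)

async def rerank_or_truncate(
    query: str,
    chunks: list[str],
    top_k: int = 5,
    reranker_url: str | None = None,
) -> list[str]:
    if not reranker_url:  # RERANKER_URL empty/unset: reranker disabled
        return chunks[:top_k]
    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            resp = await client.post(f"{reranker_url}/rerank", json={
                "query": query,
                "documents": chunks,   # e.g. 20 candidates from hybrid search
                "top_k": top_k,
            })
            resp.raise_for_status()
            return [r["text"] for r in resp.json()["results"]]
    except httpx.HTTPError:  # covers timeouts and HTTP status errors
        logger.warning("reranker_fallback")  # degrade to plain truncation
        return chunks[:top_k]
```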
Resource requirements¶
| Scenario | RAM | CPU | Latency/query |
|---|---|---|---|
| Reranker active (20→5) | 1.5–4 GB | 1–2 cores | +~500ms |
| Reranker disabled | 0 | 0 | 0ms |
First Docker build
The first build of the reranker-service image takes 10–15 minutes because BAAI/bge-reranker-v2-m3 is downloaded and exported to ONNX via optimum. Subsequent builds use the cached layer and complete in seconds.
Context Builder¶
The ContextBuilder fetches the current state of a plant or planting run from ArangoDB at runtime and formats it as structured text for the system prompt.
# app/domain/engines/ai_context_builder.py
class AiContextBuilder:
    async def build_plant_context(
        self,
        tenant_key: str,
        context_key: str,
        context_type: str,
    ) -> PlantContext:
        """Fetches and formats plant context.

        Returns:
            PlantContext with: species, cultivar, current phase,
            phase day, EC/pH/VPD (latest measurement), active IPM events,
            last 3 feeding events, substrate type.
        """
        ...
Fetched data (AQL traversal):
- planting_runs → current growth_phase → target EC, pH, VPD
- plant_instances → cultivar → species → care profiles
- observation_readings (TimescaleDB) → latest measurements
- ipm_inspections → active infestations and ongoing treatments
- feeding_events → last 3 events with products and quantities
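As a rough illustration, the first traversal could be expressed in AQL along these lines, executed via python-arango; the collection name growth_phases and attribute names such as current_phase_key and target_ec are assumptions:

```python
# Hypothetical AQL for "planting_runs -> current growth_phase -> targets".
PHASE_TARGETS_AQL = """
FOR run IN planting_runs
    FILTER run._key == @run_key
    LET phase = DOCUMENT("growth_phases", run.current_phase_key)
    RETURN {
        phase: phase.name,
        targets: { ec: phase.target_ec, ph: phase.target_ph, vpd: phase.target_vpd }
    }
"""

# db: a python-arango StandardDatabase handle
cursor = db.aql.execute(PHASE_TARGETS_AQL, bind_vars={"run_key": context_key})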
Prompt Assembler¶
The PromptAssembler combines all information into a structured system prompt:
[System Role]
You are a plant advisory assistant for Kamerplanter. You respond
exclusively based on the provided context information.
[Current Plant Context]
Species: Cannabis sativa | Cultivar: Northern Lights
Phase: Flowering (Day 21/56) | Substrate: Coco
EC target: 1.4–1.8 mS/cm | EC actual: 1.2 mS/cm
pH target: 5.8–6.2 | pH actual: 5.8
VPD target: 0.8–1.2 kPa | VPD actual: 1.1 kPa
[Knowledge Base Chunks]
[Chunk 1 — species]: Cannabis sativa Flowering NPK profile...
[Chunk 2 — care_rule/diagnostics/nutrient-deficiency-symptoms#nitrogen-deficiency]:
Nitrogen deficiency: lower leaves yellow...
[Chunk 3 — care_rule/phases/flowering-management]:
N demand drops from week 3 of flowering...
[User Experience Level]
intermediate — Show technical details, no code examples.
[Chat History]
(last 5 messages)
[User Query]
My lower leaves are turning yellow — what could be the cause?
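A condensed sketch of how these sections could be concatenated into provider messages; the helper, its parameters, and the ChatMessage constructor are illustrative, and a real chat history would preserve user/assistant roles:

```python
def assemble_messages(context: str, chunks: list[str], level: str,
                      history: list[ChatMessage], query: str) -> list[ChatMessage]:
    system = "\n\n".join([
        "[System Role]\nYou are a plant advisory assistant ...",
        f"[Current Plant Context]\n{context}",
        "[Knowledge Base Chunks]\n" + "\n".join(chunks),
        f"[User Experience Level]\n{level}",
    ])
    return [
        ChatMessage(role="system", content=system),
        *history[-5:],                            # [Chat History]: last 5 messages
        ChatMessage(role="user", content=query),  # [User Query]
    ]
```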
Prompt Lengths by Feature¶
| Feature | Tokens Input | Tokens Output |
|---|---|---|
| Tip cards (JSON) | ~800 | ~200 |
| Chat single query | ~1,500 | ~300 |
| Chat with 10 messages history | ~3,000 | ~400 |
| Diagnosis request | ~2,000 | ~500 |
Caching Strategy¶
Redis (Hot Cache)¶
Tip cards are cached in Redis with a 4-hour TTL (configurable via AI_TIP_CACHE_TTL_HOURS).
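A hedged sketch of the cache write using redis-py's asyncio client; the key layout shown is hypothetical, only the 4-hour TTL is fixed by configuration:

```python
import json
import redis.asyncio as redis

r = redis.Redis(host="redis", port=6379, decode_responses=True)

async def cache_tip_cards(tenant_key: str, run_key: str, tips: list[dict]) -> None:
    key = f"ai:tips:{tenant_key}:{run_key}"         # hypothetical key layout
    await r.setex(key, 4 * 3600, json.dumps(tips))  # 4h TTL
```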
Celery Batch Task¶
The daily Celery task generate_daily_tips (06:00 UTC) generates tip cards for all active planting runs in the background and writes them to Redis and ArangoDB (ai_tip_cache collection).
# app/tasks/ai_tasks.py
@celery_app.task(name="generate_daily_tips")
def generate_daily_tips():
    """Generates tip cards for all active planting runs.

    Runs daily at 06:00 UTC. Processes runs sequentially
    for CPU-only inference (max_concurrent_tips=1, configurable).
    """
    ...
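The 06:00 UTC schedule would be wired via Celery beat, for example like this (assuming celery_app.conf.timezone = "UTC"):

```python
from celery.schedules import crontab

celery_app.conf.beat_schedule = {
    "generate-daily-tips": {
        "task": "generate_daily_tips",          # matches the task name above
        "schedule": crontab(hour=6, minute=0),  # 06:00, UTC with timezone="UTC"
    },
}
```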
Cache Invalidation¶
Tip cards are regenerated immediately when:

- A phase transition occurs (phase_transition event)
- EC/pH moves outside the tolerance band (±10% of target)
- A new IPM event is recorded
Consent Middleware¶
Cloud providers (OpenAI, Anthropic) require explicit GDPR consent (REQ-025). The consent middleware checks for valid consent before every request.
# app/common/dependencies.py
async def require_ai_consent(
    provider_config: AiProviderConfig,
    current_user: User,
    consent_service: ConsentService,
) -> None:
    """Checks GDPR consent for cloud AI providers.

    Raises:
        ConsentRequiredError: If provider_config.requires_consent is true
        and no valid consent exists.
    """
    if provider_config.requires_consent:
        consent = await consent_service.get_consent(
            user_key=current_user.key,
            purpose="ai_cloud_processing",
        )
        if not consent or not consent.is_valid:
            raise ConsentRequiredError(
                "Cloud AI provider requires GDPR consent.",
                consent_purpose="ai_cloud_processing",
            )
Local providers (ollama, llamacpp) have requires_consent: false and need no consent.
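Wired as a FastAPI dependency, the check could guard the cloud-capable endpoints; the resolver dependencies named here (get_provider_config, get_current_user, get_consent_service) are illustrative:

```python
from fastapi import APIRouter, Depends

router = APIRouter()

async def ai_consent_guard(
    provider_config: AiProviderConfig = Depends(get_provider_config),  # hypothetical
    current_user: User = Depends(get_current_user),                    # hypothetical
    consent_service: ConsentService = Depends(get_consent_service),    # hypothetical
) -> None:
    await require_ai_consent(provider_config, current_user, consent_service)

@router.post("/chat", dependencies=[Depends(ai_consent_guard)])
async def ai_chat(payload: dict):
    ...
```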
Eval Framework¶
Response quality is evaluated automatically:
| Method | Description |
|---|---|
| Topic Match | Are the RAG chunks semantically relevant to the query? (cosine score > 0.70) |
| LLM-as-Judge | A second model evaluates factual accuracy and actionability (1–5 points) |
| Benchmark Suite | 100 predefined questions with reference answers; regression test on model changes |
| A/B Comparison | For new models or guide versions: automatic comparison against baseline |
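The Topic Match check, for instance, reduces to a cosine-score test over the retrieved chunks; a sketch reusing the embedding model (util.cos_sim is the sentence-transformers helper):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def topic_match(query: str, chunk_texts: list[str], threshold: float = 0.70) -> bool:
    q = model.encode(query, convert_to_tensor=True)
    c = model.encode(chunk_texts, convert_to_tensor=True)
    return bool(util.cos_sim(q, c).max() > threshold)
```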
Data Model Overview¶
ArangoDB Collections¶
| Collection | Description | Retention |
|---|---|---|
| ai_provider_configs | Provider configurations per tenant | Permanent |
| ai_conversations | Chat histories with message records | 90 days |
| ai_tip_cache | Cached tip cards | 7 days |
TimescaleDB Tables¶
| Table | Description |
|---|---|
| ai_vector_chunks | Vector index (384-dim, all-MiniLM-L6-v2) for RAG |
Edge Collections (ArangoDB)¶
| Collection | From → To | Purpose |
|---|---|---|
| ai_tip_references_plant | ai_tip_cache → plant_instances | Link tip to plant |
| ai_tip_references_run | ai_tip_cache → planting_runs | Link tip to run |
| ai_conversation_about | ai_conversations → plant_instances / planting_runs | Conversation context |
Deployment Configuration (Helm)¶
# helm/kamerplanter/values.yaml — AI configuration
ollama:
  enabled: true  # Ollama as sidecar or dedicated pod
  controllers:
    main:
      containers:
        main:
          env:
            OLLAMA_MODELS: /models
            OLLAMA_NUM_PARALLEL: "1"
            OLLAMA_MAX_LOADED_MODELS: "1"
          resources:
            requests:
              cpu: 500m
              memory: 2Gi
            limits:
              cpu: "4"
              memory: 6Gi  # For gemma3:4b Q4_K_M

backend:
  env:
    AI_DEFAULT_PROVIDER: ollama  # ollama | openai | anthropic | none
    AI_OLLAMA_BASE_URL: http://ollama:11434
    AI_OLLAMA_MODEL: gemma3:4b
    AI_TIP_CACHE_TTL_HOURS: "4"
    AI_MAX_CONCURRENT_TIPS: "5"  # 1 for CPU-only
    AI_EMBEDDING_MODEL: sentence-transformers/all-MiniLM-L6-v2
    AI_RAG_TOP_K: "5"
    AI_CONVERSATION_RETENTION_DAYS: "90"
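On the backend side, these AI_* variables map naturally onto a settings object; a sketch assuming pydantic-settings (the class name is illustrative, defaults mirror values.yaml):

```python
from pydantic_settings import BaseSettings

class AiSettings(BaseSettings):
    # Field names match the AI_* env vars case-insensitively.
    ai_default_provider: str = "ollama"
    ai_ollama_base_url: str = "http://ollama:11434"
    ai_ollama_model: str = "gemma3:4b"
    ai_tip_cache_ttl_hours: int = 4
    ai_max_concurrent_tips: int = 5
    ai_rag_top_k: int = 5
    ai_conversation_retention_days: int = 90

settings = AiSettings()  # reads the environment on instantiation
```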
References¶
- REQ-031 — AI Assistant & Plant Advisory (spec/req/REQ-031_KI-Assistent-Pflanzenberatung.md)
- REQ-011 — External Master Data Enrichment (spec/req/REQ-011_Externe-Stammdatenanreicherung.md)
- REQ-025 — Privacy & GDPR (spec/req/REQ-025_Datenschutz-Betroffenenrechte.md)
- ADR-006 — Embedding Model E5-base and Hybrid Search
- ADR-007 — Cross-Encoder Re-Ranking for RAG Pipeline
- Understanding the RAG Knowledge Base
- AI Assistant
- pgvector Documentation
- BAAI/bge-reranker-v2-m3
- sentence-transformers/all-MiniLM-L6-v2