Understanding the RAG Knowledge Base¶
The AI Assistant in Kamerplanter does not answer from the memory of a general language model — it grounds every response in your own data and a curated knowledge base. This technique is called Retrieval-Augmented Generation (RAG). This page explains how the system is structured and why it works the way it does.
Why RAG?¶
A language model answering solely from its training has two weaknesses:
- Hallucinations — It invents plausible-sounding but incorrect facts
- No context — It does not know your specific plant, your current measurements, or your care history
RAG solves both problems: before generating every response, the system searches a verified database for relevant information and provides it to the model as a foundation. The model then combines these facts with your concrete situation — instead of speculating from memory.
Simply explained
Think of RAG as a very well-prepared assistant: before answering your question, they quickly looked up the relevant reference books. They don't make things up — they explain what they found.
The 4-Level Model¶
Kamerplanter's knowledge base consists of four levels that are combined for every request.
graph TB
subgraph "Level 1: Global Master Data"
E1[Plant species, cultivars, growth phases,<br/>nutrient profiles, pests, diseases]
end
subgraph "Level 2: Thematic Guides"
E2[31 curated expert knowledge files:<br/>Diagnostics, fertilization, irrigation,<br/>environment, phases, outdoor, general]
end
subgraph "Level 3: Tenant Context"
E3[Active planting run, phase,<br/>measurements EC/pH/VPD,<br/>active IPM events, recent feeding events]
end
subgraph "Level 4: Your Plant Data"
E4[Care history, harvest results,<br/>plant diary entries, confirmations]
end
E1 --> RAG[RAG Retriever<br/>pgvector]
E2 --> RAG
E3 --> CB[Context Builder<br/>ArangoDB]
E4 --> CB
RAG --> PA[Prompt Assembler]
CB --> PA
PA --> LLM[Language Model]
LLM --> Response Levels 1 and 2 are stored as vectors and retrieved via similarity search. Levels 3 and 4 are injected as structured text into every request at runtime.
Level 1: Global Master Data¶
The Kamerplanter master data forms the foundation of all recommendations:
- Plant species with taxonomy, care requirements, and characteristics
- Cultivars with specific traits
- Growth phase definitions with VPD targets, light and temperature requirements
- Nutrient profiles per species and phase
- Pest and disease data with symptoms and treatment methods
This data is re-indexed weekly.
Level 2: Thematic Guides¶
Thematic guides contain cross-cutting knowledge that cannot be derived from master data — expert knowledge that applies across many plant species and situations. The knowledge base currently includes 31 curated guides in seven categories:
| Category | Example Guides |
|---|---|
| Diagnostics | Nutrient deficiency symptoms, pH/EC deviations, early pest detection, root health |
| Environment | VPD optimization, light fundamentals, temperature control, CO₂ enrichment |
| Fertilization | EC management (hydroponics/soil), organic outdoor fertilization, CalMag correction, mixing order |
| Irrigation | Irrigation strategies by substrate, recognizing overwatering, water quality |
| Phases | Germination, vegetative optimization, flowering management, harvest timing, overwintering |
| Outdoor | Season planning, companion planting, crop rotation, weather reactions |
| General | Beginner's guide, common mistakes to avoid, yield optimization |
Agrobiologically reviewed
All guides are reviewed for technical accuracy before inclusion in the knowledge base. The system also includes 100 benchmark questions against which every new version of the knowledge base is tested.
Level 3: Tenant Context (Real-Time)¶
For every request, the Context Builder fetches the current state of your grow from the database:
- Active planting runs with current growth phase and phase duration
- Latest measurements: EC, pH, VPD, temperature, humidity
- Active IPM events (pest infestations, diseases, ongoing treatments)
- Last feeding events with quantities and products
Level 4: Your Plant Data (Real-Time)¶
With your consent, personal care data also flows into the context:
- Care confirmations (when watered, fertilized, trained)
- Plant diary entries
- Results from past harvests
An Example Flow¶
Here is how a typical request moves through the system:
Example: Yellowing Leaves
Your question: "My lower leaves are turning yellow — what could be the cause?"
Step 1 — Context Builder reads your data:
Species: Cannabis sativa 'Northern Lights'
Phase: Flowering, Day 21
EC: 1.2 mS/cm, pH: 5.8, VPD: 1.1 kPa
Last feeding: 2 days ago (standard nutrient solution)
Step 2 — RAG Retriever finds relevant chunks: - Master data chunk: "Cannabis sativa, Flowering phase, NPK target 5-15-14" - Guide chunk: "Nitrogen deficiency: lower leaves yellow, raise EC..." - Guide chunk: "Flowering management: N demand drops from week 3, natural N draw-down..."
Step 3 — Model combines everything:
"Your Northern Lights is in week 3 of flowering. Yellowing lower leaves can indicate N deficiency, BUT: natural N draw-down in lower leaves during late flower is normal. Your EC is at 1.2 — that's a bit low for week 3 flowering. Recommendation: raise EC to 1.4–1.6, do NOT increase the N ratio (PK phase). If more than 30% of leaves are affected, consider a light N boost."
Without the thematic guides, the model would only know the master data context and give generic tips. Without your real-time context (EC 1.2, week 3 flower), the model would not know the situation is borderline.
Knowledge Base Quality Assurance¶
Agrobiological Review¶
All guides and master data are reviewed by experienced growers for technical accuracy before inclusion. Particular attention is paid to:
- Correct VPD and EC target values per phase and substrate
- Agreement of symptom descriptions with current literature
- Safety notices (mixing order, pre-harvest intervals)
Benchmark Evaluation¶
The system includes 100 benchmark questions whose answers are automatically evaluated with every knowledge base update:
- Topic Match — Are the retrieved RAG chunks relevant to the question?
- LLM-as-Judge — A second model evaluates factual accuracy and actionability
- A/B Comparison — When models or guide versions change: improvement over baseline?
Adding Custom Guides (Admin)¶
Tenant admins can add custom thematic guides to the local knowledge base. This is useful for:
- Cultivar-specific specialist knowledge
- Internal protocols and operational experience
- Guides in other languages
YAML Format¶
---
title: My Custom Guide Title
category: fertilization # diagnostics | environment | fertilization | irrigation | phases | outdoor | general
tags: [ec, nutrient, hydroponics]
expertise_level: [intermediate, expert]
applicable_phases: [vegetative, flowering]
chunks:
- id: my-first-chunk
title: Section Title
content: |
Knowledge goes here as free text. The content is vectorized
and retrieved for matching queries.
Tip: Concrete, action-oriented text works better
than general descriptions.
metadata:
nutrient: nitrogen
substrate: coco
Uploading a Guide¶
- Open Settings > AI Knowledge Base
- Click Upload Guide
- Select your YAML file
- The system validates the format and shows a preview
- Confirm with Import
The new guide is included in the vector database at the next re-index cycle (daily, 06:00 UTC). You can also trigger a re-index manually.
Quality responsibility
Custom guides are not automatically reviewed. You are responsible for the technical accuracy of your guides. Incorrect guides can degrade the quality of AI responses.
Reindexing the Knowledge Base (Operator/Developer)¶
After modifying knowledge YAML files under spec/knowledge/rag/, the vectors in pgvector must be recomputed. This happens automatically once a week (Sunday 03:00 UTC) but can also be triggered manually.
Prerequisites¶
- Knowledge YAML files are mounted in the container at
/app/knowledge(automatic with Skaffold deployment) - VectorDB (pgvector) and Embedding Service must be running
vectordb_enabled: truein the backend configuration
Workflow: Edit chunk → deploy → reindex → test¶
# 1. Edit knowledge YAML files
# e.g. spec/knowledge/rag/diagnostik/naehrstoffmangel-symptome.yaml
# 2. Redeploy (so the files are available in the container)
skaffold dev # or: skaffold run
# 3. Trigger the Celery reindex task manually
kubectl exec -it deploy/celery-worker -- \
celery -A app.tasks call app.tasks.vector_indexing_tasks.reindex_vector_chunks
# 4. Run the benchmark (optional, recommended)
cd tools/rag-eval
source ~/.venvs/rag-eval/bin/activate
python eval_rag.py
Alternative: Trigger the task via Python interpreter¶
kubectl exec -it deploy/celery-worker -- python -c "
from app.tasks.vector_indexing_tasks import reindex_vector_chunks
result = reindex_vector_chunks.delay()
print(f'Task ID: {result.id}')
"
What happens during reindex?¶
- All YAML files under
/app/knowledgeare read - Each chunk is vectorized using the embedding model (
paraphrase-multilingual-MiniLM-L12-v2, 384 dimensions) - Vectors are upserted into
ai_vector_chunks(existing chunks are updated, new ones added) - The task returns a summary: number of files, number of chunks, duration
Fast feedback loop
For iterative knowledge base improvement, use this cycle:
- Run benchmark → identify failures
- Add or improve chunks in the YAML files
- Deploy and reindex
- Re-run benchmark → verify score improvement
See tools/rag-eval/README.md for benchmark tool details.
Frequently Asked Questions¶
Can the AI search the internet for additional information?
No. The system performs no internet searches. All answers are based exclusively on the local knowledge base (master data, guides) and your own plant data. This is a deliberate design decision to avoid hallucinations and ensure data privacy.
How current are the thematic guides?
Guides are maintained with each Kamerplanter release. The exact status is noted in the version documentation (Changelog). Custom guides you upload remain current until you update or delete them.
What happens if no matching guide chunk is found?
The system falls back to master data (Level 1) and uses the structured context (Levels 3+4). Response quality is lower in this case, but the system still responds — without hallucinating.
Are my custom guides shared with other users?
No. Custom guides are tenant-scoped — they are only visible within your garden/organization and are not shared with the global knowledge base or other tenants.