Go to the 📥 Ingest tab and drag your files onto the drop zone.
📄 PDF — research papers, reports, manuals
📝 DOCX — Word documents, contracts
📖 EPUB — eBooks, digital publications
📃 TXT — plain text, logs, notes
know3 automatically extracts text, removes junk, and splits it into optimized chunks for AI processing. You can review and filter chunks before generating.

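know3's chunker is internal to the app, but the idea behind it can be sketched as fixed-size splitting with overlap so context isn't lost at chunk boundaries. The chunk size and overlap values below are illustrative assumptions, not know3's actual settings:

```python
def chunk_text(text, chunk_size=800, overlap=100):
    """Split text into overlapping chunks; overlap preserves context across boundaries."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():
            chunks.append(piece)
    return chunks

doc = "word " * 500          # stand-in for text extracted from a document
chunks = chunk_text(doc)     # each chunk shares its first 100 chars with the previous one's tail
```

Reviewing and filtering the resulting chunks before generation matters because every instruction/output pair is produced from one of these pieces.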
Step 5: Generate & Query
know3 offers two powerful modes:
⚡ Training Data Generation
Select a domain (Generic, Coding, Scientific, Legal, Business, Literature, Research). Click Generate to create instruction/output pairs. Each pair teaches a different aspect of your content.
💬 RAG Conversation
In the Tools tab, chat with your documents. Ask questions and get AI-generated answers with source citations. Supports multi-turn follow-up conversations.
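Behind a RAG chat, the retrieval step ranks chunks by embedding similarity and hands the best matches to the LLM along with your question. A minimal sketch of that ranking step follows; the 3-dimensional vectors here are dummy values for illustration (in know3 the embeddings come from a local model such as nomic-embed-text):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, chunk_vecs, k=2):
    """Indices of the k chunks most similar to the query."""
    order = sorted(range(len(chunk_vecs)),
                   key=lambda i: cosine(query_vec, chunk_vecs[i]),
                   reverse=True)
    return order[:k]

chunks = ["intro", "methods", "results"]
vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]   # dummy embeddings
query = [1.0, 0.05, 0.0]
best = top_k(query, vecs)    # indices of the most relevant chunks
```

The retrieved chunk texts are what the app cites back to you as sources in its answers.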
Step 6: Export & Fine-Tune
Download your training data in the format your framework needs:
JSONL — HuggingFace, Ollama, LLaMA (recommended)
JSON — Custom Python pipelines
Alpaca — Stanford Alpaca format
ShareGPT — Chat models, DPO training
CSV — Spreadsheets, data analysis
🎯 Use these to fine-tune any LLM — turning a generic model into a domain expert on YOUR content.
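JSONL, the recommended format, is simply one JSON object per line. The pair contents below are hypothetical examples, but they show what an exported file looks like and how to read it back in Python:

```python
import json

pairs = [
    {"instruction": "Summarize the section on transformer cooling.",
     "output": "The section explains how oil-immersed transformers dissipate heat."},
    {"instruction": "Define load factor.",
     "output": "Load factor is the ratio of average load to peak load over a period."},
]

# write: one JSON object per line
with open("train.jsonl", "w", encoding="utf-8") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")

# read back: parse each line independently
with open("train.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
```

Because each line stands alone, JSONL files can be streamed, concatenated, or split without re-parsing the whole dataset, which is why most fine-tuning toolchains prefer it.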
Help & FAQ
❓ What is know3?
know3 is a local training data generation engine. Upload documents, and it creates instruction/output pairs perfect for fine-tuning language models.
❓ Do I need internet?
No! Everything runs locally. Your documents never leave your machine. No cloud APIs, no subscriptions.
❓ What file formats are supported?
PDF, DOCX, EPUB, and TXT files. Maximum 1GB per document.
❓ How long does generation take?
Depends on document size and your LLM model. A typical 100-page book takes 5-15 minutes.
❓ What can I do with the pairs?
Use them to fine-tune any language model (Llama, Mistral, Phi, GPT, etc.) on HuggingFace, Ollama, or your own infrastructure.
🔧 Cannot connect to Ollama
Make sure Ollama is running: ollama serve
Also check that CORS is enabled, e.g. by starting Ollama with the OLLAMA_ORIGINS environment variable set to allow the app's origin (see Setup Guide in the Ingest tab)
🔧 Models not showing
Pull the models first:
ollama pull llama3.2:3b
ollama pull nomic-embed-text
🔧 Generation is very slow
Try a smaller LLM (3B is faster than 7B). Or reduce "Pairs Per Chunk" to 1. Use the Review tab to limit chunks.
🔧 Low quality pairs
Make sure you selected the right domain for your content. Review and filter chunks first, since chunk quality directly affects output quality. Try 3 pairs per chunk for more depth.
Generic
Books, articles, general knowledge
Coding
Code docs, API references, tutorials
Scientific
Math, physics, chemistry textbooks
Literature
Novels, essays, humanities texts
Legal
Contracts, statutes, legal documents
Business
Case studies, financial reports
Research Papers
Academic papers, methodology-focused
📄 JSONL (Recommended)
One pair per line. Use with: HuggingFace, Ollama, LLaMA
📄 JSON
Array format. Use with: Custom pipelines, Python scripts
📄 Alpaca
Standard Alpaca format. Use with: Alpaca fine-tuning
💬 ShareGPT
Conversation format. Use with: Chat models, DPO training
📊 CSV
Spreadsheet format. Use with: Excel, data analysis, sheets
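For reference, here is the same pair rendered in the three JSON-based layouts. This is a sketch using the commonly published Alpaca and ShareGPT schemas; the exact field names are assumptions based on those public formats, not taken from know3's exporter:

```python
pair = {"instruction": "What is RAG?",
        "output": "Retrieval-augmented generation grounds LLM answers in retrieved documents."}

# JSONL / JSON record: the pair as-is (JSONL puts one such object per line)
jsonl_record = dict(pair)

# Alpaca: instruction / input / output triple ("input" left empty when unused)
alpaca_record = {"instruction": pair["instruction"],
                 "input": "",
                 "output": pair["output"]}

# ShareGPT: a conversation of alternating human/gpt turns
sharegpt_record = {"conversations": [
    {"from": "human", "value": pair["instruction"]},
    {"from": "gpt",   "value": pair["output"]},
]}
```

The ShareGPT layout is the natural fit for chat models because it can hold multi-turn exchanges, while Alpaca and JSONL are flat single-turn records.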
🧠
know3
Local RAG + Training Data Engine
v3.1 · Open Source · MIT License
Created by
Dr. Khaled Diab
PhD Electrical Power Engineering · CEM · NEBOSH IGC · Energy & Sustainability Expert · AI/ML Researcher · 14+ years in power systems, carbon management & digital transformation
What is know3?
know3 transforms your documents into production-ready training data for fine-tuning large language models. Everything runs 100% locally on your machine using Ollama — no cloud APIs, no subscriptions, no data ever leaves your device. Upload your PDF, DOCX, EPUB, or TXT files, and know3 generates high-quality instruction/output pairs optimized for domain-specific LLM fine-tuning.