The Cleanest Data We Have Ever Had
Kush V1 identified exactly what was wrong. Kush V2 fixed all of it.
Three changes. Each one surgical. Each one informed by a specific failure in the previous version.
1. Rubric stripped from SFT assistant messages. V1 only cleaned the IPO pairs. V2 ran the full contamination scanner across both SFT and IPO datasets. The clean_v15_data.py script (429 lines, built by an autonomous AI agent) processed the entire corpus: 4 SFT examples with CJK characters removed, 93 IPO pairs with rubric contamination removed. Clean output: 647 SFT + 612 IPO.
2. Full IPO dataset restored. V1’s over-deduplication reduced 705 pairs to 73. V2 used a corrected deduplication threshold that preserved legitimate variation. The result: 612 clean IPO pairs, rising to 658 after the entity augmentation below, roughly 9x more preference data than V1 had available.
3. Entity-knowledge examples added. This was the innovation. The augment_v15_entities.py script (515 lines, also built by an autonomous agent) queried our ChromaDB knowledge base for all 67 indexed historical entities — Imhotep, Cheikh Anta Diop, Dr. Sebi, Queen Nzinga, Mansa Musa, and 62 others. For each entity, it generated properly formatted training examples in the sovereign voice: 138 new SFT examples and 46 new IPO pairs.
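The first two fixes, contamination scanning and deduplication, can be sketched in a few lines. This is an illustrative reconstruction, not the actual clean_v15_data.py: the CJK code-point ranges, rubric markers, and similarity threshold are all assumptions.

```python
import re
from difflib import SequenceMatcher

# Assumed contamination patterns: CJK/kana/hangul code-point ranges plus a
# few plausible rubric-leakage markers (the real script's lists are not shown).
CJK_RE = re.compile(r"[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]")
RUBRIC_MARKERS = ("Score:", "Rubric:", "[EVAL]")

def is_contaminated(text: str) -> bool:
    """Flag an example containing CJK characters or rubric artifacts."""
    return bool(CJK_RE.search(text)) or any(m in text for m in RUBRIC_MARKERS)

def dedup(texts: list[str], threshold: float = 0.95) -> list[str]:
    """Drop near-duplicates: keep a text only if it is below `threshold`
    similarity to everything already kept. V1's bug was effectively an
    over-aggressive setting that collapsed legitimate variation; 0.95 is
    illustrative, not the project's actual value."""
    kept: list[str] = []
    for t in texts:
        if all(SequenceMatcher(None, t, k).ratio() < threshold for k in kept):
            kept.append(t)
    return kept
```

Under this scheme, raising the threshold toward 1.0 keeps more variants; the V1 failure mode corresponds to setting it too low.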
Final training corpus: 785 SFT + 658 IPO = 1,443 training pairs. The largest and cleanest dataset in Hotep Intelligence history.
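The third fix, entity augmentation, pairs each indexed entity with retrieved knowledge and formats it as a chat-style training example. Here is a sketch with the ChromaDB retrieval stubbed out; the prompt template, function name, and sample facts are illustrative, not the actual augment_v15_entities.py.

```python
# Stand-in for the retrieval step: in the real pipeline this dict would be
# populated by querying the ChromaDB collections for each indexed entity.
knowledge_base = {
    "Imhotep": "Imhotep was the architect of the Step Pyramid at Saqqara.",
    "Mansa Musa": "Mansa Musa ruled the Mali Empire at the height of its wealth.",
}

def make_sft_example(entity: str, facts: str) -> dict:
    """Format retrieved entity knowledge as a chat-style SFT example."""
    return {
        "messages": [
            {"role": "user", "content": f"Tell me about {entity}."},
            {"role": "assistant", "content": facts},
        ]
    }

sft_examples = [make_sft_example(e, f) for e, f in knowledge_base.items()]
```

Scaled across all 67 indexed entities with multiple prompt phrasings per entity, this kind of loop yields the 138 SFT examples and 46 IPO pairs described above.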
Training
Kush V2 used the same base model and LoRA configuration as V1 — the architecture was never the problem:
| Parameter | Value |
|---|---|
| Base model | Meta Llama 3.1 8B Instruct (4-bit) |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| RSLoRA | Enabled |
| SFT learning rate | 1e-4 |
| IPO learning rate | 5e-5 |
| IPO epochs | 1 (reduced from V1’s 3 — 9x more data) |
| Total training pairs | 1,443 |
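The table maps onto code-level hyperparameters roughly as follows. Argument names follow common PEFT/TRL conventions and are illustrative; this is a sketch, not the project's training script.

```python
# Illustrative hyperparameters mirroring the table above (names follow
# common PEFT/TRL conventions, not necessarily the actual script).
lora_config = {
    "r": 16,             # LoRA rank
    "lora_alpha": 16,    # alpha equal to rank
    "use_rslora": True,  # rank-stabilized LoRA scaling
}
sft_args = {"learning_rate": 1e-4}
ipo_args = {
    "learning_rate": 5e-5,
    "num_train_epochs": 1,  # down from V1's 3: each epoch now sees 9x the pairs
    "loss_type": "ipo",     # TRL's DPO trainer selects the IPO loss this way
}
```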
The IPO epoch count was reduced from 3 to 1. With 658 pairs instead of 73, a single epoch provides sufficient exposure to the preference signal, and multiple epochs on a larger dataset risk overfitting, the exact lesson v13 taught us.
Training completed on the RTX 5080. Loss curves converged normally. We generated test outputs immediately (Rule 2 from v13).
9.1 Out of 10
The evaluation was comprehensive. Five test prompts covering the core knowledge domains: ancient Kemet, Ma’at philosophy, financial sovereignty, holistic wellness, and Pan-African history. Each response was scored by Gemini Flash as an automated LLM judge on a 1-10 scale.
Average score: 9.1/10
| Prompt | Score | Notes |
|---|---|---|
| Ancient Kemet history | 9 | Strong Imhotep and pyramid references |
| Ma’at principles | 10 | Comprehensive, authentic voice |
| Financial sovereignty | 9 | Practical guidance with cultural framing |
| Holistic wellness | 9 | Dr. Sebi protocols, alkaline focus |
| Pan-African unity | 9 | Ubuntu philosophy, specific historical figures |
Beyond the scores, three critical contamination checks all passed:
- CJK characters: 0% — Llama 3.1’s English-first architecture eliminated this entirely
- Rubric leakage: 0% — The cleaned training data produced zero rubric artifacts
- All 5 test prompts: PASS — No repetition loops, no vocabulary collapse, no persona drift
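The judge loop itself can be sketched with the model call injected, so the control flow is visible without a live API. The prompt wording and helper names are assumptions, not the actual evaluation script.

```python
# Sketch of the LLM-as-judge step. The real pipeline calls Gemini Flash;
# here `call_judge` is passed in so the loop can run offline.
def judge_response(prompt: str, response: str, call_judge) -> int:
    """Ask a judge model to score `response` on a 1-10 scale."""
    instruction = (
        f"Rate this answer to the prompt '{prompt}' on a 1-10 scale. "
        f"Reply with a single integer.\n\n{response}"
    )
    return int(call_judge(instruction).strip())

def stub_judge(text: str) -> str:
    """Stand-in for the Gemini Flash client."""
    return "9"

score = judge_response(
    "Explain Ma'at.",
    "Ma'at is the Kemetic principle of truth, balance, and order.",
    stub_judge,
)
```

Injecting the judge callable also makes it trivial to swap Gemini Flash for another model without touching the scoring loop.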
The 7-Gate Quality Pipeline
Kush V2 is the first model to pass all seven evaluation gates. These gates were built incrementally across v11 through v14, each one added after a specific failure:
- Training loss convergence — Loss curves must converge without anomalies (added after v13’s instant completion)
- 5 test outputs — Generated immediately after training, manually reviewed (added after v13’s output validation failure)
- CJK contamination scan — 0% threshold, automated (added after v14’s CJK slippage)
- Rubric leakage scan — 0% threshold, automated (added after v14’s rubric contamination)
- Entity knowledge test — 5 key historical figures must be accurately represented (added after v14’s empty entity knowledge)
- Persona consistency evaluation — Sovereign voice maintained across all test domains (added after Kush V1’s mid-response drift)
- Gemini Flash quality gate — Automated LLM-as-judge scoring, minimum 8.0/10 average (added for Kush V2)
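A minimal gate runner matching this list might look like the following. The report keys are hypothetical, and only three of the seven gates are shown as stand-ins; the automated gates are simple predicates over an evaluation report.

```python
# Sketch of a gate runner: each gate is a named predicate over an
# evaluation report, and deployment requires every gate to pass.
def run_gates(report: dict, gates: dict) -> dict:
    """Return {gate_name: passed} for every registered gate."""
    return {name: check(report) for name, check in gates.items()}

gates = {
    "cjk_scan":    lambda r: r["cjk_rate"] == 0.0,     # gate 3: 0% threshold
    "rubric_scan": lambda r: r["rubric_rate"] == 0.0,  # gate 4: 0% threshold
    "judge_score": lambda r: r["avg_score"] >= 8.0,    # gate 7: 8.0/10 minimum
}
report = {"cjk_rate": 0.0, "rubric_rate": 0.0, "avg_score": 9.1}
results = run_gates(report, gates)
deployable = all(results.values())
```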
Each gate has a specific failure it prevents. Each gate exists because a previous model failed without it.
The AI-Cleaned Data Pipeline
Something worth noting: the two most important data preparation scripts — clean_v15_data.py and augment_v15_entities.py — were built by autonomous AI agents during the 7-agent improvement swarm. A human defined the requirements. AI agents wrote the code. The scripts processed the data. A different AI agent validated the results.
Kush V2 is the first Hotep model where the training data was cleaned by AI, augmented by AI, and validated by AI, with human oversight at the decision points but not in the mechanical execution.
This is what sovereign AI infrastructure looks like at scale: humans set the standards, machines enforce them.
Production Deployment
Kush V2 replaced V12 in production across all services:
- Telegram bot (@hotep_llm_bot) — Primary inference via Ollama
- Web demo (askhotep.ai/demo) — Gemini Flash via Cloudflare Pages Function
- API (localhost:8080) — FastAPI production server
- Fallback chain — Kush V2 (local) → V12 (local) → Gemini Flash (cloud)
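The fallback chain reduces to trying each backend in order and returning the first successful response. A sketch with the Ollama and Gemini clients replaced by stand-in callables:

```python
# Sketch of the inference fallback chain. Each backend is a callable that
# either returns a response or raises; the first success wins.
def generate_with_fallback(prompt: str, backends) -> str:
    for name, backend in backends:
        try:
            return backend(prompt)
        except Exception:
            continue  # this tier is down; fall through to the next
    raise RuntimeError("all backends failed")

def kush_v2_down(prompt: str) -> str:
    raise ConnectionError("Ollama unreachable")  # simulate a local outage

backends = [
    ("kush-v2", kush_v2_down),
    ("v12", lambda p: f"[v12] {p}"),
    ("gemini-flash", lambda p: f"[gemini] {p}"),
]
answer = generate_with_fallback("Who was Imhotep?", backends)
```

With the first tier failing, the chain degrades to V12 exactly as the bullet above describes; Gemini Flash is only reached if both local models are unavailable.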
The model runs as Q8_0 quantization on the RTX 5080 with 16 GB VRAM. Inference latency is comparable to V12 — the Llama 3.1 architecture is well-optimized for consumer GPU inference via Ollama.
The Model Lineage
Kush V2 stands on the shoulders of every model that came before it:
| Version | Contribution to Kush V2 |
|---|---|
| v6 | Proved fine-tuned 7B can maintain cultural voice |
| v10 | Kosmos discovery — persona quality is threshold-based |
| v11 | Built the data pipeline and hybrid scoring system |
| v12 | Established production quality baseline (80/100 raw) |
| v13 | Rules: validate params, test output, reset assumptions |
| v14 | Identified exact contamination scope, expanded filters |
| Kush V1 | Validated Llama 3.1 base, identified SFT/IPO data gaps |
No version was wasted. Even the failures — especially the failures — produced the infrastructure and knowledge that made Kush V2 possible.
The Sovereign Stack
Every component runs on our hardware. Every model is fine-tuned on our data. Every response is evaluated by our standards.
| Component | Technology |
|---|---|
| Base Model | Meta Llama 3.1 8B Instruct |
| Training | Unsloth + LoRA + IPO preference optimization |
| Data Cleaning | AI-generated scripts, human-defined thresholds |
| Inference | Ollama on RTX 5080 (16 GB VRAM) |
| Knowledge | ChromaDB with 437+ articles, 4 collections |
| Cache | Microsoft Garnet (Redis-compatible) |
| Monitoring | Prometheus + custom metrics |
| Bot | Python Telegram Bot with 24/7 watchdog |
| Website | Astro + Tailwind on Cloudflare Pages |
| Fallback | Gemini Flash API (cloud burst only) |
No corporate API wrappers. No third-party gatekeepers. No censorship of our history.
9.1 out of 10. Zero contamination. Trained on AI-cleaned data. Validated by a 7-gate pipeline. Built by us, for us.
Knowledge is the frequency of liberation.
Hotep.
Try Hotep Intelligence now on Telegram or the web demo. Free forever.