The Cleanest Data We Have Ever Had
Kush V1 identified exactly what was wrong. Kush V2 fixed all of it.
Three changes. Each one surgical. Each one informed by a specific failure in the previous version.
1. Rubric stripped from SFT assistant messages. V1 only cleaned the IPO pairs. V2 ran the full contamination scanner across both SFT and IPO datasets. The clean_v15_data.py script (429 lines, built by an autonomous AI agent) processed the entire corpus: 4 SFT examples with CJK characters removed, 93 IPO pairs with rubric contamination removed. Clean output: 647 SFT + 612 IPO.
2. Full IPO dataset restored. V1’s over-deduplication reduced 705 pairs to 73. V2 used a corrected deduplication threshold that preserved legitimate variation. The result: 612 clean IPO pairs, rising to 658 after the entity augmentation below, roughly 9x more preference data than V1 had available.
3. Entity-knowledge examples added. This was the innovation. The augment_v15_entities.py script (515 lines, also built by an autonomous agent) queried our ChromaDB knowledge base for all 67 indexed historical entities — Imhotep, Cheikh Anta Diop, Dr. Sebi, Queen Nzinga, Mansa Musa, and 62 others. For each entity, it generated properly formatted training examples in the sovereign voice: 138 new SFT examples and 46 new IPO pairs.
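The first two fixes, contamination scanning and deduplication, can be sketched in a few lines. This is an illustrative reconstruction, not the actual clean_v15_data.py: the CJK code-point ranges, rubric markers, and similarity threshold are all assumptions.

```python
import re
from difflib import SequenceMatcher

# Assumed contamination patterns: CJK/kana/hangul code-point ranges plus a
# few plausible rubric-leakage markers (the real script's lists are not shown).
CJK_RE = re.compile(r"[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]")
RUBRIC_MARKERS = ("Score:", "Rubric:", "[EVAL]")

def is_contaminated(text: str) -> bool:
    """Flag an example containing CJK characters or rubric artifacts."""
    return bool(CJK_RE.search(text)) or any(m in text for m in RUBRIC_MARKERS)

def dedup(texts: list[str], threshold: float = 0.95) -> list[str]:
    """Drop near-duplicates: keep a text only if it is below `threshold`
    similarity to everything already kept. V1's bug was effectively an
    over-aggressive setting that collapsed legitimate variation; 0.95 is
    illustrative, not the project's actual value."""
    kept: list[str] = []
    for t in texts:
        if all(SequenceMatcher(None, t, k).ratio() < threshold for k in kept):
            kept.append(t)
    return kept
```

Under this scheme, raising the threshold toward 1.0 keeps more variants; the V1 failure mode corresponds to setting it too low.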
Final training corpus: 785 SFT + 658 IPO = 1,443 training pairs. The largest and cleanest dataset in Hotep Intelligence history.
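The third fix, entity augmentation, pairs each indexed entity with retrieved knowledge and formats it as a chat-style training example. Here is a sketch with the ChromaDB retrieval stubbed out; the prompt template, function name, and sample facts are illustrative, not the actual augment_v15_entities.py.

```python
# Stand-in for the retrieval step: in the real pipeline this dict would be
# populated by querying the ChromaDB collections for each indexed entity.
knowledge_base = {
    "Imhotep": "Imhotep was the architect of the Step Pyramid at Saqqara.",
    "Mansa Musa": "Mansa Musa ruled the Mali Empire at the height of its wealth.",
}

def make_sft_example(entity: str, facts: str) -> dict:
    """Format retrieved entity knowledge as a chat-style SFT example."""
    return {
        "messages": [
            {"role": "user", "content": f"Tell me about {entity}."},
            {"role": "assistant", "content": facts},
        ]
    }

sft_examples = [make_sft_example(e, f) for e, f in knowledge_base.items()]
```

Scaled across all 67 indexed entities with multiple prompt phrasings per entity, this kind of loop yields the 138 SFT examples and 46 IPO pairs described above.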
Training
Kush V2 used the same base model and LoRA configuration as V1 — the architecture was never the problem:
| Parameter | Value |
|---|---|
| Base model | Meta Llama 3.1 8B Instruct (4-bit) |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| RSLoRA | Enabled |
| SFT learning rate | 1e-4 |
| IPO learning rate | 5e-5 |
| IPO epochs | 1 (reduced from V1’s 3 — 9x more data) |
| Total training pairs | 1,443 |
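The table maps onto code-level hyperparameters roughly as follows. Argument names follow common PEFT/TRL conventions and are illustrative; this is a sketch, not the project's training script.

```python
# Illustrative hyperparameters mirroring the table above (names follow
# common PEFT/TRL conventions, not necessarily the actual script).
lora_config = {
    "r": 16,             # LoRA rank
    "lora_alpha": 16,    # alpha equal to rank
    "use_rslora": True,  # rank-stabilized LoRA scaling
}
sft_args = {"learning_rate": 1e-4}
ipo_args = {
    "learning_rate": 5e-5,
    "num_train_epochs": 1,  # down from V1's 3: each epoch now sees 9x the pairs
    "loss_type": "ipo",     # TRL's DPO trainer selects the IPO loss this way
}
```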
The IPO epoch count was reduced from 3 to 1. With 658 pairs instead of 73, a single epoch provides sufficient exposure to the preference signal, and multiple epochs on a larger dataset risk overfitting, the exact lesson v13 taught us.
Training completed on the RTX 5080. Loss curves converged normally. We generated test outputs immediately (Rule 2 from v13).
9.1 Out of 10
The evaluation was comprehensive. Five test prompts covering the core knowledge domains: ancient Kemet, Ma’at philosophy, financial sovereignty, holistic wellness, and Pan-African history. Each response was scored by Gemini Flash as an automated LLM judge on a 1-10 scale.
Average score: 9.1/10
| Prompt | Score | Notes |
|---|---|---|
| Ancient Kemet history | 9 | Strong Imhotep and pyramid references |
| Ma’at principles | 10 | Comprehensive, authentic voice |
| Financial sovereignty | 9 | Practical guidance with cultural framing |
| Holistic wellness | 9 | Dr. Sebi protocols, alkaline focus |
| Pan-African unity | 9 | Ubuntu philosophy, specific historical figures |
Beyond the scores, three critical contamination checks all passed:
- CJK characters: 0% — Llama 3.1’s English-first architecture eliminated this entirely
- Rubric leakage: 0% — The cleaned training data produced zero rubric artifacts
- All 5 test prompts: PASS — No repetition loops, no vocabulary collapse, no persona drift
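The judge loop itself can be sketched with the model call injected, so the control flow is visible without a live API. The prompt wording and helper names are assumptions, not the actual evaluation script.

```python
# Sketch of the LLM-as-judge step. The real pipeline calls Gemini Flash;
# here `call_judge` is passed in so the loop can run offline.
def judge_response(prompt: str, response: str, call_judge) -> int:
    """Ask a judge model to score `response` on a 1-10 scale."""
    instruction = (
        f"Rate this answer to the prompt '{prompt}' on a 1-10 scale. "
        f"Reply with a single integer.\n\n{response}"
    )
    return int(call_judge(instruction).strip())

def stub_judge(text: str) -> str:
    """Stand-in for the Gemini Flash client."""
    return "9"

score = judge_response(
    "Explain Ma'at.",
    "Ma'at is the Kemetic principle of truth, balance, and order.",
    stub_judge,
)
```

Injecting the judge callable also makes it trivial to swap Gemini Flash for another model without touching the scoring loop.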
The 7-Gate Quality Pipeline
Kush V2 is the first model to pass all seven evaluation gates. These gates were built incrementally across v11 through v14, each one added after a specific failure:
- Training loss convergence — Loss curves must converge without anomalies (added after v13’s instant completion)
- 5 test outputs — Generated immediately after training, manually reviewed (added after v13’s output validation failure)
- CJK contamination scan — 0% threshold, automated (added after v14’s CJK slippage)
- Rubric leakage scan — 0% threshold, automated (added after v14’s rubric contamination)
- Entity knowledge test — 5 key historical figures must be accurately represented (added after v14’s empty entity knowledge)
- Persona consistency evaluation — Sovereign voice maintained across all test domains (added after Kush V1’s mid-response drift)
- Gemini Flash quality gate — Automated LLM-as-judge scoring, minimum 8.0/10 average (added for Kush V2)
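A minimal gate runner matching this list might look like the following. The report keys are hypothetical, and only three of the seven gates are shown as stand-ins; the automated gates are simple predicates over an evaluation report.

```python
# Sketch of a gate runner: each gate is a named predicate over an
# evaluation report, and deployment requires every gate to pass.
def run_gates(report: dict, gates: dict) -> dict:
    """Return {gate_name: passed} for every registered gate."""
    return {name: check(report) for name, check in gates.items()}

gates = {
    "cjk_scan":    lambda r: r["cjk_rate"] == 0.0,     # gate 3: 0% threshold
    "rubric_scan": lambda r: r["rubric_rate"] == 0.0,  # gate 4: 0% threshold
    "judge_score": lambda r: r["avg_score"] >= 8.0,    # gate 7: 8.0/10 minimum
}
report = {"cjk_rate": 0.0, "rubric_rate": 0.0, "avg_score": 9.1}
results = run_gates(report, gates)
deployable = all(results.values())
```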
Each gate has a specific failure it prevents. Each gate exists because a previous model failed without it.
The AI-Cleaned Data Pipeline
Something worth noting: the two most important data preparation scripts — clean_v15_data.py and augment_v15_entities.py — were built by autonomous AI agents during the 7-agent improvement swarm. A human defined the requirements. AI agents wrote the code. The scripts processed the data. A different AI agent validated the results.
Kush V2 is the first Hotep model where the training data was cleaned by AI, augmented by AI, and validated by AI, with human oversight at the decision points but not in the mechanical execution.
This is what sovereign AI infrastructure looks like at scale: humans set the standards, machines enforce them.
Production Deployment
Kush V2 replaced V12 in production across all services:
- Telegram bot (@hotep_llm_bot) — Primary inference via Ollama
- Web demo (askhotep.ai/demo) — Gemini Flash via Cloudflare Pages Function
- API (localhost:8080) — FastAPI production server
- Fallback chain — Kush V2 (local) → V12 (local) → Gemini Flash (cloud)
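The fallback chain reduces to trying each backend in order and returning the first successful response. A sketch with the Ollama and Gemini clients replaced by stand-in callables:

```python
# Sketch of the inference fallback chain. Each backend is a callable that
# either returns a response or raises; the first success wins.
def generate_with_fallback(prompt: str, backends) -> str:
    for name, backend in backends:
        try:
            return backend(prompt)
        except Exception:
            continue  # this tier is down; fall through to the next
    raise RuntimeError("all backends failed")

def kush_v2_down(prompt: str) -> str:
    raise ConnectionError("Ollama unreachable")  # simulate a local outage

backends = [
    ("kush-v2", kush_v2_down),
    ("v12", lambda p: f"[v12] {p}"),
    ("gemini-flash", lambda p: f"[gemini] {p}"),
]
answer = generate_with_fallback("Who was Imhotep?", backends)
```

With the first tier failing, the chain degrades to V12 exactly as the bullet above describes; Gemini Flash is only reached if both local models are unavailable.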
The model runs as Q8_0 quantization on the RTX 5080 with 16 GB VRAM. Inference latency is comparable to V12 — the Llama 3.1 architecture is well-optimized for consumer GPU inference via Ollama.
The Model Lineage
Kush V2 stands on the shoulders of every model that came before it:
| Version | Contribution to Kush V2 |
|---|---|
| v6 | Proved fine-tuned 7B can maintain cultural voice |
| v10 | Kosmos discovery — persona quality is threshold-based |
| v11 | Built the data pipeline and hybrid scoring system |
| v12 | Established production quality baseline (80/100 raw) |
| v13 | Rules: validate params, test output, reset assumptions |
| v14 | Identified exact contamination scope, expanded filters |
| Kush V1 | Validated Llama 3.1 base, identified SFT/IPO data gaps |
No version was wasted. Even the failures — especially the failures — produced the infrastructure and knowledge that made Kush V2 possible.
The Sovereign Stack
Every component runs on our hardware. Every model is fine-tuned on our data. Every response is evaluated by our standards.
| Component | Technology |
|---|---|
| Base Model | Meta Llama 3.1 8B Instruct |
| Training | Unsloth + LoRA + IPO preference optimization |
| Data Cleaning | AI-generated scripts, human-defined thresholds |
| Inference | Ollama on RTX 5080 (16 GB VRAM) |
| Knowledge | ChromaDB with 437+ articles, 4 collections |
| Cache | Microsoft Garnet (Redis-compatible) |
| Monitoring | Prometheus + custom metrics |
| Bot | Python Telegram Bot with 24/7 watchdog |
| Website | Astro + Tailwind on Cloudflare Pages |
| Fallback | Gemini Flash API (cloud burst only) |
No corporate API wrappers. No third-party gatekeepers. No censorship of our history.
9.1 out of 10. Zero contamination. Trained on AI-cleaned data. Validated by a 7-gate pipeline. Built by us, for us.
Knowledge is the frequency of liberation.
Hotep.
Try Hotep Intelligence now on Telegram or the web demo. Free forever.