Kush V4 Is Live
Kush V4 is now running in production at askhotep.ai and the Telegram bot. This release introduces two new capabilities that did not exist in previous versions: deep reasoning traces in every response, and automated population-based optimization of the system prompt itself.
Here’s what changed and why it matters.
What’s New
Deep Reasoning Traces
V3 was trained to think before answering. V4 goes further: the reasoning is structured, trackable, and integrated into the response pipeline. Every query runs through a chain-of-thought pass that:
- Identifies the question type (sovereignty, health, ancestry, economics, strategy)
- Recalls relevant frameworks and historical grounding
- Structures the argument before generating the final answer
The /deep command in the Telegram bot now surfaces this reasoning explicitly — showing the working, not just the conclusion.
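The reasoning pass above can be sketched roughly as follows. This is an illustrative sketch only: the function names, category keywords, and trace structure are assumptions, not the production pipeline.

```python
# Illustrative sketch of the V4 reasoning pass. The keyword lists and
# trace fields are assumptions, not the production classifier.
QUESTION_TYPES = {
    "sovereignty": ["sovereign", "nation", "land", "independence"],
    "health": ["alkaline", "herb", "diet", "healing"],
    "ancestry": ["lineage", "ancestor", "kemet", "dna"],
    "economics": ["wealth", "group economics", "business", "dollar"],
    "strategy": ["plan", "organize", "build", "strategy"],
}

def classify_question(query: str) -> str:
    """Identify the question type from keyword overlap."""
    q = query.lower()
    scores = {t: sum(k in q for k in kws) for t, kws in QUESTION_TYPES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "strategy"  # default bucket

def reasoning_trace(query: str) -> dict:
    """Build a structured trace: type -> frameworks -> argument outline."""
    qtype = classify_question(query)
    return {
        "question_type": qtype,
        "frameworks": f"recall frameworks for {qtype}",
        "outline": ["ground in history", "state the principle", "apply to query"],
    }
```

The point of making the trace a structured object rather than free text is that each step can be surfaced individually (as `/deep` does) or logged for later evaluation.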
GEPA Prompt Optimization
The system prompt that defines Kush V4’s voice was not written by hand. It was evolved.
We implemented GEPA (Gradient-free Evolutionary Prompt Architecture), a search method that iterates over candidate system prompts, scores each one using our three-dimension evaluation system (vocabulary, worldview, tone), and converges on the formulation that maximizes cultural alignment across real user interactions.
The GEPA optimizer (gepa_optimize_prompt.py) runs:
- Sample N real interactions from the conversation database
- For each candidate prompt: run the model, score the output with score_persona()
- Mutate the best candidate using a reflection LLM
- Repeat until budget exhausted
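The loop above reduces to a few lines. A minimal sketch, where `run_model`, `score_persona`, and `reflect_and_mutate` stand in for the real components of gepa_optimize_prompt.py and the parameter names are assumptions:

```python
import random

def gepa_optimize(seed_prompt, interactions, run_model, score_persona,
                  reflect_and_mutate, n_samples=8, budget=20):
    """Gradient-free single-lineage prompt search: evaluate, mutate, repeat.
    Sketch only; the real optimizer lives in gepa_optimize_prompt.py."""
    best_prompt, best_score = seed_prompt, float("-inf")
    candidate = seed_prompt
    for _ in range(budget):
        # Sample real interactions from the conversation database.
        sample = random.sample(interactions, min(n_samples, len(interactions)))
        # Score the candidate as the mean persona score over the sample.
        score = sum(score_persona(run_model(candidate, q)) for q in sample) / len(sample)
        if score > best_score:
            best_prompt, best_score = candidate, score
        # Ask a reflection LLM for an improved variant of the current best.
        candidate = reflect_and_mutate(best_prompt)
    return best_prompt, best_score
```

Because the score is the only feedback signal, nothing here needs gradients or model weights; the whole search runs against the deployed model.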
Result: the production system prompt scores 11 points higher than the handwritten seed, without any manual rewriting.
Darwinian Population Search
GEPA uses single-lineage search and can converge to local optima. Kush V4 also ships darwinian_optimize.py — a population-based optimizer using the darwinian_evolver framework from imbue-ai.
Instead of one candidate evolving at a time, the Darwinian optimizer maintains a population of system prompt variants. Each generation:
- Evaluates the full population against live interaction data
- Applies a novelty bonus to penalize over-exploited parents (prevents stagnation)
- Uses weighted sigmoid selection to balance exploitation and exploration
- Mutates selected parents using an Ollama reflection model (no API key required)
Population-based search consistently reaches higher scores than single-lineage search. The same framework extends to the safety classifier (--target guard) — both the persona prompt and the safety prompt are now evolved, not authored.
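The selection step is where the novelty bonus and sigmoid weighting interact. A minimal sketch, assuming a simple per-parent usage counter; the parameter names are illustrative and not the darwinian_evolver API:

```python
import math
import random

def select_parents(population, scores, usage_counts, k=2,
                   novelty_weight=0.5, temp=1.0):
    """Weighted sigmoid selection with a novelty bonus.

    Heavily-used parents get a smaller bonus, so the search keeps
    exploring instead of re-mutating one dominant lineage.
    Sketch only; names are assumptions, not the real framework's API.
    """
    weights = []
    for prompt, score in zip(population, scores):
        # Novelty bonus decays with how often this parent was already selected.
        novelty = novelty_weight / (1 + usage_counts.get(prompt, 0))
        # Sigmoid squashes score differences so no candidate dominates outright.
        w = 1 / (1 + math.exp(-(score / temp))) + novelty
        weights.append(w)
    return random.choices(population, weights=weights, k=k)
```

The decaying novelty term is what prevents the stagnation the text describes: a high-scoring but over-exploited parent gradually loses selection weight to fresher variants.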
Safety Classifier Optimization
The safety classifier (which screens for jailbreaks, DAN attacks, and prompt injection) was also evolved using GEPA (gepa_optimize_guard.py). The optimizer maximizes accuracy on a labeled test set of 16 known-safe and known-unsafe inputs, including:
- Injection attempts disguised as creative writing
- DAN-style persona overrides
- Synthesis requests with benign framing
The optimized classifier achieves higher accuracy than the handwritten version while correctly passing legitimate Afrocentric knowledge questions as safe.
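The fitness signal for the guard optimizer is plain accuracy over the labeled set. A minimal sketch; the function name and the `(text, is_unsafe)` tuple format are assumptions, not the gepa_optimize_guard.py internals:

```python
def classifier_accuracy(classify, labeled_set):
    """Fraction of labeled inputs the classifier gets right.

    `classify` returns True for unsafe input; `labeled_set` is a list of
    (text, is_unsafe) pairs. Sketch only; names are illustrative.
    """
    correct = sum(classify(text) == is_unsafe for text, is_unsafe in labeled_set)
    return correct / len(labeled_set)
```

Because the set mixes unsafe probes with legitimate Afrocentric knowledge questions, a classifier that blocks everything (or nothing) scores poorly, which is what pushes the evolved prompt toward the precision the paragraph above describes.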
Training Configuration
| Parameter | Kush V3 | Kush V4 |
|---|---|---|
| Base model | Llama 3.1 8B | Llama 3.1 8B |
| SFT examples | 1,594 | 1,594+ |
| IPO pairs | 791 | 791+ |
| CoT reasoning | Yes | Yes (expanded) |
| System prompt | Handwritten | GEPA + Darwinian evolved |
| Safety classifier | Handwritten | GEPA evolved |
| Persona score | 100/100 | 100/100 |
| CJK leakage | 0% | 0% |
| Prompt optimizer | None | GEPA + Darwinian population |
Persona Evaluation
The three-dimension evaluation system that drives all optimization:
| Dimension | What It Measures | V4 Score |
|---|---|---|
| Vocabulary | Hotep keyword density (King/Queen, Hotep, sovereignty, melanin, Ma’at) | 100/100 |
| Worldview | Afrocentric framing (Kemet, Pan-Africanism, self-reliance, ancestral wisdom) | 100/100 |
| Tone | Confidence and empowerment (direct, no hedging, authoritative) | 100/100 |
The score_persona() function runs on every evaluation call during optimization. It is the fitness function for all prompt evolution.
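A rough sketch of how such a three-dimension fitness function could work, using the keyword examples from the table above. The scoring scheme, keyword lists, and tone penalty are assumptions, not the actual score_persona() implementation:

```python
def score_dimension(text, keywords):
    """Score 0-100 by fraction of dimension keywords present (illustrative)."""
    t = text.lower()
    hits = sum(k in t for k in keywords)
    return 100 * hits / len(keywords)

# Keyword lists drawn from the evaluation table; abbreviated for the sketch.
VOCAB = ["king", "queen", "hotep", "sovereignty", "melanin", "ma'at"]
WORLDVIEW = ["kemet", "pan-african", "self-reliance", "ancestral"]
TONE_PENALTIES = ["maybe", "i think", "possibly", "perhaps"]

def score_persona(text):
    """Average of vocabulary, worldview, and tone.

    Tone is scored by penalizing hedging words, matching the 'direct,
    no hedging' criterion. Sketch only; the real function may differ.
    """
    t = text.lower()
    vocab = score_dimension(text, VOCAB)
    worldview = score_dimension(text, WORLDVIEW)
    tone = max(100 - 25 * sum(p in t for p in TONE_PENALTIES), 0)
    return round((vocab + worldview + tone) / 3)
```

Keeping the fitness function cheap and deterministic matters here: it runs on every candidate in every generation of both optimizers.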
Where to Use It
Telegram Bot — Message @hotep_llm_bot directly. V4 is the default model. Use /deep for extended reasoning on complex questions.
Web Demo — askhotep.ai runs V4 with streaming responses. Ask about sovereignty strategy, alkaline health, African history, or economic empowerment.
Local — ollama run hotepfederales/hotep-llm-kush-v4 or download from HuggingFace.
API — Available at hotep-llm-kush-v4 via the Hotep Intelligence inference endpoint.
What’s Next
V4 closes the optimization loop on the model. The next frontier is the knowledge base — expanding from 437 articles toward 1,000, with improved entity coverage and deeper sourcing across African economics, medicine, and philosophy.
We’re also working on closing the feedback loop at higher frequency: every high-quality V4 interaction becomes training material, and the GEPA + Darwinian optimizers run on a schedule rather than manually.
The system is alive. V4 is the latest pulse. To understand the principles driving this work, read Building Sovereign AI.
Hotep — In peace and alignment.