Kalenjin · Translate · Cascade · NLLB-200 + LoRA
English in.
Kalenjin out.
A two-leg neural cascade. NLLB-200 carries the English to Swahili, then a fine-tuned LoRA carries the Swahili to Kalenjin. Type a sentence — or speak it.
chrF++ 58.79 · locked eval
~2 s warm latency
600M NLLB-200 params
T4 GPU · Modal · Input: English (eng_Latn)
§ 01 — The cascade
How it works.
Kalenjin parallel data is scarce. Direct English-to-Kalenjin fine-tunes underperform — the encoder never sees enough pairs to learn the mapping. We cascade through Swahili instead, where training data is thousands of times more plentiful.
1
EN → SW
NLLB-200 base translates the English to Swahili. No adapter, no fine-tune — just the off-the-shelf model handling a high-resource pair.
2
SW → KL
The same base with our LoRA adapter attached translates the Swahili to Kalenjin, decoding into luo_Latn — a Western Nilotic token we hijack since NLLB ships no Kalenjin code.
On a locked 250-row eval set this cascade scores chrF++ 58.79, with strong generalization on long literary prose.
Open weights · hf.co/Tonykip/swahili-kalenjin-mt-lora