Kalenjin · Translate · Cascade · NLLB-200 + LoRA
English in.
Kalenjin out.
A two-leg neural cascade. NLLB-200 carries the English to Swahili, then a fine-tuned LoRA carries the Swahili to Kalenjin. Type a sentence — or speak it.
chrF++ 58.79 · locked eval
~2 s warm latency
600M NLLB-200 params
T4 GPU · Modal · Input: English (eng_Latn)
§ 01 — The cascade
How it works.
Kalenjin parallel data is scarce. Direct English-to-Kalenjin fine-tunes underperform — the encoder never sees enough pairs to learn the mapping. We cascade through Swahili instead, where training data is thousands of times more plentiful.
1
EN → SW
NLLB-200 base translates the English to Swahili. No adapter, no fine-tune — just the off-the-shelf model handling a high-resource pair.
2
SW → KL
The same base with our LoRA adapter attached translates the Swahili to Kalenjin, decoding into luo_Latn — a Western Nilotic token we hijack since NLLB ships no Kalenjin code.
On a locked 250-row eval set this cascade scores chrF++ 58.79, with strong generalization on long literary prose.
Open weights · hf.co/Tonykip/swahili-kalenjin-mt-lora