Pick a language.
One Kenyan language at a time, built in public, shipped open-weights with the dataset attached. Choose where to start.
Kalenjin
Nilotic · ~5M speakers · Kenya, Tanzania, Uganda, diaspora
Two demos live: a cascade translator and a Whisper transcriber. A pronunciation grader is planned.
- 01/translate: English → Kalenjin cascade
- 02/transcribe: Whisper LoRA · spoken Kalenjin
- 03/grader: Pronunciation grader · planned
Luo
Nilotic · ~4M speakers · Kenya, Tanzania, diaspora
Three demos planned. Recipes from Kalenjin will port to Luo once the eval set is in hand.
- 01/translate: English → Luo · planned
- 02/transcribe: Spoken Luo · planned
- 03/grader: Pronunciation grader · planned
chamgei/labs ships open speech and text models for languages the corpus skipped — starting with Kalenjin (~5M speakers), then Luo, then outward across the broader Nilotic and Bantu families.
Every release is open-weights, evaluated on real held-out audio and prose, and shipped with the dataset that trained it. No benchmarks-on-benchmarks. No closed APIs. The work is intended for diaspora speakers, heritage learners, linguists, and anyone reconnecting with a language the rest of the stack ignored.
A nights-and-weekends tinkering project by Tony Kipkemboi.