An independent research lab · Est. 2026 · Kenya / USA

Pick a language.

One Kenyan language at a time, built in public, shipped open-weights with the dataset attached. Choose where to start.

Active · VOL.01
/kalenjin

Kalenjin

Nilotic · ~5M speakers · Kenya, Tanzania, Uganda, diaspora

Two demos live: a cascade translator and a Whisper transcriber. A pronunciation grader is planned.

3 open demos · Enter /kalenjin

Queued · VOL.02
/luo

Luo

Nilotic · ~4M speakers · Kenya, Tanzania, diaspora

Three demos planned. Recipes from Kalenjin will port to Luo once the eval set is in hand.

  • 01 /translate · English → Luo · planned
  • 02 /transcribe · Spoken Luo · planned
  • 03 /grader · Pronunciation grader · planned
Coming soon
+ TBD · Kikuyu · Kamba · Luhya · Kisii · Maasai — added as recipes generalise
§ 01 — Why this lab
A statement of intent.
Read the manifesto

chamgei/labs ships open speech and text models for languages the corpus skipped — starting with Kalenjin (~5M speakers), then Luo, then outward across the broader Nilotic and Bantu families.

Every release is open-weights, evaluated on real held-out audio and prose, and shipped with the dataset that trained it. No benchmarks-on-benchmarks. No closed APIs. The work is intended for diaspora speakers, heritage learners, linguists, and anyone reconnecting with a language the rest of the stack ignored.

§ 02 — Author

A nights-and-weekends tinkering project by Tony Kipkemboi.