Welcome to LILA Lab

Learn, build, and contribute to language intelligence for underserved languages — whether you're an NLP beginner or a system architect.

Open Source · Text as Data · 10 Target Languages · Community-Driven

Your first 5 minutes

A quick tour of the big ideas — no technical background needed.

From text to narrative index

Three ideas that make LILA Lab tick — and the pipeline that connects them.

Pipeline → Index Flow
📰 664K Bangla news articles (2014–2020)
→ 🤖 LLM annotation: Claude + GPT-4o ensemble
→ 🧠 Classifier: TF-IDF / BanglaBERT (91.7% accuracy)
→ 📊 Monthly index: 79 months (38.9% mean)
→ Macro validation: CPI · FX · Reserves (r = −0.75)
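The last two stages of the flow above can be sketched in a few lines. This is a minimal illustration, not LILA Lab's code: it assumes the monthly index is the percentage of articles per month that the classifier flags with the target narrative, and that "r" is a Pearson correlation between the index and a macro series. The page states neither formula, and the names `monthly_index` and `pearson_r` and the toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def monthly_index(articles):
    """Hypothetical index: percent of each month's articles labeled 1 (narrative present)."""
    counts = defaultdict(lambda: [0, 0])  # month -> [narrative hits, total articles]
    for month, label in articles:
        counts[month][0] += label
        counts[month][1] += 1
    # "YYYY-MM" strings sort chronologically, so sorted() yields a time-ordered index.
    return {m: 100.0 * hits / total for m, (hits, total) in sorted(counts.items())}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy input: one (month, classifier label) pair per article (assumed format).
articles = [
    ("2019-01", 1), ("2019-01", 0),
    ("2019-02", 1), ("2019-02", 1),
    ("2019-03", 0), ("2019-03", 0),
]
index = monthly_index(articles)  # {"2019-01": 50.0, "2019-02": 100.0, "2019-03": 0.0}

# Toy macro series aligned to the same months; it moves exactly opposite
# to the index, so r is -1 up to floating-point rounding.
macro = {"2019-01": 5.0, "2019-02": 0.0, "2019-03": 10.0}
months = list(index)
r = pearson_r([index[m] for m in months], [macro[m] for m in months])
```

In practice, the index and the macro series must be aligned on the same months before correlating. A real validation would also need to account for trends and lags, which this sketch ignores.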

Start where you are

Pick your profile and we'll guide you to the right resources.

From zero to pipeline creator

Follow this path to go from NLP novice to building your own language pipeline.

1. NLP Fundamentals · Beginner

Learn what NLP is, how language models work, and why low-resource languages need different approaches. No technical background needed.

⏱️ ~2 hours · No prerequisites

2. LILA Explorer · Easy

Understand LILA Lab's mission, the XENI framework, and how pipelines turn raw news into narrative indices. Tour the repository structure.

⏱️ ~1 hour · No prerequisites

3. Pipeline User · Intermediate

Install and run the BENI pipeline. Using our working Bangla pipeline, train classifiers, build narrative indices, and validate them against macroeconomic indicators.

⏱️ ~4 hours · Python basics

4. Contributor · Intermediate

Annotate data, improve classifiers, extend pipelines to new domains, or build infrastructure. There are multiple paths for different skill sets.

⏱️ Ongoing · Varies by path

5. Pipeline Creator · Advanced

Build your own XENI pipeline for a new language or domain. Use the template, adapt annotation schemas, train models, and publish your findings with co-authorship.

⏱️ Weeks–months · NLP + Python
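Annotating data and adapting annotation schemas (steps 4 and 5) feed the "Claude + GPT-4o ensemble" stage of the pipeline. The page does not say how the two annotators' labels are combined. The sketch below assumes a simple agreement vote: a label is kept for classifier training only when enough annotators agree, and everything else is routed to human review. The function name `ensemble_label`, the threshold, the article IDs, and the label names are all hypothetical.

```python
from collections import Counter
from typing import Optional


def ensemble_label(votes: list[str], min_agreement: int = 2) -> Optional[str]:
    """Return the most common label if at least `min_agreement` annotators chose it.

    Returns None when annotators disagree too much, signalling human review.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agreement else None


# Hypothetical annotations: article id -> labels from each LLM annotator
# (e.g. one from Claude, one from GPT-4o) under a made-up two-label schema.
annotations = {
    "article-001": ["narrative", "narrative"],     # annotators agree
    "article-002": ["narrative", "no_narrative"],  # annotators disagree
    "article-003": ["no_narrative", "no_narrative"],
}

resolved = {aid: ensemble_label(votes) for aid, votes in annotations.items()}
training_set = {aid: lab for aid, lab in resolved.items() if lab is not None}
needs_review = [aid for aid, lab in resolved.items() if lab is None]
```

With two annotators and `min_agreement=2`, this reduces to "keep only unanimous labels". Adding a third annotator would let the same function act as a 2-of-3 majority vote.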

The LILA Lab lexicon

Five terms you'll encounter everywhere — visit the full glossary →

Everything you need to build

Curated resources — from academic papers to community channels.

One pipeline, many domains

Every XENI pipeline can produce indices across multiple domains. Here's what exists, what's planned, and what's possible.

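| Domain | BENI (Bangla) | AENI (Assamese) | NENI (Nepali) | SENI (Sylheti) | CENI (Chittagonian) | HENI (Hausa) | KIENI (Swahili) | More… |
|---|---|---|---|---|---|---|---|---|
| Economic | ✅ Active | 🔜 Planned | 🔜 Planned | 💡 TBD | 💡 TBD | 💡 TBD | 💡 TBD | 💡 TBD |
| Health | 🔜 Planned | 💡 TBD | 💡 TBD | 💡 TBD | 💡 TBD | 💡 TBD | 💡 TBD | 💡 TBD |
| Climate | 🔜 Planned | 💡 TBD | 💡 TBD | 💡 TBD | 💡 TBD | 💡 TBD | 💡 TBD | 💡 TBD |
| Education | 💡 Proposed | 💡 TBD | 💡 TBD | 💡 TBD | 💡 TBD | 💡 TBD | 💡 TBD | 💡 TBD |

Legend: ✅ Active: index published · 🔜 Planned: pipeline in development · 💡 Proposed: awaiting contributors · 💡 TBD: not started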