Welcome to LILA Lab
Learn, build, and contribute to language intelligence for underserved languages — whether you're an NLP beginner or a system architect.
Your first 5 minutes
A quick tour of the big ideas — no technical background needed.
From text to narrative index
Three ideas that make LILA Lab tick — and the pipeline that connects them.
Text as Data (Foundation)
Newspaper articles aren't just words: they're measurements of what a society pays attention to. LILA Lab builds instruments to turn this text into quantitative signals.
Narrative Index (Measurement)
A monthly score tracking how much a topic (like "the economy") appears in news. It works like an economic indicator, but is built from text instead of numbers.
Domain Indices (Scale)
One pipeline, many indices. BENI starts with the Economic domain. The same instrument can measure Health, Climate, or Education; each domain just needs its own annotation schema.
XENI Framework (Blueprint)
A reproducible pipeline pattern for any language: collect news → LLM annotate → train classifier → build index → validate against real-world data.
In the Bangla pipeline, the five stages look like this:
Collect news: news articles, 2014–2020
LLM annotate: Claude + GPT-4o ensemble
Train classifier: TF-IDF / BanglaBERT, 91.7% accuracy
Build index: 79 months, 38.9% mean
Validate: CPI · FX · reserves, r = −0.75
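To make the "build index" stage concrete, here is a minimal sketch of one way a monthly narrative index could be computed. It assumes the index is the share of each month's articles that a classifier flags as on-topic; that definition, the function name `monthly_narrative_index`, and the toy data are illustrative assumptions, not the actual BENI implementation.

```python
from collections import defaultdict
from datetime import date


def monthly_narrative_index(articles):
    """Return {(year, month): share of articles flagged as on-topic}.

    `articles` is an iterable of (publication_date, is_on_topic) pairs.
    `is_on_topic` stands in for a classifier's boolean output
    (a hypothetical interface, used here for illustration only).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for published, is_on_topic in articles:
        key = (published.year, published.month)
        totals[key] += 1
        if is_on_topic:
            hits[key] += 1
    # Sort keys so the result reads as a chronological time series.
    return {key: hits[key] / totals[key] for key in sorted(totals)}


# Toy data: eight invented articles across two months.
sample = [
    (date(2014, 1, 3), True),
    (date(2014, 1, 9), False),
    (date(2014, 1, 20), True),
    (date(2014, 1, 28), False),
    (date(2014, 2, 2), True),
    (date(2014, 2, 14), False),
    (date(2014, 2, 15), False),
    (date(2014, 2, 27), False),
]

index = monthly_narrative_index(sample)
for (year, month), share in index.items():
    print(f"{year}-{month:02d}: {share:.1%}")
```

Months with no articles simply do not appear in the result; a real pipeline would need to decide whether to treat such gaps as missing values or interpolate them.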
Start where you are
Pick your profile and we'll guide you to the right resources.
NLP Beginner
New to NLP? Start with fundamentals, learn the XENI framework, and understand how LILA Lab works — no coding required.
Researcher
Explore our 6-paper series, methodology, validation results, and collaboration models for academic publishing.
Developer
Set up pipelines, run experiments, deploy infrastructure. Clone the repo and build from source.
Linguistic Contributor
Speak a low-resource language? Help annotate data, design schemas, and build pipelines. Co-authorship guaranteed.
From zero to pipeline creator
Follow this path to go from NLP novice to building your own language pipeline.
NLP Fundamentals (Beginner)
Learn what NLP is, how language models work, and why low-resource languages need different approaches. No technical background needed.
LILA Explorer (Easy)
Understand LILA Lab's mission, the XENI framework, and how pipelines turn raw news into narrative indices. Tour the repository structure.
Pipeline User (Intermediate)
Install and run the BENI pipeline. Train classifiers, build narrative indices, and validate against macroeconomic indicators using our working Bangla pipeline.
Contributor (Intermediate)
Contribute to LILA Lab. Annotate data, improve classifiers, extend pipelines to new domains, or build infrastructure. Multiple paths for different skills.
Pipeline Creator (Advanced)
Build your own XENI pipeline for a new language or domain. Use the template, adapt annotation schemas, train models, and publish your findings with co-authorship.
The LILA Lab lexicon
Five terms you'll encounter everywhere. The full glossary has complete definitions.
BENI: The Bangla Exploration & Native-language Intelligence pipeline, the first proven XENI pipeline, built for 265M Bangla speakers.
XENI: The naming convention for LILA Lab pipelines: [Language initial] + Exploration & Native-language Intelligence.
Narrative index: A monthly time series measuring the prevalence of a narrative in a language's media ecosystem, validated against macro indicators.
Annotation schema: The structured fields and labels used to classify each news article, such as economic relevance, topic, and narrative force.
Validation: Comparing the narrative index against real economic indicators (CPI, FX, reserves) to show that it captures meaningful signal.
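As an illustration of the validation idea, the sketch below computes a Pearson correlation between a narrative index and a macro indicator. It assumes the two series are already aligned month by month; the function name `pearson_r` and all numbers are invented for the example and are not BENI results.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Toy series: a rising index alongside a falling indicator (invented values).
narrative_index = [0.30, 0.35, 0.40, 0.45, 0.50]
indicator = [10.0, 9.0, 8.5, 7.0, 6.5]

r = pearson_r(narrative_index, indicator)
print(f"r = {r:.2f}")
```

A strongly negative or positive r suggests the text-based index moves with the real-world series. Correlation alone does not establish which one leads, so lagged comparisons are a natural next step.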
Everything you need to build
Curated resources — from academic papers to community channels.
Research Papers
6-paper series on narrative indices, LLM annotation, and low-resource NLP methodology.
Datasets
BENI v1.0 dataset with 664K Bangla articles, annotations, and narrative indices on Zenodo.
Pipeline Framework
Complete XENI framework documentation — annotation, classification, index construction.
Community Discord
Ask questions, find collaborators, and get help from the LILA Lab team and contributors.
GitHub Repository
Browse the source code, submit issues, fork the repo, and contribute to the project.
Research Blog
Updates, findings, and deep dives published on Substack. Subscribe for the latest.
One pipeline, many domains
Every XENI pipeline can produce indices across multiple domains. Here's what exists, what's planned, and what's possible.
| Domain | BENI (Bangla) | AENI (Assamese) | NENI (Nepali) | SENI (Sylheti) | CENI (Chittagonian) | HENI (Hausa) | KIENI (Swahili) | More… |
|---|---|---|---|---|---|---|---|---|
| Economic | ✅ Active | 🔜 Planned | 🔜 Planned | 💡 TBD | 💡 TBD | 💡 TBD | 💡 TBD | 💡 TBD |
| Health | 🔜 Planned | — | — | — | — | — | — | 💡 TBD |
| Climate | 🔜 Planned | — | — | — | — | — | — | 💡 TBD |
| Education | 💡 Proposed | — | — | — | — | — | — | 💡 TBD |
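The table above rests on the idea that a new domain needs only its own annotation schema while the rest of the pipeline stays the same. The sketch below shows one way that could look in code. The `AnnotationSchema` class, its field names, and every label value are hypothetical; only the three field types (relevance, topic, narrative force) come from the lexicon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationSchema:
    """Allowed labels for one domain (hypothetical structure)."""
    domain: str
    topics: tuple
    narrative_forces: tuple

    def check(self, annotation):
        """Return a list of problems; an empty list means the annotation fits."""
        problems = []
        if not isinstance(annotation.get("relevant"), bool):
            problems.append("relevant must be True or False")
        elif annotation["relevant"]:
            # Topic and force are only required for relevant articles.
            if annotation.get("topic") not in self.topics:
                problems.append(f"unknown topic for {self.domain}")
            if annotation.get("narrative_force") not in self.narrative_forces:
                problems.append(f"unknown narrative force for {self.domain}")
        return problems


# Two domains, same pipeline code, different label sets (all values invented).
ECONOMIC = AnnotationSchema(
    domain="economic",
    topics=("inflation", "trade", "employment"),
    narrative_forces=("positive", "negative", "neutral"),
)
HEALTH = AnnotationSchema(
    domain="health",
    topics=("outbreak", "vaccination", "hospital capacity"),
    narrative_forces=("positive", "negative", "neutral"),
)

label = {"relevant": True, "topic": "inflation", "narrative_force": "negative"}
print(ECONOMIC.check(label))
print(HEALTH.check(label))
```

Keeping the schema as data rather than code means the annotation, classification, and index stages can stay unchanged when a new row is added to the table.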
Popular documentation
Frequently accessed guides and references.