Newspaper articles aren't just words — they're measurements of what a society pays attention to. LILA Lab builds instruments to turn this text into quantitative signals.

Foundation →

📈

Narrative Index

A monthly score tracking how often a topic (such as "the economy") appears in the news. It works like an economic indicator, but is built from text instead of numbers.

Measurement →

🎯

Domain Indices

One pipeline, many indices. BENI starts with the Economic domain. The same instrument can measure Health, Climate, or Education; each domain just needs its own annotation schema.

Scale →

⚙️

XENI Framework

A reproducible pipeline pattern for any language: collect news → LLM annotate → train classifier → build index → validate against real-world data.

Blueprint →

Pipeline → Index Flow

📰 664K Bangla news articles (2014–2020)

→

🤖 LLM annotation (Claude + GPT-4o ensemble)

→

🧠 Classifier (TF-IDF / BanglaBERT, 91.7% accuracy)

→

📊 Monthly index (79 months, 38.9% mean)

→

✅ Macro validation (CPI · FX · reserves, r = −0.75)
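For developers, the five-stage flow above can be pictured as a chain of functions that pass a shared state along. The following is only a minimal sketch: every function name, the `State` dictionary, and the placeholder values are hypothetical illustrations, not the actual BENI or XENI code.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]
Stage = Callable[[State], State]


def collect_news(state: State) -> State:
    # Hypothetical stand-in: a real stage would load a news archive.
    state["articles"] = ["article text 1", "article text 2"]
    return state


def llm_annotate(state: State) -> State:
    # Hypothetical stand-in: a real stage would ask LLMs to label a sample.
    state["annotations"] = [{"text": t, "label": None} for t in state["articles"]]
    return state


def train_classifier(state: State) -> State:
    # Placeholder value only; no model is trained in this sketch.
    state["classifier"] = "classifier-placeholder"
    return state


def build_index(state: State) -> State:
    # Placeholder: a real stage would aggregate predictions by month.
    state["index"] = {}
    return state


def validate(state: State) -> State:
    # Placeholder: a real stage would compare the index with macro data.
    state["validation"] = {}
    return state


STAGES: List[Tuple[str, Stage]] = [
    ("collect", collect_news),
    ("annotate", llm_annotate),
    ("train", train_classifier),
    ("index", build_index),
    ("validate", validate),
]


def run_pipeline(stages: List[Tuple[str, Stage]], state: Optional[State] = None) -> State:
    """Run each stage in order and record which stages completed."""
    state = {} if state is None else state
    completed = []
    for name, stage in stages:
        state = stage(state)
        completed.append(name)
    state["completed"] = completed
    return state


result = run_pipeline(STAGES)
print(result["completed"])
```

Keeping each stage behind the same signature is one way to let a new language or domain swap in its own collector or annotation step without touching the rest of the chain.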

Choose Your Path

Start where you are

Pick your profile and we'll guide you to the right resources.

🌱

NLP Beginner

New to NLP? Start with fundamentals, learn the XENI framework, and understand how LILA Lab works — no coding required.

No experience needed →

🔬

Researcher

Explore our 6-paper series, methodology, validation results, and collaboration models for academic publishing.

Papers & methodology →

💻

Developer

Set up pipelines, run experiments, deploy infrastructure. Clone the repo and build from source.

Python • CLI • API →

🗣️

Linguistic Contributor

Speak a low-resource language? Help annotate data, design schemas, and build pipelines. Co-authorship guaranteed.

No coding required →

Learning Roadmap

From zero to pipeline creator

Follow this path to go from NLP novice to building your own language pipeline.

NLP Fundamentals

Beginner

Learn what NLP is, how language models work, and why low-resource languages need different approaches. No technical background needed.

⏱️ ~2 hours · No prerequisites

Start with FAQ → Pipeline Flow →

LILA Explorer

Easy

Understand LILA Lab's mission, the XENI framework, and how pipelines turn raw news into narrative indices. Tour the repository structure.

⏱️ ~1 hour · No prerequisites

About LILA Lab → Project Roadmap →

Pipeline User

Intermediate

Install and run the BENI pipeline. Train classifiers, build narrative indices, and validate against macroeconomic indicators using our working Bangla pipeline.

⏱️ ~4 hours · Python basics

Quick Start → BENI Deep Dive → Pilot Experiment →

Contributor

Intermediate

Contribute to LILA Lab. Annotate data, improve classifiers, extend pipelines to new domains, or build infrastructure. Multiple paths for different skills.

⏱️ Ongoing · Varies by path

Collaboration Framework → Linguistic Guide → Code Contribution →

Pipeline Creator

Advanced

Build your own XENI pipeline for a new language or domain. Use the template, adapt annotation schemas, train models, and publish your findings with co-authorship.

⏱️ Weeks–months · NLP + Python

Template Pipeline → XENI Framework →

Key Concepts

The LILA Lab lexicon

Five terms you'll encounter everywhere — visit the full glossary →

BENI

The Bangla Exploration & Native-language Intelligence pipeline — the first proven XENI pipeline, built for 265M Bangla speakers.

Full definition →

XENI

The naming convention for LILA Lab pipelines: [Language initial] + Exploration & Native-language Intelligence.

Full definition →

Narrative Index

A monthly time series measuring the prevalence of a narrative in a language's media ecosystem, validated against macro indicators.

Full definition →
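To make the idea concrete, here is a minimal sketch of one way a monthly prevalence series could be computed. It assumes the index is the percentage of a month's articles that a classifier flagged as on-topic; that definition, the function name `monthly_index`, and the sample data are illustrative assumptions, not the published BENI method.

```python
from collections import defaultdict
from datetime import date


def monthly_index(articles):
    """Map (publication_date, is_on_topic) pairs to {"YYYY-MM": percent on-topic}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for published, on_topic in articles:
        key = f"{published.year:04d}-{published.month:02d}"
        totals[key] += 1
        if on_topic:
            hits[key] += 1
    # Share of on-topic articles per month, expressed as a percentage.
    return {month: 100.0 * hits[month] / totals[month] for month in sorted(totals)}


# Illustrative data only: eight made-up articles across two months.
sample = [
    (date(2014, 1, 3), True),
    (date(2014, 1, 9), False),
    (date(2014, 1, 20), False),
    (date(2014, 1, 28), True),
    (date(2014, 2, 2), True),
    (date(2014, 2, 14), False),
    (date(2014, 2, 15), False),
    (date(2014, 2, 27), False),
]

index = monthly_index(sample)
print(index)  # {'2014-01': 50.0, '2014-02': 25.0}
```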

Annotation Schema

The structured fields and labels used to classify each news article — economic relevance, topic, narrative force, and more.

Full definition →
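As a rough illustration, a schema like this could be represented as a small validated record. The field names below follow the three fields named above, but the class name, the field types, and the allowed label values are hypothetical; the real BENI schema may differ.

```python
from dataclasses import dataclass

# Hypothetical label sets, for illustration only.
TOPICS = {"inflation", "trade", "employment", "other"}
NARRATIVE_FORCES = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class ArticleAnnotation:
    article_id: str
    economic_relevance: bool
    topic: str
    narrative_force: str

    def __post_init__(self):
        # Reject labels that are not part of the schema.
        if self.topic not in TOPICS:
            raise ValueError(f"unknown topic: {self.topic!r}")
        if self.narrative_force not in NARRATIVE_FORCES:
            raise ValueError(f"unknown narrative force: {self.narrative_force!r}")


example = ArticleAnnotation(
    article_id="a-001",
    economic_relevance=True,
    topic="inflation",
    narrative_force="negative",
)
print(example)
```

Validating at construction time means a mislabeled annotation fails immediately instead of silently entering the training data.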

Macro Validation

Comparing the narrative index against real economic indicators (CPI, FX, reserves) to prove it captures meaningful signal.

Full definition →
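One common way to run such a comparison is a Pearson correlation between the monthly index and an indicator series. The sketch below assumes that approach; the page reports r = −0.75 but does not say here which indicator, lag, or transformation produced it, and the numbers in the example are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Illustrative values only: six months of a made-up index and indicator.
index_values = [40.0, 42.0, 38.0, 45.0, 37.0, 41.0]
indicator_values = [5.1, 4.8, 5.6, 4.2, 5.9, 5.0]

r = pearson_r(index_values, indicator_values)
print(round(r, 2))
```

A coefficient near −1 or +1 indicates that the two series move together closely; the sign shows whether they move in opposite or the same direction.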

Knowledge & Resources

Everything you need to build

Curated resources — from academic papers to community channels.

📄

Research Papers

6-paper series on narrative indices, LLM annotation, and low-resource NLP methodology.

💾

Datasets

BENI v1.0 dataset with 664K Bangla articles, annotations, and narrative indices on Zenodo.

⚙️

Pipeline Framework

Complete XENI framework documentation — annotation, classification, index construction.

💬

Community Discord

Ask questions, find collaborators, and get help from the LILA Lab team and contributors.

⌨️

GitHub Repository

Browse the source code, submit issues, fork the repo, and contribute to the project.

📰

Research Blog

Updates, findings, and deep dives published on Substack. Subscribe for the latest.

Index of Domains

One pipeline, many domains

Every XENI pipeline can produce indices across multiple domains. Here's what exists, what's planned, and what's possible.

Domain	BENI (Bangla)	AENI (Assamese)	NENI (Nepali)	SENI (Sylheti)	CENI (Chittagonian)	HENI (Hausa)	KIENI (Swahili)	More…
Economic	✅ Active	🔜 Planned	🔜 Planned	💡 TBD	💡 TBD	💡 TBD	💡 TBD	💡 TBD
Health	🔜 Planned	—	—	—	—	—	—	💡 TBD
Climate	🔜 Planned	—	—	—	—	—	—	💡 TBD
Education	💡 Proposed	—	—	—	—	—	—	💡 TBD

✅ Active: index published · 🔜 Planned: pipeline in development · 💡 Proposed: awaiting contributors · — Not started
