SYSTEM MONITOR — v2.0.0

Research Control Room

Real-time monitoring of LILA Lab's language intelligence pipelines.

$ ./monitor --pipelines --live

Monitor Data
664K+ Articles Analyzed
Bangla news corpus

91.7% Classification Accuracy
TF-IDF baseline

79 Months Indexed
2014–2020

r = −0.75 CPI Correlation
p < 0.001

Breakthrough Results

Validated evidence that narrative indices work for low-resource languages

Validation

TF-IDF Outperforms Deep Learning

Simple TF-IDF + Logistic Regression reaches 91.7% accuracy on gold-standard human annotations, outperforming the deep learning models it was compared against while requiring no GPU. Annotation costs $0.02 per article.

91.7% Accuracy
$0.02 Per Article

Scale

664K Articles Processed

Complete processing of 664,000+ Bangla news articles from the Potrika corpus, creating the largest low-resource language narrative dataset for economic analysis.

664K+ Articles
3.3 GB Corpus Size

Cost

LLM Annotation at Scale

An ensemble of Claude and GPT-4o achieves human-level annotation quality at $0.02–0.03 per article, making large-scale annotation feasible for low-resource languages.

3,200 Labels Created
$0.03 Max Cost
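
As a rough illustration of how an ensemble like this might combine labels and what the figures above imply for budget, here is a minimal sketch. The majority-vote rule, the label names, and the helper functions are assumptions made for illustration; the page does not say how the two models' outputs are actually merged. The cost estimate also assumes one label per article.

```python
from collections import Counter

def ensemble_label(votes):
    """Return the majority label, or None when the annotators tie."""
    ranked = Counter(votes.values()).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # disagreement: route to a human reviewer (assumed policy)
    return ranked[0][0]

def annotation_budget(n_articles, cost_low, cost_high):
    """Return the (low, high) total cost in dollars for n_articles."""
    return round(n_articles * cost_low, 2), round(n_articles * cost_high, 2)

# Hypothetical model outputs for three articles.
votes_per_article = [
    {"claude": "economic", "gpt-4o": "economic"},
    {"claude": "economic", "gpt-4o": "other"},
    {"claude": "other", "gpt-4o": "other"},
]
labels = [ensemble_label(v) for v in votes_per_article]

# 3,200 labels at $0.02 to $0.03 each, using the figures quoted on this page.
budget = annotation_budget(3200, 0.02, 0.03)
```

Under these assumptions, 3,200 labels cost between $64 and $96 in total, and the one article where the two models disagree is left unlabeled for manual review.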

How the Pipeline Works

Every language gets its own XENI — proven in Bangla as BENI, ready for yours.

01 Collect Native News

Aggregate millions of native-language articles from local sources, preserving linguistic authenticity from day one.

02 Annotate & Classify

An LLM ensemble (Claude, GPT-4o) annotates narratives across domains. The TF-IDF + Logistic Regression classifier then reaches 91.7% accuracy.
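
The classification step can be sketched with TF-IDF features feeding a logistic regression, the model family this page reports for BENI. The version below is a dependency-free toy, not the lab's code: the tokenization, the smoothed IDF formula, the training loop, and the English stand-in documents are all assumptions, and a real pipeline would more likely use an established library.

```python
import math
from collections import Counter

def fit_tfidf(docs):
    """Learn a vocabulary and smoothed IDF weights from tokenized documents."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vocab = {tok: i for i, tok in enumerate(sorted(df))}
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1.0 for tok in vocab}
    return vocab, idf

def transform(doc, vocab, idf):
    """Map one tokenized document to an L2-normalized sparse TF-IDF vector."""
    tf = Counter(tok for tok in doc if tok in vocab)
    vec = {vocab[tok]: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {i: v / norm for i, v in vec.items()}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, dim, epochs=200, lr=0.5):
    """Fit binary logistic regression with plain stochastic gradient descent."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            p = sigmoid(b + sum(w[i] * v for i, v in x.items()))
            err = p - label
            for i, v in x.items():
                w[i] -= lr * err * v
            b -= lr * err
    return w, b

def predict(x, w, b):
    """Return 1 when the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(b + sum(w[i] * v for i, v in x.items())) >= 0.5 else 0

# Toy, made-up English tokens standing in for tokenized Bangla articles.
train_docs = [
    "inflation rises as food prices climb".split(),
    "central bank raises interest rate".split(),
    "rice prices surge in local market".split(),
    "cricket team wins the final match".split(),
    "new film opens to large crowds".split(),
    "football match ends in a draw".split(),
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = economic, 0 = not economic

vocab, idf = fit_tfidf(train_docs)
X = [transform(d, vocab, idf) for d in train_docs]
w, b = train_logreg(X, train_labels, dim=len(vocab))

test_doc = "food prices climb in market".split()
prediction = predict(transform(test_doc, vocab, idf), w, b)
train_accuracy = sum(predict(x, w, b) == y for x, y in zip(X, train_labels)) / len(X)
```

Everything here runs on a CPU with sparse dictionaries, which is the practical appeal of this kind of baseline: there is no GPU requirement and the learned weights can be inspected token by token.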

03 Build Validated Index

Construct monthly narrative indices validated against macroeconomic indicators. For BENI, the index correlates with CPI at r = −0.75 (p < 0.001).
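
One way to picture this step is to aggregate classified articles by month and correlate the resulting series with CPI. The sketch below assumes the index is simply the monthly share of articles flagged as economic, which is a simplification chosen for illustration and not necessarily how BENI is defined. The article flags and CPI values are made up.

```python
import math
from collections import defaultdict

def monthly_index(articles):
    """Share of articles flagged as economic per month, keyed by 'YYYY-MM'."""
    totals, flagged = defaultdict(int), defaultdict(int)
    for month, is_economic in articles:
        totals[month] += 1
        flagged[month] += int(is_economic)
    return {m: flagged[m] / totals[m] for m in sorted(totals)}

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up classified articles: (month, classifier says "economic").
articles = [
    ("2014-06", True), ("2014-06", True), ("2014-06", False), ("2014-06", False),
    ("2014-07", True), ("2014-07", False), ("2014-07", False), ("2014-07", False),
    ("2014-08", True), ("2014-08", True), ("2014-08", True), ("2014-08", False),
]
cpi = {"2014-06": 102.0, "2014-07": 104.0, "2014-08": 100.0}  # made-up values

index = monthly_index(articles)
months = sorted(index)
r = pearson_r([index[m] for m in months], [cpi[m] for m in months])
```

In this toy data the index falls whenever CPI rises, so r comes out at −1. A real validation would use the full monthly series and report a significance test alongside r.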


Interactive Research Data

Live charts from the BENI pipeline analysis

BENI Index vs CPI (2014–2020)

Inverse correlation between narrative index and inflation

Classification Model Comparison

TF-IDF baseline vs. deep learning models

Monthly Article Volume

Consistent data collection across 79 months

Annotation Cost Breakdown

Cost per article by annotation method


XENI Pipeline Status

Language pipeline development progress across all targets

BENI

Active

Bangla (বাংলা)

265M speakers

100% — Complete
  • ✓ 664K articles processed
  • ✓ 91.7% accuracy
  • ✓ 79-month index
  • ✓ CPI validated (r = −0.75)

AENI

Contributors Needed

Assamese (অসমীয়া)

15M speakers

5% — Seeking contributors
  • ○ Pipeline code reusable
  • ○ Annotation schema ready
  • ○ Awaiting data collection
  • ○ Extension template available

NENI

Contributors Needed

Nepali (नेपाली)

25M speakers

5% — Seeking contributors
  • ○ Pipeline code reusable
  • ○ Annotation schema ready
  • ○ Awaiting data collection
  • ○ Extension template available

SENI

Contributors Needed

Sylheti (ꠍꠤꠟꠐꠤ)

11M speakers

3% — Planned
  • ○ Pipeline code reusable
  • ○ Annotation schema ready
  • ○ Awaiting data collection
  • ○ Extension template available

Latest from the Lab

Recent findings, publications, and community announcements

2026-06-08 Pipeline

BENI Pipeline Reaches 91.7% Accuracy

After optimizing the TF-IDF + Logistic Regression pipeline, we've achieved 91.7% classification accuracy on gold-standard human annotations — without any GPU.

2026-06-05 Community

Calling Assamese, Nepali & Sylheti Speakers

We're seeking native speakers to help build the next XENI pipelines. Each contributor receives co-authorship on the resulting paper. No coding required.

2026-06-01 Publication

Systematic Review Submitted to arXiv

Our systematic review of 20 years of economic narrative indices has been submitted to arXiv. The paper replicates and extends the full ENI methodology literature.

2026-05-28 Data

LILA-BENI v1.0 Dataset Released

The complete BENI dataset with 664K articles, annotations, and indices is now available on Zenodo with a permanent DOI for reproducible research.

2026-05-20 Method

LLM Annotation Costs: Claude vs GPT-4o

We compare the annotation cost and quality of Claude and GPT-4o for Bangla economic relevance labeling. Claude achieves comparable quality at lower cost.


Contribute to LILA Lab

Join our research collective and help build NLP infrastructure for underserved languages.