BENI — LILA Lab

About

What is BENI?

BENI is the first end-to-end XENI pipeline — an instrument that measures economic narratives in the Bangla language by turning news articles into quantitative data.

The Bangla Narrative Observatory

BENI collects news articles from six major Bangladeshi newspapers, annotates them for economic relevance using an LLM ensemble (Claude, GPT-4o), trains a classifier, and constructs a monthly narrative index.

The index is then validated against real-world macroeconomic indicators (CPI inflation, exchange rates, and foreign exchange reserves) to test whether narrative measurements capture an economically meaningful signal.

Bangla · Exploration · Native-language · Intelligence
The first of many XENI pipelines targeting 10 emerging-economy languages by 2027

664K

Bangla news articles collected

6

Major newspapers sourced

2014–2020

Full date range of the corpus

265M

Bangla speakers served

Pipeline

How BENI works

Six stages turn raw Bangla news into a validated economic narrative index.

1

Article Collection

Contributors submit Bangla news URLs, forms, or corpus files. The Potrika corpus provides 664K articles from six major dailies — Jugantor, Ittefaq, Kaler Kontho, Inqilab, Jaijaidin, Somoyer Alo.

39 CSV files · 3.3 GB

2

Bucket Building

Articles are batched into a 120,707-article time-series set (Economy plus sampled National, Politics, and Worldnews articles), with publication dates preserved for time-series-aware evaluation.

120K articles processed
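A minimal sketch of what date-aware evaluation can look like, assuming each article is a record with a publication date. The split below trains on earlier articles and tests on later ones, so no future text leaks into training. The function name `chronological_split`, the field names, and the sample records are illustrative assumptions, not taken from the BENI code.

```python
from datetime import date


def chronological_split(articles, test_fraction=0.2):
    """Split articles so that every test article is at least as recent
    as every training article (hypothetical helper)."""
    ordered = sorted(articles, key=lambda a: a["published"])
    n_test = round(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# Toy records; the real corpus fields may be named differently.
articles = [
    {"id": 1, "published": date(2019, 3, 1), "category": "Economy"},
    {"id": 2, "published": date(2014, 6, 12), "category": "Politics"},
    {"id": 3, "published": date(2017, 1, 30), "category": "Economy"},
    {"id": 4, "published": date(2020, 11, 5), "category": "National"},
    {"id": 5, "published": date(2015, 8, 19), "category": "Worldnews"},
]

train, test = chronological_split(articles, test_fraction=0.4)
print([a["id"] for a in train], [a["id"] for a in test])
```

A random split would mix 2020 articles into training and 2014 articles into testing, which tends to overstate how well a classifier generalizes to future months.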

3

LLM Annotation

Each article is independently annotated by Claude and GPT-4o, and disagreements between the two are resolved through adjudication. The ensemble approach improves reliability, and the level of agreement between models provides a natural confidence estimate.

Claude + GPT-4o ensemble
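As a rough illustration of how two independent annotations might be combined, the sketch below accepts a label when both models agree and flags the article for adjudication when they differ. It also computes Cohen's kappa as one possible agreement measure. The function names and the choice of kappa are assumptions made for illustration; the BENI annotation scripts may work differently.

```python
def combine_annotations(label_a, label_b):
    """Return (label, needs_adjudication) for one article."""
    if label_a == label_b:
        return label_a, False
    return None, True  # disagreement: leave unlabeled until adjudicated


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Toy labels: 1 = economically relevant, 0 = not relevant.
model_a = [1, 1, 0, 0]
model_b = [1, 0, 0, 0]

combined = [combine_annotations(a, b) for a, b in zip(model_a, model_b)]
flagged = [i for i, (_, needs_review) in enumerate(combined) if needs_review]
kappa = cohen_kappa(model_a, model_b)
print(combined, flagged, kappa)
```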

4

Human Review

Native speakers verify uncertain or borderline labels. A 300-article locked reference set serves as the gold standard for all future classifier evaluation.

300-article reference set
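To show how a locked reference set can serve as a gold standard, here is a small sketch that scores predictions against gold labels using accuracy and macro F1, the two metrics reported for the classifier. The toy labels are made up, and the helper names are hypothetical.

```python
def accuracy(gold, predicted):
    """Share of articles where the prediction matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def macro_f1(gold, predicted):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy gold labels and predictions (1 = economic, 0 = not economic).
gold = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 0, 0, 0, 0, 1]

print(accuracy(gold, pred))  # 0.75
print(macro_f1(gold, pred))
```

Macro F1 weights both classes equally, so it is less flattering than accuracy when one class is rarer than the other.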

5

Classification & Index

A TF-IDF + logistic regression classifier (91.7% accuracy, 0.894 macro F1) predicts economic relevance for every article. Monthly aggregation produces the BENI Economic Index — 79 months of narrative data.

91.7% accuracy
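For readers unfamiliar with TF-IDF, the sketch below shows the weighting idea in plain Python: terms that appear in few documents get higher weights than terms that appear everywhere. The smoothed IDF formula and L2 normalization used here are one common variant and are an assumption; the actual `train.py` may rely on a library implementation with different settings. The three toy documents are invented.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Assumed smoothed variant: idf = ln((1 + N) / (1 + df)) + 1
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


# Toy documents: বাজার (market), ব্যাংক (bank), মূল্যস্ফীতি (inflation),
# ক্রিকেট (cricket), দল (team).
docs = [
    "বাজার ব্যাংক মূল্যস্ফীতি",
    "বাজার ক্রিকেট দল",
    "বাজার ব্যাংক",
]

vectors = tfidf_vectors(docs)
for term, weight in sorted(vectors[0].items(), key=lambda kv: -kv[1]):
    print(term, round(weight, 3))
```

A logistic regression model would then learn one coefficient per term from vectors like these and output an economic-relevance probability for each article.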

6

Macroeconomic Validation

The index is tested against CPI inflation (r = −0.75), BDT/USD exchange rate (r = −0.72), and foreign exchange reserves (r = −0.77) — all statistically significant at p < 0.001.

p < 0.001
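As a rough sanity check on the significance claim, the sketch below converts each reported correlation into an approximate two-sided p-value using the Fisher z-transformation. It assumes each correlation was computed over all 79 monthly observations, which the text does not state explicitly, and it uses a normal approximation rather than the exact t-test.

```python
import math


def fisher_z_p_value(r, n):
    """Approximate two-sided p-value for H0: correlation = 0,
    using the Fisher z-transformation (normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


N_MONTHS = 79  # assumption: every correlation uses the full index length

reported = {
    "CPI inflation": -0.75,
    "BDT/USD exchange rate": -0.72,
    "FX reserves": -0.77,
}

p_values = {name: fisher_z_p_value(r, N_MONTHS) for name, r in reported.items()}
for name, p in p_values.items():
    print(f"{name}: r = {reported[name]}, approx. p = {p:.2e}")
```

Under those assumptions all three p-values fall far below 0.001. Note that this test treats monthly observations as independent, which trending time series generally are not.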

Benchmarks

Key results

What BENI has achieved so far — and what it makes possible for every language that follows.

🎯

91.7%

Classification accuracy (TF-IDF + logistic regression)

📊

79

Monthly observations in the narrative index (2014–2020)

📈

−0.75

Level correlation with CPI (p < 0.001)

💱

−0.72

Level correlation with BDT/USD FX (p < 0.001)

🏦

−0.77

Level correlation with FX reserves (p < 0.001)

📄

2

Papers published (4 more in the pipeline)

The Index

BENI Economic Index

A monthly time series measuring the share of economic news in Bangla-language media.

What it measures

The BENI Economic Index tracks what proportion of Bangla news articles discuss the economy each month. At its core, it answers a simple question: how much is Bangladesh talking about the economy?

The classifier predicts an "economic probability" for every article in the corpus. Articles are grouped by month, and the index is the proportion with probability above 0.5. The result is a 79-month time series from June 2014 to December 2020.
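The aggregation step described above can be sketched in a few lines: group scored articles by calendar month and take the share whose probability exceeds 0.5. The function name `monthly_index`, the input layout, and the sample scores are assumptions for illustration; `build_index.py` may be organized differently.

```python
from collections import defaultdict
from datetime import date


def monthly_index(scored_articles, threshold=0.5):
    """Map (year, month) to the share of articles whose economic
    probability is above the threshold."""
    totals = defaultdict(int)
    economic = defaultdict(int)
    for published, probability in scored_articles:
        month = (published.year, published.month)
        totals[month] += 1
        if probability > threshold:
            economic[month] += 1
    return {month: economic[month] / totals[month] for month in sorted(totals)}


# Toy (publication date, economic probability) pairs.
scored = [
    (date(2014, 6, 3), 0.9),
    (date(2014, 6, 10), 0.2),
    (date(2014, 6, 21), 0.7),
    (date(2014, 6, 28), 0.4),
    (date(2014, 7, 2), 0.6),
    (date(2014, 7, 9), 0.1),
    (date(2014, 7, 15), 0.3),
    (date(2014, 7, 30), 0.45),
]

index = monthly_index(scored)
print(index)  # {(2014, 6): 0.5, (2014, 7): 0.25}
```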

Mean economic news share: 38.9% — meaning roughly 2 in 5 Bangla news articles carry economic relevance.

Explore the pilot experiment →

38.9%

Mean economic news share

79 mo.

Index duration (Jun 2014 – Dec 2020)

120,707

Articles classified for the timeseries

0.894

Macro F1 score (TF-IDF classifier)

On correlations: level correlations are strong and significant (r = −0.75 with CPI, −0.72 with FX). Month-to-month (detrended) correlations are near zero, which suggests the TF-IDF index tracks long-run structural shifts rather than short-run fluctuations. The planned BanglaBERT upgrade may improve the short-run signal.
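The difference between level and month-to-month correlations can be seen on synthetic data. In the sketch below, two made-up series share opposite trends, so their levels correlate strongly, while their month-to-month changes are unrelated. First differencing is used here as one simple way to detrend; the method BENI actually uses is not specified, and none of these numbers are BENI data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def first_differences(series):
    """Month-to-month changes of a series."""
    return [b - a for a, b in zip(series, series[1:])]


# Synthetic series: opposite linear trends plus unrelated wiggles.
months = range(25)
index_level = [t + (1 if t % 2 == 0 else -1) for t in months]
macro_level = [-t + (1 if t % 4 < 2 else -1) for t in months]

level_r = pearson_r(index_level, macro_level)
change_r = pearson_r(
    first_differences(index_level), first_differences(macro_level)
)
print(f"level correlation:  {level_r:.3f}")
print(f"change correlation: {change_r:.3f}")
```

Strong level correlations between trending series are therefore best read alongside the differenced results, since a shared trend alone can produce them.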

Get Started

Run BENI yourself

Clone the repo and be up and running in minutes.

🔬

For Researchers

Train the baseline classifier and build the 79-month narrative index from scratch.

git clone https://github.com/LilaLABx/LILA-LAB.git
cd LILA-LAB
pip install -e ".[core]"
cd pipelines/BENI/experiment/beni_pilot
python3 train.py --task economic --model-type tfidf
python3 build_index.py
python3 correlate.py

💻

For Developers

Explore the full annotation pipeline, run LLM annotation, or fine-tune BanglaBERT.

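# Inspect the annotation tools (run from the repository root)
cd pipelines/BENI/annotation
python3 llm_annotate.py --help
python3 run_model_comparison.py --help

# Set up the Discord bot (return to the starting directory first)
cd ../../..
cd infrastructure/discord-bot
pip install -r requirements.txt
python3 bot.py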

🗣️

For Linguistic Contributors

No coding required. Help annotate Bangla articles, design schemas, or validate LLM outputs.

# Read the contribution guide
cat docs/LINGUISTIC_CONTRIBUTION_GUIDE.md

# Join the community
# discord.gg/TrrdKbky

Roadmap

BENI milestones

Where BENI has been and where it's going.

Pilot Complete

Complete

TF-IDF baseline trained, 79-month index built, correlations with CPI and FX validated. Proof of concept established.

Papers 1 & 2 Published

Complete

"Statistical Economics of Narrative" and "Economic Narrative Indices: Systematic Review" are both complete and submitted.

Paper 3 — BENI Method

Active

Building the full BENI pipeline paper. Documents the annotation methodology, classifier training, and index construction.

Paper 4 — Nowcasting with BENI

Planned

Using the narrative index to nowcast inflation. Scheduled for August 2026.

Domain Expansion

Planned

Extend BENI to Health, Climate, and Education domains. Each domain needs its own annotation schema and validation data.

BanglaBERT Upgrade

Planned

Full fine-tuning of BanglaBERT on the 70K-article training set (Kaggle GPU). Expected to improve short-run signal and classification accuracy.

Research

Papers & publications

BENI is the engine behind a 6-paper research series. Here's where its data and methodology appear.

1

Statistical Economics of Narrative

Foundations of narrative measurement and the case for text-as-data in economics

✅ Published

2

Economic Narrative Indices: Systematic Review

BENI pilot results, literature review, and framework positioning

✅ Submitted

3

Building BENI Pipeline

Full methodology: annotation, classification, index construction, and validation

🔄 Active

4

Nowcasting Inflation with BENI

Using the narrative index as a nowcasting input for CPI inflation

📋 Planned

Contribute

Help build BENI

Multiple ways to contribute — no coding required for linguistic contributions.

🗣️

Linguistic Contributor

Annotate Bangla articles, design schemas, review LLM outputs. Your language expertise is the core ingredient.

Get started →

⚙️

Methodological

Improve the classifier, reduce LLM costs, or design better validation approaches.

Collaboration →

🌍

Cross-Domain

Extend BENI to Health, Climate, or Education domains. New annotation schema = new index.

Propose a domain →

✅

Replication

Independently verify BENI's results using the open-source code and published data.

Replicate →

📊

Policy Brief

Analyze BENI's narrative data for policy insights and real-world applications.

Explore data →

🛠️

Infrastructure

Build dashboards, APIs, or visualization tools for the BENI index.

Code →

Ready to explore BENI?

Clone the repository, run the pipeline, or join the community.

GitHub Repository ↗ Quickstart Guide → Join Discord ↗

Bangla Exploration & Native-language Intelligence
