The first proven XENI pipeline: from 664,000 Bangla news articles to a validated monthly narrative index that correlates with macroeconomic indicators.
BENI is the first end-to-end XENI pipeline, an instrument that measures economic narratives in the Bangla language by turning news articles into quantitative data.
BENI collects news articles from six major Bangladeshi newspapers, annotates them for economic relevance using an LLM ensemble (Claude, GPT-4o), trains a classifier, and constructs a monthly narrative index.
The index is then validated against real-world macroeconomic indicators (CPI inflation, exchange rates, and foreign exchange reserves), which supports the claim that narrative measurements capture economically meaningful signal.
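As a rough illustration of the validation step, the sketch below correlates a monthly index with a macroeconomic series over the months they share. It is not the project's `correlate.py`. The function names, the `"YYYY-MM"` keys, and the toy numbers are all assumptions made up for this example, and plain Pearson correlation stands in for whatever statistics the pipeline actually reports.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def correlate_on_common_months(index, indicator):
    """Align two {month: value} series on shared months, then correlate."""
    months = sorted(set(index) & set(indicator))
    return pearson_r([index[m] for m in months],
                     [indicator[m] for m in months])


# Toy values for illustration only; these are NOT BENI results.
toy_index = {"2014-06": 0.35, "2014-07": 0.38, "2014-08": 0.41, "2014-09": 0.40}
toy_cpi = {"2014-07": 6.9, "2014-08": 7.1, "2014-09": 7.0, "2014-10": 6.8}

r = correlate_on_common_months(toy_index, toy_cpi)
print(f"Pearson r over shared months: {r:.3f}")
```

Aligning on shared months first matters because news corpora and official statistics rarely cover exactly the same date range.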
Six stages turn raw Bangla news into a validated economic narrative index.
What BENI has achieved so far, and what it makes possible for every language that follows.
A monthly time series measuring the share of economic news in Bangla-language media.
The BENI Economic Index tracks what proportion of Bangla news articles discuss the economy each month. At its core, it answers a simple question: how much is Bangladesh talking about the economy?
The classifier predicts an "economic probability" for every article in the corpus. Articles are grouped by month, and the index is the proportion with probability above 0.5. The result is a 79-month time series from June 2014 to December 2020.
Mean economic news share: 38.9%, meaning roughly 2 in 5 Bangla news articles carry economic relevance.
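The index construction described above can be sketched in a few lines. This is a minimal illustration, not the project's `build_index.py`. The input format (pairs of publication date and predicted probability), the function names, and the sample values are assumptions for this example. Only the 0.5 threshold and the June 2014 to December 2020 range come from the text.

```python
from collections import defaultdict
from datetime import date

THRESHOLD = 0.5  # articles with probability above this count as economic


def build_monthly_index(articles, threshold=THRESHOLD):
    """Return {(year, month): share of articles with probability > threshold}."""
    totals = defaultdict(int)
    economic = defaultdict(int)
    for published, probability in articles:
        key = (published.year, published.month)
        totals[key] += 1
        if probability > threshold:
            economic[key] += 1
    return {key: economic[key] / totals[key] for key in sorted(totals)}


def months_inclusive(start, end):
    """Number of calendar months from start to end, counting both ends."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


# Hypothetical (date, economic probability) pairs, for illustration only.
sample = [
    (date(2014, 6, 3), 0.91),
    (date(2014, 6, 17), 0.12),
    (date(2014, 7, 1), 0.64),
    (date(2014, 7, 9), 0.58),
    (date(2014, 7, 20), 0.07),
    (date(2014, 7, 28), 0.33),
]

index = build_monthly_index(sample)
for (year, month), share in index.items():
    print(f"{year}-{month:02d}: {share:.2f}")

# June 2014 through December 2020 spans 79 calendar months.
print(months_inclusive(date(2014, 6, 1), date(2020, 12, 1)))
```

In the toy data, one of two June articles and two of four July articles exceed the threshold, so both months get a share of 0.50.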
Clone the repo and be up and running in minutes.
Train the baseline classifier and build the 79-month narrative index from scratch.
git clone https://github.com/LilaLABx/LILA-LAB.git
cd LILA-LAB
pip install -e ".[core]"
cd pipelines/BENI/experiment/beni_pilot
python3 train.py --task economic --model-type tfidf
python3 build_index.py
python3 correlate.py
Explore the full annotation pipeline, run LLM annotation, or fine-tune BanglaBERT.
cd pipelines/BENI/annotation
python3 llm_annotate.py --help
python3 run_model_comparison.py --help
# Set up the Discord bot
cd infrastructure/discord-bot
pip install -r requirements.txt
python bot.py
No coding required. Help annotate Bangla articles, design schemas, or validate LLM outputs.
# Read the contribution guide
cat docs/LINGUISTIC_CONTRIBUTION_GUIDE.md
# Join the community
# discord.gg/TrrdKbky
Where BENI has been and where it's going.
TF-IDF baseline trained, 79-month index built, correlations with CPI and FX validated. Proof of concept established.
Statistical Economics of Narrative and Economic Narrative Indices: Systematic Review; both are complete and submitted.
Building the full BENI pipeline paper. Documents the annotation methodology, classifier training, and index construction.
Using the narrative index to nowcast inflation. Scheduled for August 2026.
Extend BENI to Health, Climate, and Education domains. Each domain needs its own annotation schema and validation data.
Full fine-tuning of BanglaBERT on the 70K-article training set (Kaggle GPU). Expected to improve short-run signal and classification accuracy.
BENI is the engine behind a 6-paper research series. Here's where its data and methodology appear.
Multiple ways to contribute; no coding required for linguistic contributions.
Annotate Bangla articles, design schemas, review LLM outputs. Your language expertise is the core ingredient.
Get started →
Improve the classifier, reduce LLM costs, or design better validation approaches.
Collaboration →
Extend BENI to Health, Climate, or Education domains. New annotation schema = new index.
Propose a domain →
Independently verify BENI's results using the open-source code and published data.
Replicate →
Analyze BENI's narrative data for policy insights and real-world applications.
Explore data →
Clone the repository, run the pipeline, or join the community.