🪶 Living Script Lab

Language Intelligence for Low-resource Applications
বাংলা · অসমীয়া · नेपाली

Your language. Your stories. Amplified by AI.

A collaborative platform building AI infrastructure for underserved languages.

Pipeline · BENI: Bangla narrative index, from news to economics →
Dashboard · Control Room: Real-time pipeline metrics, charts, and data →
Documentation · Knowledge Base: Guides, references, and architecture decisions →
Research · Technical Reports: 6-paper series on narrative measurement →
Contribute · Get Involved: Eight ways to contribute, all leading to authorship →
Onboarding · Join the Lab: Knowledge pillars, roadmap, and first steps →

Languages We Serve

Every language deserves to be visible in the data that shapes global decisions. We're building infrastructure for ten underserved languages, and counting.

বাংলা
Bangla
265M speakers · BENI pipeline active
Active
অসমীয়া
Assamese
15M speakers
Seeking Contributors
नेपाली
Nepali
25M speakers
Seeking Contributors
ꠍꠤꠟꠐꠤ
Sylheti
11M speakers
Seeking Contributors
চাঁটগাঁইয়া
Chittagonian
16M speakers
Seeking Contributors
هَوُسَ
Hausa
80M speakers
Planned
Kiswahili
Swahili
100M speakers
Planned
Tiếng Việt
Vietnamese
100M speakers
Planned
Tagalog
Filipino
80M speakers
Planned
Bahasa Indonesia
Indonesian
200M speakers
Planned
🪶
Your Language?
We're ready when you are
Be First
Start your pipeline →

The Lab in Numbers

Proven in Bangla, scaling to ten languages by 2027

10
Languages targeted
664K (BENI baseline) · 933K (unified corpus)
Articles processed & classified
91.7% (baseline) · 94.77% (unified corpus)
Classification accuracy (TF-IDF)
6
Research papers in series
r = −0.75
CPI correlation (p < 0.001, validated)

How XENI Works

Every language gets its own XENI: [X]ploration & E Native-language Intelligence. The first letter changes per language. Proven in Bangla as BENI.

1
📰

Collect Native News

Aggregate millions of native-language articles from local sources, archives, and feeds, preserving linguistic authenticity from day one.

2
🤖

Annotate & Classify

An LLM ensemble (Claude, GPT-4o) annotates narratives across domains. Classification reaches 91.7% accuracy on economic narratives with the TF-IDF baseline.

3
📊

Build Validated Index

Construct monthly narrative indices validated against macroeconomic indicators. BENI's Economic Index correlates with CPI at r = −0.75 (p < 0.001).
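
As a rough illustration of step 3, the sketch below builds a monthly index as the share of articles carrying a given narrative label, then computes a Pearson correlation against an indicator series. The share-based definition, the function names, and every number are assumptions made for illustration. This page does not specify how BENI weights or normalizes its index, and the toy data is invented, not BENI output.

```python
from collections import defaultdict
from math import sqrt


def monthly_index(articles, label):
    """Return {month: share of that month's articles carrying `label`}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for month, article_label in articles:
        totals[month] += 1
        if article_label == label:
            hits[month] += 1
    return {month: hits[month] / totals[month] for month in sorted(totals)}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented toy data (NOT BENI results): (month, label) pairs as produced by step 2.
toy_articles = (
    [("2024-01", "inflation")] * 1 + [("2024-01", "other")] * 3
    + [("2024-02", "inflation")] * 2 + [("2024-02", "other")] * 2
    + [("2024-03", "inflation")] * 3 + [("2024-03", "other")] * 1
)
# Invented indicator series, one value per month in the same order.
toy_indicator = [3.0, 2.0, 1.0]

toy_index = monthly_index(toy_articles, "inflation")
r = pearson_r(list(toy_index.values()), toy_indicator)
print(toy_index, round(r, 2))
```

A real validation would also need a significance test and a decision about lags between news and the official statistic, neither of which this sketch covers.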

The XENI Naming System

The pattern teaches itself. First letter = language. XENI = pipeline. Domain = index.

BENI = Bangla
AENI = Assamese
NENI = Nepali
YENI = Yoruba
?ENI = Your language

Your language's initial + ENI = your pipeline →

The XENI name always refers to the pipeline. The domain is a plain-English qualifier on the index it produces, e.g., BENI Economic Index, BENI Health Index, BENI Climate Index. Same pipeline, many indices per language.
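
The naming rule is simple enough to express in a few lines. The sketch below is only an illustration: `pipeline_name` and `index_name` are hypothetical helpers, not functions from the LILA repository.

```python
def pipeline_name(prefix: str) -> str:
    """Return the pipeline name for a language prefix, e.g. 'B' -> 'BENI'."""
    cleaned = prefix.strip().upper()
    if not cleaned.isalpha():
        raise ValueError(f"prefix must be alphabetic, got {prefix!r}")
    return cleaned + "ENI"


def index_name(prefix: str, domain: str) -> str:
    """Return the index name: pipeline name + domain qualifier + 'Index'."""
    return f"{pipeline_name(prefix)} {domain.strip().title()} Index"


print(pipeline_name("B"))            # BENI
print(index_name("B", "economic"))   # BENI Economic Index
print(index_name("A", "health"))     # AENI Health Index
```

Keeping the domain out of the pipeline name means one pipeline per language can serve any number of indices.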

The XENI Pipeline Family

Each pipeline is named [language initial or short code] + ENI. Proven in Bangla, ready for your language.

A
AENI
অসমীয়া · Assamese
Seeking Contributors
N
NENI
नेपाली · Nepali
Seeking Contributors
S
SENI
ꠍꠤꠟꠐꠤ · Sylheti
Seeking Contributors
C
CENI
চাঁটগাঁইয়া · Chittagonian
Seeking Contributors
H
HENI
هَوُسَ · Hausa
Planned
KI
KIENI
Kiswahili · Swahili
Planned
VI
VIENI
Tiếng Việt · Vietnamese
Planned
TI
TIENI
Tagalog · Filipino
Planned
ID
IDENI
Bahasa Indonesia · Indonesian
Planned
🪶

[?]ENI: Your Language Here

The pipeline is language-agnostic. If you speak it, we can process it. Fork the repo, adapt the template, publish the paper.

Start Your XENI →

LILA Technical Reports

A 6-paper research program on narrative measurement across underserved languages, from statistical foundations to LLM-based measurement devices.

#1

Statistical Economics of Narrative

A quantitative framework for narrative-based economic analysis. Foundations of the methodology.

Complete
#2

Systematic Review of Economic Narrative Indices

Systematic review, replication study, and Bangla extension (2007–2025).

Complete
#3

Building BENI: A Replicable Pipeline

From raw news to validated narrative index: the complete methodology and technical architecture.

Active (Jul 2026)
#4

Nowcasting Inflation with BENI

Local-language news as a high-frequency economic indicator for inflation prediction.

Planned (Aug 2026)
#5

Text as Data in Social Science

110-year survey of language-based methods: from content analysis to LLMs.

Planned (Oct 2026)
#6

LLMs as Measurement Devices

Framework for narrative extraction and measurement in low-resource languages.

Proposed (Jan 2027)

Eight Ways to Contribute

Every contribution model leads to academic authorship. If you speak an underserved language, you are not a data source; you are a co-author.

🌍

Language Extension

Apply the pipeline to YOUR language. First-author paper.

🔬

Cross-Domain

Apply to health, climate, education. First-author paper.

⚙️

Methodological

Improve the classifier, reduce cost. Co-authorship.

✅

Replication

Independently verify results. Published replication report.

🗣️

Citizen Annotation

Label articles in your language. Acknowledgement in papers.

📊

Policy Brief

Analyze narratives for policy. Co-authorship.

🛠️

Infrastructure

Build dashboards, APIs, tools. Tool paper co-authorship.

📖

Education

Create tutorials, course modules. Educational paper co-authorship.

Join the Lab

A contributor knowledge base spanning economics, linguistics, and Git, with a clear roadmap for getting involved.

📈

Economic Foundations

Why narratives matter for economic measurement and policy.

  • Narrative Economics (Shiller, 2017): how stories spread like viruses and drive economic fluctuations. The theoretical basis for why we track narrative prevalence in news.
  • Text as Data (Gentzkow, Kelly & Taddy, 2019): converting unstructured text into quantitative measures, from bag-of-words to LLM-based embeddings.
  • Nowcasting: using high-frequency non-traditional data (news, search trends) to predict official statistics before they're released.
  • Economic Complexity & Fingerprint: each economy leaves a distinctive narrative fingerprint. The BENI approach captures that fingerprint through domain-specific narrative classification.
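
To make "text as data" concrete, here is a minimal TF-IDF weighting in plain Python, the kind of bag-of-words representation behind the TF-IDF baseline mentioned on this page. It is a sketch under stated assumptions: whitespace tokenization and the unsmoothed `log(N / df)` formula are simplifications chosen for readability, the sample sentences are invented, and the lab's actual TF-IDF configuration is not described here.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: count each term once per document it appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Invented sample headlines, for illustration only.
docs = ["prices rise again", "prices fall", "rain delays harvest"]
weights = tf_idf(docs)
print(weights[0])
```

Terms that appear in fewer documents get higher weights, so "rise" outweighs "prices" in the first headline.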
🗣️

Linguistics & NLP

How we process low-resource languages at scale.

  • Low-Resource NLP: languages with limited labeled data, tools, and pre-trained models. 7,000+ languages worldwide; fewer than 100 have any NLP support.
  • Multilingual Transformers: mBERT, XLM-R, BanglaBERT, sahajBERT. Cross-lingual transfer learning enables progress where direct data is scarce.
  • Annotation Theory: schema design, inter-annotator agreement, adjudication. Our LLM ensemble (Claude + GPT-4o) achieves human-level reliability at 5–20× lower cost.
  • Script & Tokenization: Bangla script (Bengali-Assamese), Sylheti Nagri, Devanagari, Arabic script variations. Each writing system presents unique tokenization challenges for transformer models.
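
Inter-annotator agreement can be made concrete with Cohen's kappa, a standard chance-corrected agreement score for two annotators. This page does not say which agreement statistic the lab reports, so treat the choice of kappa as an assumption. The labels below are invented, and the two lists could just as well come from two models as from two people.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled at random
    # according to their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels for four articles, for illustration only.
annotator_1 = ["economy", "economy", "other", "other"]
annotator_2 = ["economy", "other", "other", "other"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(kappa)
```

Raw agreement here is 75%, but half of that is expected by chance, which is why kappa is the more informative number when label distributions are skewed.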
⌨️

Git & GitHub Management

How we organize, version, and collaborate across pipelines.

  • Monorepo Structure: all pipelines, datasets, and documentation in one repository. Every XENI pipeline follows the same directory template for discoverability.
  • Branch Strategy: feature branches from main, CI/CD via GitHub Actions, auto-deploy to GitHub Pages. Pre-commit hooks enforce linting and formatting.
  • Issue & Project Boards: each pipeline tracked as a GitHub Project. Labels: language/*, domain/*, paper/*, good first issue.
  • Contribution Model: fork & PR workflow. All contributors credited in the registry. Academic co-authorship guaranteed for substantive contributions.
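
The label scheme above lends itself to a simple consistency check. The sketch below flags issue labels that fall outside the four documented patterns. The helper name and the sample labels (such as `language/bangla`) are hypothetical, and nothing here reflects tooling that actually exists in the repository.

```python
from fnmatch import fnmatchcase

# The four label patterns listed on this page.
LABEL_PATTERNS = ("language/*", "domain/*", "paper/*", "good first issue")


def unknown_labels(labels):
    """Return the labels that match none of the documented patterns."""
    return [
        label for label in labels
        if not any(fnmatchcase(label, pattern) for pattern in LABEL_PATTERNS)
    ]


# Hypothetical labels on one issue, for illustration only.
issue_labels = ["language/bangla", "paper/3", "bug", "good first issue"]
print(unknown_labels(issue_labels))
```

Note that `*` also matches an empty string, so a bare `language/` would pass; a stricter check would need explicit value lists.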
🔍 Looking For

Domain Experts

We're actively seeking collaborators with domain expertise. No NLP background needed: we handle the technical pipeline. You bring the subject-matter knowledge.

📈 Economists: Validate narrative indices against macroeconomic indicators
🌍 Linguists: Design annotation schemas for low-resource languages
🏥 Health Specialists: Build the BENI Health Index with domain-specific labels
🌿 Climate Researchers: Define climate narrative categories for emerging economies
📊 Social Scientists: Design validation frameworks and policy briefs
Reach Out →

Project Roadmap

LILA Lab builds toward 10 underserved languages by H1 2027. Here is the path.

Complete

BENI Pipeline & Baseline Papers

664K Bangla articles collected, annotated (Claude + GPT-4o ensemble), and classified. TF-IDF baseline: 91.7% accuracy. BENI Economic Index validated against CPI (r = −0.75, p < 0.001). Papers #1 and #2 complete, #3 active.

Active

Unified Corpus & 6-Model Benchmark

Unified 933K-article corpus built. TF-IDF on unified corpus: 94.77% accuracy. Six BanglaBERT models queued for Kaggle GPU training. Paper #3 (Building BENI) in progress.

Q3 2026

Multi-Domain Expansion

Extend BENI beyond Economics: Health (BENI Health Index), Climate (BENI Climate Index), Education (BENI Education Index). Each domain needs annotation schema design and validation data. Paper #4 (Nowcasting) and #5 (Text as Data survey).

Q4 2026

Sister Pipelines: AENI, NENI, SENI, CENI

Bootstraps for Assamese, Nepali, Sylheti, and Chittagonian pipelines. Each requires native-speaking contributors, dataset collection, annotation schema adaptation, and local validation. GitHub Project boards created for each.

H1 2027

African & Southeast Asian Expansion

HENI (Hausa, 80M speakers), KIENI (Swahili, 100M), VIENI (Vietnamese, 100M), TIENI (Filipino, 80M), IDENI (Indonesian, 200M). Target: 10-language XENI family complete. Paper #6 (LLMs as Measurement Devices).

First Steps

Start contributing today. No prior NLP experience required.

1

Fork the Repository

Fork LilaLABx/LILA-LAB on GitHub, clone locally, and run pip install -e ".[all]". Read the Contributing Guide for environment setup.

2

Pick an Entry Point

Browse the good first issues or choose a language pipeline that matches your background. Linguists can contribute annotation schemas; developers can build infrastructure; economists can design validation frameworks.

3

Read the Knowledge Base

Study the three pillars above. The technical reports provide full methodological depth. The language registry tracks all pipeline statuses.

4

Join the Community

Introduce yourself on Discord. Tell us your language, your domain interest, and how you'd like to contribute. Every contributor, technical or not, is credited in our registry.

84% of NLP research is English-only. If your language isn't served, you're invisible in the data that shapes global decisions. We change that, one pipeline at a time.

Connect with LILA Lab

Follow LILA Lab across platforms, all coordinated from the repository. Join the movement for language infrastructure.

All channels are documented and coordinated from the Communications Center.