Your language. Your stories. Amplified by AI.
A collaborative platform building AI infrastructure for underserved languages.
Every language deserves to be visible in the data that shapes global decisions. We're building infrastructure for ten underserved languages, and counting.
Proven in Bangla, scaling to ten languages by 2027
Every language gets its own XENI: [X]ploration & E Native-language Intelligence. The first letter changes per language. Proven in Bangla as BENI.
Aggregate millions of native-language articles from local sources, archives, and feeds, preserving linguistic authenticity from day one.
LLM ensemble (Claude, GPT-4o) annotates narratives across domains. Multi-model classification achieves 91.7% accuracy on economic narratives.
Construct monthly narrative indices validated against macroeconomic indicators. BENI's Economic Index correlates with CPI at r = -0.75 (p < 0.001).
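The three steps above (collect, annotate, index) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BENI implementation: the function names, the majority-vote rule, the definition of the index as a monthly share of articles, and the toy data are all hypothetical, chosen to show how an ensemble label can become a monthly series that is then compared with an indicator such as CPI.

```python
from collections import Counter, defaultdict
from math import sqrt


def majority_label(votes):
    """Return the most common label among the model votes for one article."""
    return Counter(votes).most_common(1)[0][0]


def monthly_index(articles, target="economic"):
    """Share of articles per month whose ensemble label equals `target`.

    `articles` is an iterable of (month, votes) pairs. Defining the index as
    a share is an assumption made for this sketch.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for month, votes in articles:
        totals[month] += 1
        if majority_label(votes) == target:
            hits[month] += 1
    return {month: hits[month] / totals[month] for month in sorted(totals)}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy data, invented for illustration only (not BENI results).
toy_articles = [
    ("2024-01", ["economic", "economic", "other"]),
    ("2024-01", ["other", "other", "economic"]),
    ("2024-02", ["economic", "economic", "economic"]),
    ("2024-02", ["economic", "other", "economic"]),
    ("2024-03", ["other", "other", "other"]),
    ("2024-03", ["other", "economic", "other"]),
]
toy_indicator = [2.0, 1.0, 3.0]  # made-up monthly indicator values

index = monthly_index(toy_articles)
r = pearson_r(list(index.values()), toy_indicator)
print(index, round(r, 3))
```

A real validation would also need a significance test and a much longer monthly series; the sketch only shows the shape of the computation.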
The pattern teaches itself. First letter = language. XENI = pipeline. Domain = index.
Your language's initial + ENI = your pipeline
The XENI suffix always refers to the pipeline. The domain is a plain-English qualifier on the index it produces, e.g., BENI Economic Index, BENI Health Index, BENI Climate Index. Same pipeline, many indices per language.
Each pipeline is named [Language initial] + ENI. Proven in Bangla, ready for your language.
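The naming rule can be written down as a small helper. This is a minimal sketch, not code from the repository: `pipeline_name`, `index_name`, and the optional `prefix` argument are hypothetical. The override is there because some roadmap names, such as KIENI for Swahili, use a prefix longer than a single letter.

```python
def pipeline_name(language, prefix=None):
    """Build a pipeline name as [prefix] + ENI.

    The prefix defaults to the first letter of the language name; pass
    `prefix` explicitly when a pipeline uses a longer or different code.
    """
    initial = (prefix or language[0]).upper()
    return initial + "ENI"


def index_name(language, domain, prefix=None):
    """Build an index name: pipeline name, plain-English domain, 'Index'."""
    return f"{pipeline_name(language, prefix)} {domain} Index"


print(pipeline_name("Bangla"))                # BENI
print(index_name("Bangla", "Economic"))       # BENI Economic Index
print(pipeline_name("Swahili", prefix="KI"))  # KIENI
```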
The first complete XENI pipeline. From raw Bangla news to a validated macroeconomic narrative index, proven against CPI, FX, and foreign reserves. Open-source, reproducible, and ready to adapt for your language.
The pipeline is language-agnostic. If you speak it, we can process it. Fork the repo, adapt the template, publish the paper.
Start Your XENI
A 6-paper research program on narrative measurement across underserved languages, from statistical foundations to LLM-based measurement devices.
A quantitative framework for narrative-based economic analysis. Foundations of the methodology. Status: Complete.
Systematic review, replication study, and Bangla extension (2007-2025). Status: Submitted to arXiv.
From raw news to validated narrative index: the complete methodology and technical architecture. Status: Active (Jul 2026).
Local-language news as a high-frequency economic indicator for inflation prediction. Status: Planned (Aug 2026).
110-year survey of language-based methods: from content analysis to LLMs. Status: Planned (Oct 2026).
Framework for narrative extraction and measurement in low-resource languages. Status: Proposed (Jan 2027).
Every contribution model leads to academic authorship. If you speak an underserved language, you are not a data source; you are a co-author.
Apply the pipeline to YOUR language. First-author paper.
Apply to health, climate, education. First-author paper.
Improve the classifier, reduce cost. Co-authorship.
Independently verify results. Published replication report.
Label articles in your language. Acknowledgement in papers.
Analyze narratives for policy. Co-authorship.
Build dashboards, APIs, tools. Tool paper co-authorship.
Create tutorials, course modules. Educational paper co-authorship.
A contributor knowledge base spanning economics, linguistics, and Git, with a clear roadmap for getting involved.
Why narratives matter for economic measurement and policy.
How we process low-resource languages at scale.
How we organize, version, and collaborate across pipelines.
main, CI/CD via GitHub Actions, auto-deploy to GitHub Pages. Pre-commit hooks enforce linting and formatting.
language/*, domain/*, paper/*, good first issue.
We're actively seeking collaborators with domain expertise. No NLP background needed: we handle the technical pipeline. You bring the subject-matter knowledge.
LILA Lab builds toward 10 underserved languages by H1 2027. Here is the path.
664K Bangla articles collected, annotated (Claude + GPT-4o ensemble), and classified. TF-IDF baseline: 91.7% accuracy. BENI Economic Index validated against CPI (r = -0.75, p < 0.001). Papers #1 and #2 complete, #3 active.
Unified 933K-article corpus built. TF-IDF on unified corpus: 94.77% accuracy. Six BanglaBERT models queued for Kaggle GPU training. Paper #3 (Building BENI) in progress.
Extend BENI beyond Economics: Health (BENI Health Index), Climate (BENI Climate Index), Education (BENI Education Index). Each domain needs annotation schema design and validation data. Papers #4 (Nowcasting) and #5 (Text as Data survey).
Bootstraps for Assamese, Nepali, Sylheti, and Chittagonian pipelines. Each requires native-speaking contributors, dataset collection, annotation schema adaptation, and local validation. GitHub Project boards created for each.
HENI (Hausa, 80M speakers), KIENI (Swahili, 100M), VIENI (Vietnamese, 100M), TIENI (Filipino, 80M), IDENI (Indonesian, 200M). Target: 10-language XENI family complete. Paper #6 (LLMs as Measurement Devices).
Start contributing today; no prior NLP experience required.
Fork LilaLABx/LILA-LAB on GitHub, clone locally, and run pip install -e ".[all]". Read the Contributing Guide for environment setup.
Browse the good first issues or choose a language pipeline that matches your background. Linguists can contribute annotation schemas; developers can build infrastructure; economists can design validation frameworks.
Study the three pillars above. The technical reports provide full methodological depth. The language registry tracks all pipeline statuses.
Introduce yourself on Discord. Tell us your language, your domain interest, and how you'd like to contribute. Every contributor, technical or not, is credited in our registry.
84% of NLP research is English-only. If your language isn't served, you're invisible in the data that shapes global decisions. We change that, one pipeline at a time.
Follow LILA Lab across platforms, all coordinated from the repository. Join the movement for language infrastructure.
All channels are documented and coordinated from the Communications Center.