Core Contributions
- Toolkit Development: Pioneered IndoLib, a toolkit for tackling linguistic challenges across 31 Indic languages, including optimizing NER and summarization models.
- Benchmark Redefinition: Surpassed industry standards in language modeling.
- Text Normalization: Established a robust normalization and sampling pipeline.
- Sanskrit-English Translation: Achieved benchmark results in Sanskrit-English machine translation; the work is under peer review for publication.
Technical Proficiency
- Large Language Models (LLMs): Applied large language models to multilingual analysis.
- Generative AI: Explored generative models for language translation.
- AI Ethics: Adhered to ethical guidelines in AI, ensuring respectful representation of linguistic diversity.
Academic Recognition
- Publication Under Review: Paper on Sanskrit-English machine translation results is currently under peer review.
- Interdisciplinary Collaboration: Engaged with linguists and technologists to ensure accurate language representation.