Gyan AI has recently unveiled Paramanu-Ganita, a 208-million-parameter mathematical language model.
Despite its relatively modest size, roughly 35 times smaller than popular large language models, it outperforms its counterparts, including generalist models like LLaMA and Falcon and specialised models like Minerva, by significant margins on the GSM8K benchmark.
The model’s success highlights the efficiency of developing domain-specific models from scratch rather than adapting general LLMs to specific domains.
The research team consists of Mitodru Niyogi, founder and chief executive officer of Gyan AI, and Arnab Bhattacharya, computer science and engineering professor at IIT Kanpur, India, and AI advisor at Gyan AI. Niyogi is also associated with Abu Dhabi’s MBZUAI as an AI Researcher.
Training Method
The model was trained on a unique, high-quality mathematical corpus curated by the researchers, consisting of textbooks, lecture notes, and web-sourced materials. Training required only 146 hours of A100 GPU time.
Paramanu-Ganita’s success can be attributed to its training regimen and its specialisation in mathematics. The model uses an auto-regressive decoder that generates text one token at a time, conditioning each prediction on everything produced so far, which makes it well suited to working through complex mathematical problems via step-by-step reasoning. Its training covered a mix of mathematical texts and source code, giving it a broad grounding in mathematical logic and problem-solving.
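To make the decoding process concrete, the sketch below shows the basic auto-regressive loop: the model repeatedly predicts the next token from everything generated so far and appends it to the sequence. GPT-2 is used purely as a stand-in causal decoder and the prompt is an arbitrary example; this is not Paramanu-Ganita’s actual code or checkpoint.

```python
# Minimal sketch of auto-regressive (left-to-right) decoding.
# GPT-2 is a stand-in decoder; Paramanu-Ganita's weights and
# identifiers are not assumed here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "Q: If a pen costs 7 rupees, how much do 12 pens cost?\nA:"
ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(40):                                       # generate up to 40 new tokens
        logits = model(ids).logits                            # [1, seq_len, vocab_size]
        next_id = logits[:, -1, :].argmax(-1, keepdim=True)   # greedy pick of the next token
        ids = torch.cat([ids, next_id], dim=-1)               # append and condition on it next step
        if next_id.item() == tokenizer.eos_token_id:
            break

print(tokenizer.decode(ids[0], skip_special_tokens=True))
```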
The model’s performance was rigorously evaluated both on perplexity and on downstream benchmarks, confirming that it handles complex mathematical problems accurately and efficiently.
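For context, perplexity is the exponential of the average per-token negative log-likelihood on evaluation text, so lower values mean the model finds the text more predictable. The snippet below is a minimal sketch of how that number is typically computed with a causal language model; GPT-2 again stands in for the actual model, and the evaluation sentence is an arbitrary example rather than the paper’s corpus.

```python
# Sketch: perplexity = exp(mean negative log-likelihood per token).
# GPT-2 is a stand-in model; the text is an arbitrary example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "The derivative of x**2 with respect to x is 2*x."
ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    # With labels=input_ids, the model returns the mean cross-entropy
    # (negative log-likelihood) over the predicted tokens.
    nll = model(ids, labels=ids).loss

print(f"perplexity: {torch.exp(nll).item():.2f}")
```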
The implications of such a specialised tool are vast. Paramanu-Ganita offers a reliable, efficient, and less resource-intensive alternative to larger, more generalised language models for industries and sectors relying heavily on mathematical calculations and modelling.
It also shows that smaller, domain-focused models can match or even exceed the performance of their larger counterparts without the need for massive computational power or financial investment.
Previously, the researchers introduced Paramanu, a series of language models tailored for ten Indian languages across five scripts, including Assamese, Bangla, and Hindi. These models range from 13.29M to 367.5M parameters and were trained on a single GPU with a context size of 1024. The lineup features monolingual, bilingual, and multilingual configurations, with the multilingual models avoiding the “curse of multilinguality” by grouping typologically similar languages in the training corpora.