ParamBench: A Graduate-Level Benchmark for Evaluating LLM Understanding on Indic Subjects
Ayush Maheshwari, Kaushal Sharma, Vivek Patel, Aditya Maheshwari
ParamBench is a comprehensive graduate-level benchmark in Hindi designed to evaluate Large Language Models (LLMs) on their understanding of Indic subjects. The benchmark comprises 17,275 multiple-choice questions across 21 subjects, sourced from Indian competitive examinations.
This benchmark is specifically designed to:
- Assess LLM performance on culturally and linguistically diverse content
- Evaluate understanding of India-specific knowledge domains
- Support the development of more culturally aware AI systems
We're excited to announce that the ParamBench dataset is now publicly available on the Hugging Face Hub!
You can find the dataset here: https://huggingface.co/datasets/bharatgenai/ParamBench
- 17,275 Questions: Extensive collection of graduate-level MCQs in Hindi
- 21 Subjects: Comprehensive coverage of diverse academic domains
- Standardized Format: Consistent question structure for reliable evaluation
- Automated Evaluation: Scripts for benchmarking and analysis
- Detailed Metrics: Subject-wise and question-type-wise performance analysis
Each question in the dataset includes:
- `unique_question_id`: Unique identifier for each question
- `question_text`: The question text
- `option_a`, `option_b`, `option_c`, `option_d`: Four multiple-choice options
- `correct_answer`: The correct option (A, B, C, or D)
- `subject`: Subject category
- `exam_name`: Source examination
- `paper_number`: Paper/section identifier
- `question_type`: Type of question (MCQ, blank-filling, assertion/reasoning, etc.)
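Since the dataset is on the Hub, here is a minimal sketch for loading it with the `datasets` library and inspecting these fields (the `train` split name is an assumption; check the dataset card if it differs):

```python
from datasets import load_dataset

# Load ParamBench from the Hugging Face Hub
# (the "train" split name is an assumption; see the dataset card).
ds = load_dataset("bharatgenai/ParamBench", split="train")

sample = ds[0]
print(sample["question_text"])
for key in ["option_a", "option_b", "option_c", "option_d"]:
    print(f"{key}: {sample[key]}")
print("Answer:", sample["correct_answer"], "| Subject:", sample["subject"])
```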
The benchmark covers 21 subjects including but not limited to:
- Music
- History
- Drama and Theatre
- Economics
- Anthropology
- Current Affairs
- Indian Culture
- And more...
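To see the full subject distribution rather than relying on this partial list, one can tally the `subject` column (again assuming a `train` split):

```python
from collections import Counter
from datasets import load_dataset

# Count questions per subject across the whole benchmark
# (the "train" split name is an assumption, as above).
ds = load_dataset("bharatgenai/ParamBench", split="train")
print(Counter(ds["subject"]).most_common())
```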
Repository structure:

```
ParamBench/
├── data/
│   └── full-data.csv        # Main dataset file
├── checkpoints/             # Model evaluation checkpoints
├── results/                 # Analysis results and visualizations
├── benchmark_script.py      # Main benchmarking script
├── analysis_models.py       # Analysis and visualization script
├── requirements.txt         # Python dependencies
└── README.md                # This file
```
Install the dependencies:

```bash
pip install -r requirements.txt
```

Requirements:

- Python 3.8+
- PyTorch 2.0+
- Transformers 4.45+
- Pandas
- NumPy
- Plotly (for visualization)
- Clone the repository

```bash
git clone https://github.com/yourusername/ParamBench.git
cd ParamBench
```

- Run the benchmark

```bash
python benchmark_script.py
```

The benchmark script supports various configuration options:
```python
# In benchmark_script.py
group_to_run = "small"  # Options: "small", "medium", "large", or "all"
batch_size = 16         # Adjust based on GPU memory
```
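To make the evaluation flow concrete, here is a hypothetical sketch of a minimal scoring loop over `data/full-data.csv`; it is not the actual logic of `benchmark_script.py`, and `predict()` is a placeholder baseline to swap for a real model call:

```python
import pandas as pd

# Hypothetical scoring loop over the dataset CSV; the real
# benchmark_script.py may work differently.
df = pd.read_csv("data/full-data.csv")

def predict(question_text: str, options: dict) -> str:
    """Placeholder baseline that always answers 'A'.
    Replace with an actual LLM call returning 'A', 'B', 'C', or 'D'."""
    return "A"

correct = 0
for _, row in df.iterrows():
    options = {letter: row[f"option_{letter.lower()}"] for letter in "ABCD"}
    correct += predict(row["question_text"], options) == row["correct_answer"]

print(f"Accuracy: {correct / len(df):.2%}")
```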
After running benchmarks, generate comprehensive analysis reports:

```bash
python analysis_models.py
```

This will generate:
- Model performance summary CSV
- Subject-wise accuracy charts
- Question type analysis
- Combined report with all metrics
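For custom charts beyond the generated reports, here is a hedged example using Pandas and Plotly (both in `requirements.txt`); the `results/per_question_results.csv` path and the `is_correct` column are assumptions, not documented outputs of the analysis script:

```python
import pandas as pd
import plotly.express as px

# Hypothetical per-question results file with "subject" and "is_correct"
# columns; the actual outputs of analysis_models.py may be named differently.
results = pd.read_csv("results/per_question_results.csv")
acc = results.groupby("subject")["is_correct"].mean().reset_index()

fig = px.bar(acc, x="subject", y="is_correct",
             labels={"is_correct": "Accuracy", "subject": "Subject"},
             title="Subject-wise accuracy")
fig.show()
```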
If you use ParamBench in your research, please cite:

```bibtex
@article{maheshwari2025parambench,
  title={ParamBench: A Graduate-Level Benchmark for Evaluating LLM Understanding on Indic Subjects},
  author={Maheshwari, Ayush and Sharma, Kaushal and Patel, Vivek and Maheshwari, Aditya},
  journal={arXiv preprint arXiv:2508.16185},
  year={2025}
}
```