
Executive Summary
GradeMaker is a digital exam platform used by assessment bodies worldwide, including AQA, one of the UK’s leading exam organisations. GradeMaker partnered with Teleglobal International to design and deploy a production-grade Generative AI platform on AWS, enabling automatic generation of high-quality multilingual exam content at scale.
The platform now:
- Generates 10,000+ exam questions per month across 5 languages
- Has reduced manual question authoring effort by over 60%
- Achieves ~98% content accuracy after model fine-tuning
- Runs at significantly lower cost than third-party AI APIs
- Operates entirely inside GradeMaker’s secure AWS environment
The Challenge
As AI became a standard expectation in the assessment industry, GradeMaker needed to modernise its exam content generation process. The existing system relied on manual authoring and translation workflows, which created several operational problems.
- Scaling Content Creation
GradeMaker’s editorial teams were already at capacity. Writing exam questions manually across multiple subjects and languages was slow and resource-intensive. As new education boards onboarded, demand grew rapidly.
- Multilingual Expansion
New contracts required exam content in Hindi, Tamil, Telugu, and Punjabi. Manual translation was expensive, slow, and introduced inconsistencies in exam language and marking schemes.
- Limitations of Initial AI Proof of Concept
GradeMaker had experimented with external AI APIs, but this approach had key limitations:
- High per-request API costs
- Limited control over model behaviour
- Data privacy concerns with exam content
- No customisation for exam-specific formats
- No production-grade infrastructure
This confirmed the need for a custom-trained model hosted securely within GradeMaker’s own AWS environment.
- Competitive Pressure
The market was moving quickly. GradeMaker needed a robust, scalable AI solution to keep pace with competitors already adopting AI-native platforms.
The Solution
Teleglobal designed a production-ready Generative AI architecture on AWS covering the complete AI lifecycle: model selection, training, deployment, monitoring, and continuous improvement. The platform integrates directly with GradeMaker’s existing authoring tools via APIs, so educators can generate exam questions without any changes to their existing workflows.
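As a rough sketch of the kind of request an authoring tool might send to such an endpoint, the helper below builds a JSON body for a real-time question-generation call. The field names, parameters, and endpoint invocation shown in the comment are illustrative assumptions, not GradeMaker's actual API contract:

```python
import json

def build_generation_request(subject: str, language: str, num_questions: int = 5,
                             question_type: str = "multiple_choice") -> str:
    """Build a JSON request body for a real-time question-generation endpoint.

    Field names here are illustrative placeholders; the real contract is
    defined by GradeMaker's authoring platform integration.
    """
    payload = {
        "inputs": {
            "subject": subject,
            "language": language,
            "num_questions": num_questions,
            "question_type": question_type,
        },
        # Typical LLM generation parameters (values are assumptions)
        "parameters": {"max_new_tokens": 512, "temperature": 0.7},
    }
    return json.dumps(payload)

# With boto3, a body like this would typically be sent via:
#   sagemaker_runtime.invoke_endpoint(EndpointName=..., ContentType="application/json", Body=body)
body = build_generation_request("Mathematics", "hi", num_questions=3)
```

Keeping the request a plain JSON document is what lets existing authoring tools call the endpoint without workflow changes.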
Model Selection
Step 1: Evaluation Criteria
Before shortlisting any model, Teleglobal defined five non-negotiable criteria that the chosen model had to meet for GradeMaker’s use case:
- Indic language coverage – must natively support Hindi, Tamil, Telugu, and Punjabi
- Accuracy on exam content – must reach acceptable quality on structured educational text after fine-tuning
- Cost model – must be open-source or freely licensable to avoid per-query API charges
- Data privacy – must be self-hostable inside GradeMaker’s own AWS account
- Extensibility – must support fine-tuning on custom domain-specific datasets
Step 2: Models Evaluated
Three open-source multilingual models were shortlisted and evaluated side by side:
| | OpenHathi (Sarvam AI) | mBART-50 (Meta AI) | BLOOM-7B (BigScience) |
| --- | --- | --- | --- |
| Architecture | Mistral-7B, Indic-optimised | Seq-to-Seq, 611M params | Autoregressive, 7.1B params |
| Primary strength | Indic language generation | Multilingual translation | General multilingual text |
| Licence | Apache 2.0 (unrestricted) | MIT (open) | RAIL (restricted commercial use) |
| Self-hostable on AWS | Yes | Yes | Yes, at very high memory cost |
| Fine-tunable | Yes | Partial | Yes |
Step 3: Performance Parameters
The models were compared across five parameters that directly mapped to GradeMaker’s requirements:
- Indic language coverage – native support for all four required languages
- Fit for generative exam content – ability to produce structured question-and-answer output, not just translate
- Fine-tuning capability – can the model be trained on domain-specific exam data
- Licence – fully open for commercial self-hosted use
- Infrastructure fit – deployable on SageMaker without excessive memory or compute overhead
| Parameter | OpenHathi ✔ Selected | mBART-50 (Meta AI) | BLOOM-7B (BigScience) |
| --- | --- | --- | --- |
| Hindi | ✔ Native | ✔ Native | Partial |
| Tamil | ✔ Native | Limited | ✘ Weak |
| Telugu | ✔ Native | Partial | ✘ Weak |
| Punjabi | ✔ Native | Partial | ✘ Very limited |
| Generative exam content | ✔ Strong | ✘ Translation-focused | Partial |
| Fine-tunable on custom data | ✔ Yes | Partial | ✔ Yes |
| Commercial licence | ✔ Apache 2.0 | ✔ MIT | ✘ RAIL (restricted) |
| SageMaker deployment cost | ✔ Efficient (7B) | ✔ Lightweight (611M) | ✘ High memory (7.1B) |
OpenHathi was the only model to score positively across all five parameters. mBART-50, while lightweight and open-licensed, is built for translation rather than content generation and lacks reliable Telugu and Punjabi support. BLOOM-7B has a restricted commercial licence and high infrastructure cost, and its Indic language coverage outside Hindi is weak.
Step 4: Why OpenHathi Was Selected
OpenHathi was the only model to natively support all four required Indic languages. It achieved the highest baseline accuracy (60%) on GradeMaker’s benchmark – ahead of both alternatives – and reached 98% accuracy post fine-tuning, meeting the quality threshold required for live exam content. Key decision factors:
- Language coverage – Only model with full native support for Hindi, Tamil, Telugu, and Punjabi
- Model accuracy – Highest baseline accuracy on exam content before any training, making it the strongest starting point for fine-tuning
- Cost efficiency – Apache 2.0 licence allows unrestricted commercial use — no per-question charges
- Data privacy – Fully self-hosted inside GradeMaker’s AWS account — no exam data leaves their environment
- Extensibility – Open architecture supports fine-tuning on domain-specific exam datasets with SageMaker Training Jobs
Custom Training Pipeline
To improve model performance for exam content generation, Teleglobal built a custom training pipeline on Amazon SageMaker Training Jobs.
Training Data
Training data included real exam content across all five languages:
- Historical exam questions
- Marking schemes and answer structures
- Educational language patterns specific to assessment content
Infrastructure
Training workloads ran on Amazon SageMaker g5.12xlarge instances with NVIDIA A10G GPUs, providing the GPU acceleration needed for fine-tuning large language models.
Pipeline Design
- Reusable pipelines — new subjects or languages can be added without rebuilding from scratch
- Tamil language output validated against GradeMaker’s quality standards before going live
- Accuracy improved from ~60% at baseline to ~98% after fine-tuning on domain-specific data
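A minimal sketch of the configuration such a pipeline might assemble for a SageMaker training job follows. The instance type matches the case study (g5.12xlarge with NVIDIA A10G GPUs); the job name, container image placeholder, hyperparameters, and S3 paths are illustrative assumptions:

```python
def build_training_job_config(language: str, train_data_s3: str, output_s3: str) -> dict:
    """Assemble core arguments for a SageMaker CreateTrainingJob call.

    Only the instance type comes from the case study; everything else
    (names, image, hyperparameters) is an illustrative placeholder.
    """
    return {
        "TrainingJobName": f"openhathi-finetune-{language}",
        "AlgorithmSpecification": {
            # Placeholder: a Hugging Face / PyTorch training container URI
            "TrainingImage": "<training-image-uri>",
            "TrainingInputMode": "File",
        },
        "ResourceConfig": {
            "InstanceType": "ml.g5.12xlarge",  # NVIDIA A10G GPUs
            "InstanceCount": 1,
            "VolumeSizeInGB": 200,
        },
        "HyperParameters": {
            "epochs": "3",
            "learning_rate": "2e-5",
            "language": language,
        },
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_data_s3,
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3},
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},
    }

# Example: a Tamil fine-tuning run (bucket names are placeholders)
config = build_training_job_config("ta", "s3://example-bucket/exam-data/ta/",
                                   "s3://example-bucket/models/")
```

Parameterising the job by language is what makes the pipeline reusable: onboarding a new subject or language is a new configuration, not a rebuild.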
Production Inference Architecture
The trained model runs as a real-time API on Amazon SageMaker, integrating directly with GradeMaker’s question authoring platform and content review workflows.
- Auto-scaling endpoints handle exam season traffic peaks and scale down during quiet periods
- No changes required to GradeMaker’s existing client applications
- 99.9% uptime maintained since production launch
- Full CloudWatch monitoring for endpoint health, latency, and billing visibility
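The auto-scaling behaviour described above is typically configured through Application Auto Scaling with a target-tracking policy on the standard `SageMakerVariantInvocationsPerInstance` metric. The sketch below builds the two settings dictionaries such a setup needs; the endpoint name, capacity bounds, target value, and cooldowns are illustrative assumptions:

```python
def build_autoscaling_config(endpoint_name: str, variant: str = "AllTraffic",
                             min_capacity: int = 1, max_capacity: int = 8) -> dict:
    """Build Application Auto Scaling settings for a SageMaker endpoint variant.

    The predefined metric type is the standard one for SageMaker variants;
    capacity bounds and targets here are illustrative, not production values.
    """
    resource_id = f"endpoint/{endpoint_name}/variant/{variant}"
    return {
        # Would be passed to application-autoscaling register_scalable_target
        "scalable_target": {
            "ServiceNamespace": "sagemaker",
            "ResourceId": resource_id,
            "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
            "MinCapacity": min_capacity,
            "MaxCapacity": max_capacity,
        },
        # Would be passed to application-autoscaling put_scaling_policy
        "scaling_policy": {
            "PolicyName": f"{endpoint_name}-invocations-target",
            "ServiceNamespace": "sagemaker",
            "ResourceId": resource_id,
            "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
            "PolicyType": "TargetTrackingScaling",
            "TargetTrackingScalingPolicyConfiguration": {
                "TargetValue": 100.0,  # invocations per instance (illustrative)
                "PredefinedMetricSpecification": {
                    "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
                },
                "ScaleInCooldown": 300,  # scale in slowly after exam-season peaks
                "ScaleOutCooldown": 60,  # scale out quickly as load rises
            },
        },
    }

cfg = build_autoscaling_config("grademaker-openhathi-prod")
```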
Security and Compliance
Given the sensitivity of exam content, security was a core design requirement from the start.
- All AI systems run inside a private Amazon VPC — model endpoints not publicly accessible
- AWS IAM with least-privilege policies — only authorised systems can interact with AI resources
- All data encrypted at rest (Amazon S3 + AWS KMS) and in transit (TLS)
- Full audit log of every AI interaction for exam governance and compliance
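A least-privilege policy of the kind described above might look like the sketch below, which grants only the ability to invoke a single named endpoint. The ARN is a placeholder; real policies would be scoped per calling system and per resource:

```python
import json

def build_invoke_policy(endpoint_arn: str) -> str:
    """Return a least-privilege IAM policy allowing invocation of one endpoint only.

    Illustrative sketch: the ARN is a placeholder, and production policies
    would add conditions and be attached to specific roles.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowInvokeSingleEndpoint",
            "Effect": "Allow",
            "Action": ["sagemaker:InvokeEndpoint"],
            "Resource": [endpoint_arn],
        }],
    }
    return json.dumps(policy, indent=2)

# Placeholder ARN for illustration only
policy_json = build_invoke_policy(
    "arn:aws:sagemaker:region:account-id:endpoint/grademaker-openhathi-prod"
)
```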
AWS Services Used
| Category | Service / Detail |
| --- | --- |
| Training | Amazon SageMaker Training Jobs — g5.12xlarge (NVIDIA A10G GPU) |
| Inference | Amazon SageMaker Real-Time Endpoints with auto-scaling |
| Model Source | OpenHathi multilingual LLM via Hugging Face |
| Storage | Amazon S3 for datasets, model artefacts, and logs |
| Networking | Amazon VPC with private subnets |
| Security | AWS IAM, AWS KMS |
| Monitoring | Amazon CloudWatch for metrics, logs, and billing alerts |
Results
| Metric | Result |
| --- | --- |
| Questions generated monthly | 10,000+ across 5 languages |
| Question writing effort | Reduced by over 60% |
| Model accuracy | Improved from ~60% to ~98% |
| Manual writing workload | Reduced by ~50% |
| Production uptime | 99.9% |
| Cost vs. third-party APIs | Significantly lower per question at volume |
| Languages supported | English, Hindi, Tamil, Telugu, Punjabi |
“We had no AI when this started. What we have now is live, writing real exam content at scale in multiple languages. The cost is predictable, the platform is secure, and our team is focused on quality instead of drafting. That is a real shift for us.”
— GradeMaker
What’s Next
The platform is live and built to grow. Planned next steps:
- Model benchmarking — running multiple LLMs in parallel to compare performance and improve output quality
- AI tutoring — expanding to personalised learning paths using the same SageMaker infrastructure
- Human-in-the-loop governance — integrating educator review workflows before content is published
- Cost optimisation — evaluating AWS Inferentia2 chips to reduce inference costs at higher volumes