
How Teleglobal Built a Production-Ready Multilingual GenAI Platform on AWS for GradeMaker


 Executive Summary 

GradeMaker is a digital exam platform used by assessment bodies worldwide, including AQA, one of the UK’s leading exam organisations. GradeMaker partnered with Teleglobal International to design and deploy a production-grade Generative AI platform on AWS, enabling automatic generation of high-quality multilingual exam content at scale. 

The platform now: 

  • Generates 10,000+ exam questions per month across 5 languages 
  • Reduces manual question-authoring effort by over 60% 
  • Achieves ~98% content accuracy after model fine-tuning 
  • Runs at significantly lower cost compared to third-party AI APIs 
  • Operates entirely inside GradeMaker’s secure AWS environment 

The Challenge 

As AI became a standard expectation in the assessment industry, GradeMaker needed to modernise its exam content generation process. The existing system relied on manual authoring and translation workflows, which created several operational problems. 

  1. Scaling Content Creation 

GradeMaker’s editorial teams were already at capacity. Writing exam questions manually across multiple subjects and languages was slow and resource-intensive. As new education boards onboarded, demand grew rapidly. 

  2. Multilingual Expansion 

New contracts required exam content in Hindi, Tamil, Telugu, and Punjabi. Manual translation was expensive, slow, and introduced inconsistencies in exam language and marking schemes. 

  3. Limitations of Initial AI Proof of Concept 

GradeMaker had experimented with external AI APIs, but this approach had key limitations: 

  • High per-request API costs 
  • Limited control over model behaviour 
  • Data privacy concerns with exam content 
  • No customisation for exam-specific formats 
  • No production-grade infrastructure 

This confirmed the need for a custom-trained model hosted securely within GradeMaker’s own AWS environment. 

  4. Competitive Pressure 

The market was moving quickly. GradeMaker needed a robust, scalable AI solution to keep pace with competitors already adopting AI-native platforms. 

The Solution 

Teleglobal designed a production-ready Generative AI architecture on AWS covering the complete AI lifecycle: model selection, training, deployment, monitoring, and continuous improvement. The platform integrates directly with GradeMaker’s existing authoring tools via APIs, so educators can generate exam questions without any changes to their existing workflows. 
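To make the integration concrete, here is a minimal sketch of how an authoring tool could call the platform's API. The endpoint name, prompt shape, and parameters are illustrative assumptions, not GradeMaker's actual integration details.

```python
import json

# Hypothetical endpoint name and request shape -- illustrative only.
ENDPOINT_NAME = "grademaker-openhathi-prod"

def build_generation_request(subject: str, language: str, difficulty: str) -> dict:
    """Build the JSON request an authoring tool could send to the endpoint."""
    prompt = (
        f"Generate one {difficulty} exam question with a marking scheme "
        f"for {subject}, written in {language}."
    )
    return {
        "EndpointName": ENDPOINT_NAME,
        "ContentType": "application/json",
        "Body": json.dumps({"inputs": prompt,
                            "parameters": {"max_new_tokens": 512}}),
    }

request = build_generation_request("Mathematics", "Tamil", "intermediate")
# In production this dict would be passed to:
# boto3.client("sagemaker-runtime").invoke_endpoint(**request)
```

Because the request is ordinary JSON over HTTPS, the existing authoring tools only need an API client, not any AI-specific tooling.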

Model Selection 

Step 1: Evaluation Criteria 

Before shortlisting any model, Teleglobal defined five non-negotiable criteria that the chosen model had to meet for GradeMaker’s use case: 

  • Indic language coverage – must natively support Hindi, Tamil, Telugu, and Punjabi 
  • Accuracy on exam content – must reach acceptable quality on structured educational text after fine-tuning  
  • Cost model – must be open-source or freely licensable to avoid per-query API charges 
  • Data privacy – must be self-hostable inside GradeMaker’s own AWS account 
  • Extensibility – must support fine-tuning on custom domain-specific datasets 

Step 2: Models Evaluated 

Three open-source multilingual models were shortlisted and evaluated side by side: 

| | OpenHathi (Sarvam AI) | mBART-50 (Meta AI) | BLOOM-7B (BigScience) |
|---|---|---|---|
| Architecture | Mistral-7B, Indic-optimised | Seq-to-seq, 611M params | Autoregressive, 7.1B params |
| Primary strength | Indic language generation | Multilingual translation | General multilingual text |
| Licence | Apache 2.0 (unrestricted) | MIT (open) | RAIL (restricted commercial use) |
| Self-hostable on AWS | Yes | Yes | Yes, at very high memory cost |
| Fine-tunable | Yes | Partial | Yes |

Step 3: Performance Parameters 

The models were compared across five parameters that directly mapped to GradeMaker’s requirements: 

  • Indic language coverage – native support for all four required languages 
  • Fit for generative exam content – ability to produce structured question-and-answer output, not just translate 
  • Fine-tuning capability – can the model be trained on domain-specific exam data 
  • Licence – fully open for commercial self-hosted use 
  • Infrastructure fit – deployable on SageMaker without excessive memory or compute overhead 
| Parameter | OpenHathi (selected) | mBART-50 (Meta AI) | BLOOM-7B (BigScience) |
|---|---|---|---|
| Hindi | ✔ Native | ✔ Native | Partial |
| Tamil | ✔ Native | Limited | ✘ Weak |
| Telugu | ✔ Native | Partial | ✘ Weak |
| Punjabi | ✔ Native | Partial | ✘ Very limited |
| Generative exam content | ✔ Strong | ✘ Translation-focused | Partial |
| Fine-tunable on custom data | ✔ Yes | Partial | ✔ Yes |
| Commercial licence | ✔ Apache 2.0 | ✔ MIT | ✘ RAIL (restricted) |
| SageMaker deployment cost | ✔ Efficient (7B) | ✔ Lightweight (611M) | ✘ High memory (7.1B) |

OpenHathi was the only model to score positively across all five parameters. mBART-50, while lightweight and open-licensed, is built for translation rather than content generation and lacks reliable Telugu and Punjabi support. BLOOM-7B has a restricted commercial licence and high infrastructure cost, and its Indic language coverage outside Hindi is weak. 

Step 4: Why OpenHathi Was Selected 

OpenHathi was the only model to natively support all four required Indic languages. It achieved the highest baseline accuracy (60%) on GradeMaker’s benchmark – ahead of both alternatives – and reached 98% accuracy post fine-tuning, meeting the quality threshold required for live exam content. Key decision factors: 

  • Language coverage – Only model with full native support for Hindi, Tamil, Telugu, and Punjabi 
  • Model accuracy – Highest baseline accuracy on exam content before any training, making it the strongest starting point for fine-tuning 
  • Cost efficiency – Apache 2.0 licence allows unrestricted commercial use — no per-question charges 
  • Data privacy – Fully self-hosted inside GradeMaker’s AWS account — no exam data leaves their environment 
  • Extensibility – Open architecture supports fine-tuning on domain-specific exam datasets with SageMaker Training Jobs 

Custom Training Pipeline 

To improve model performance for exam content generation, Teleglobal built a custom training pipeline on Amazon SageMaker Training Jobs. 

Training Data 

Training data included real exam content across all five languages: 

  • Historical exam questions 
  • Marking schemes and answer structures 
  • Educational language patterns specific to assessment content 
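The case study does not describe the actual training-data format, only that historical questions were paired with marking schemes. As a hedged sketch, data preparation of this kind often means converting exam items into prompt/completion JSONL records; the field names below are assumptions for illustration.

```python
import json

def to_jsonl_records(exam_items: list[dict]) -> list[str]:
    """Turn historical exam items into prompt/completion lines for fine-tuning.

    Each record pairs a generation instruction (tagged with language and
    subject) with the real question and its marking scheme as the target.
    """
    lines = []
    for item in exam_items:
        record = {
            "prompt": (
                f"[{item['language']}] [{item['subject']}] "
                f"Write an exam question with a marking scheme."
            ),
            "completion": (
                f"{item['question']}\nMarking scheme: {item['marking_scheme']}"
            ),
        }
        # ensure_ascii=False keeps Indic scripts readable in the JSONL file
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines

sample = [{
    "language": "Hindi",
    "subject": "History",
    "question": "Describe one cause of the event discussed in class.",
    "marking_scheme": "1 mark for the cause, 1 mark for the explanation",
}]
jsonl = to_jsonl_records(sample)
```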

Infrastructure 

Training workloads ran on Amazon SageMaker ml.g5.12xlarge instances with NVIDIA A10G GPUs, providing the GPU acceleration needed for fine-tuning large language models. 

Pipeline Design 

  • Reusable pipelines — new subjects or languages can be added without rebuilding from scratch 
  • Tamil language output validated against GradeMaker’s quality standards before going live 
  • Accuracy improved from ~60% at baseline to ~98% after fine-tuning on domain-specific data 
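The "reusable pipeline" idea can be sketched as a single function that parameterises a SageMaker CreateTrainingJob request per language, so adding a new language means generating a new spec rather than rebuilding the pipeline. All names, S3 paths, hyperparameters, and the container-image placeholder below are illustrative assumptions.

```python
# Sketch only: a reusable fine-tuning job spec, assuming the SageMaker
# CreateTrainingJob API. Names, paths, and hyperparameters are hypothetical.
def training_job_spec(language: str, run_id: str) -> dict:
    """Parameterise one fine-tuning job so new languages reuse the pipeline."""
    return {
        "TrainingJobName": f"openhathi-finetune-{language.lower()}-{run_id}",
        "AlgorithmSpecification": {
            "TrainingImage": "<huggingface-dlc-image-uri>",  # placeholder
            "TrainingInputMode": "File",
        },
        "ResourceConfig": {
            "InstanceType": "ml.g5.12xlarge",  # NVIDIA A10G GPUs
            "InstanceCount": 1,
            "VolumeSizeInGB": 200,
        },
        "HyperParameters": {"language": language, "epochs": "3"},
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://example-exam-data/{language.lower()}/train/",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": "s3://example-exam-data/models/"},
    }

spec = training_job_spec("Telugu", "2024-06")
# In production: boto3.client("sagemaker").create_training_job(**spec, RoleArn=...)
```

Keeping the spec as data also makes it easy to gate each run (for example, the Tamil quality check) before the resulting model is promoted to the live endpoint.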

Production Inference Architecture 

The trained model runs as a real-time API on Amazon SageMaker, integrating directly with GradeMaker’s question authoring platform and content review workflows. 

  • Auto-scaling endpoints handle exam season traffic peaks and scale down during quiet periods 
  • No changes required to GradeMaker’s existing client applications 
  • 99.9% uptime maintained since production launch 
  • Full CloudWatch monitoring for endpoint health, latency, and billing visibility 
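Auto-scaling of a SageMaker real-time endpoint is typically configured through AWS Application Auto Scaling with a target-tracking policy on invocations per instance. The sketch below shows what such a configuration could look like; the endpoint name, capacity limits, and target value are assumptions, not GradeMaker's production settings.

```python
# Sketch of endpoint auto-scaling via AWS Application Auto Scaling.
# Endpoint/variant names and numeric values are illustrative assumptions.
ENDPOINT, VARIANT = "grademaker-openhathi-prod", "AllTraffic"

scalable_target = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": f"endpoint/{ENDPOINT}/variant/{VARIANT}",
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 1,   # scale down during quiet periods
    "MaxCapacity": 6,   # absorb exam-season peaks
}

scaling_policy = {
    "PolicyName": "invocations-target-tracking",
    "ServiceNamespace": "sagemaker",
    "ResourceId": scalable_target["ResourceId"],
    "ScalableDimension": scalable_target["ScalableDimension"],
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 80.0,  # invocations per instance (assumed target)
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
        },
        "ScaleInCooldown": 300,   # scale in slowly
        "ScaleOutCooldown": 60,   # scale out quickly for traffic spikes
    },
}
# In production:
# aas = boto3.client("application-autoscaling")
# aas.register_scalable_target(**scalable_target)
# aas.put_scaling_policy(**scaling_policy)
```

The asymmetric cooldowns reflect the traffic pattern described above: scale out fast when exam-season load arrives, scale in cautiously afterwards.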

Security and Compliance 

Given the sensitivity of exam content, security was a core design requirement from the start. 

  • All AI systems run inside a private Amazon VPC — model endpoints not publicly accessible 
  • AWS IAM with least-privilege policies — only authorised systems can interact with AI resources 
  • All data encrypted at rest (Amazon S3 + AWS KMS) and in transit (TLS) 
  • Full audit log of every AI interaction for exam governance and compliance 
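A least-privilege setup of the kind described above usually comes down to narrow IAM policies. As an illustration, the policy sketch below grants a backend system permission to invoke one specific endpoint and nothing else; the ARN is a placeholder, not GradeMaker's real resource.

```python
import json

# Illustrative least-privilege IAM policy: the authoring backend may only
# invoke the one production endpoint. The ARN below is a placeholder.
invoke_only_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "InvokeExamEndpointOnly",
        "Effect": "Allow",
        "Action": "sagemaker:InvokeEndpoint",
        "Resource": (
            "arn:aws:sagemaker:REGION:ACCOUNT_ID:"
            "endpoint/grademaker-openhathi-prod"
        ),
    }],
}
policy_document = json.dumps(invoke_only_policy)
```

Scoping the `Action` and `Resource` this tightly means that even a compromised client credential cannot list models, read training data, or reach any other endpoint.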

AWS Services Used 

| Category | Service / Detail |
|---|---|
| Training | Amazon SageMaker Training Jobs on ml.g5.12xlarge (NVIDIA A10G GPUs) |
| Inference | Amazon SageMaker real-time endpoints with auto-scaling |
| Model source | OpenHathi multilingual LLM via Hugging Face |
| Storage | Amazon S3 for datasets, model artefacts, and logs |
| Networking | Amazon VPC with private subnets |
| Security | AWS IAM, AWS KMS |
| Monitoring | Amazon CloudWatch for metrics, logs, and billing alerts |

Results 

| Metric | Result |
|---|---|
| Questions generated monthly | 10,000+ across 5 languages |
| Question-authoring effort | Reduced by over 60% |
| Model accuracy | Improved from ~60% to ~98% |
| Manual writing workload | Reduced by ~50% |
| Production uptime | 99.9% |
| Cost vs. third-party APIs | Significantly lower per question at volume |
| Languages supported | English, Hindi, Tamil, Telugu, Punjabi |

“We had no AI when this started. What we have now is live, writing real exam content at scale in multiple languages. The cost is predictable, the platform is secure, and our team is focused on quality instead of drafting. That is a real shift for us.” 

— GradeMaker 

What’s Next 

The platform is live and built to grow. Planned next steps: 

  • Model benchmarking — running multiple LLMs in parallel to compare performance and improve output quality 
  • AI tutoring — expanding to personalised learning paths using the same SageMaker infrastructure 
  • Human-in-the-loop governance — integrating educator review workflows before content is published 
  • Cost optimisation — evaluating AWS Inferentia2 chips to reduce inference costs at higher volumes