
Executive Summary
GradeMaker is a digital exam platform used by assessment bodies worldwide, including AQA, one of the UK’s leading exam organisations. GradeMaker partnered with Teleglobal International to design and deploy a production-grade Generative AI platform on AWS, enabling automatic generation of high-quality multilingual exam content at scale.
The platform now:
- Generates 10,000+ exam questions per month across 5 languages
- Has reduced manual question authoring effort by over 60%
- Achieves ~98% content accuracy after model fine-tuning
- Runs at significantly lower cost than third-party AI APIs
- Operates entirely inside GradeMaker’s secure AWS environment
The Challenge
As AI became a standard expectation in the assessment industry, GradeMaker needed to modernise its exam content generation process. The existing system relied on manual authoring and translation workflows, which created several operational problems.
- Scaling Content Creation
GradeMaker’s editorial teams were already at capacity. Writing exam questions manually across multiple subjects and languages was slow and resource-intensive. As new education boards onboarded, demand grew rapidly.
- Multilingual Expansion
New contracts required exam content in Hindi, Tamil, Telugu, and Punjabi. Manual translation was expensive, slow, and introduced inconsistencies in exam language and marking schemes.
- Limitations of Initial AI Proof of Concept
GradeMaker had experimented with external AI APIs, but this approach had key limitations:
- High per-request API costs
- Limited control over model behaviour
- Data privacy concerns with exam content
- No customisation for exam-specific formats
- No production-grade infrastructure
This confirmed the need for a custom-trained model hosted securely within GradeMaker’s own AWS environment.
- Competitive Pressure
The market was moving quickly. GradeMaker needed a robust, scalable AI solution to keep pace with competitors already adopting AI-native platforms.
The Solution
Teleglobal designed a production-ready Generative AI architecture on AWS covering the complete AI lifecycle: model selection, training, deployment, monitoring, and continuous improvement. The platform integrates directly with GradeMaker’s existing authoring tools via APIs, so educators can generate exam questions without any changes to their existing workflows.
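As a rough sketch of the kind of request an authoring tool might send to such an endpoint, the helper below builds a JSON body for a real-time question-generation call. The field names, parameters, and endpoint invocation shown in the comment are illustrative assumptions, not GradeMaker's actual API contract:

```python
import json

def build_generation_request(subject: str, language: str, num_questions: int = 5,
                             question_type: str = "multiple_choice") -> str:
    """Build a JSON request body for a real-time question-generation endpoint.

    Field names here are illustrative placeholders; the real contract is
    defined by GradeMaker's authoring platform integration.
    """
    payload = {
        "inputs": {
            "subject": subject,
            "language": language,
            "num_questions": num_questions,
            "question_type": question_type,
        },
        # Typical LLM generation parameters (values are assumptions)
        "parameters": {"max_new_tokens": 512, "temperature": 0.7},
    }
    return json.dumps(payload)

# With boto3, a body like this would typically be sent via:
#   sagemaker_runtime.invoke_endpoint(EndpointName=..., ContentType="application/json", Body=body)
body = build_generation_request("Mathematics", "hi", num_questions=3)
```

Keeping the request a plain JSON document is what lets existing authoring tools call the endpoint without workflow changes.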
Model Selection
Step 1: Evaluation Criteria
Before shortlisting any model, Teleglobal defined five non-negotiable criteria that the chosen model had to meet for GradeMaker’s use case:
- Indic language coverage – must natively support Hindi, Tamil, Telugu, and Punjabi
- Accuracy on exam content – must reach acceptable quality on structured educational text after fine-tuning
- Cost model – must be open-source or freely licensable to avoid per-query API charges
- Data privacy – must be self-hostable inside GradeMaker’s own AWS account
- Extensibility – must support fine-tuning on custom domain-specific datasets
Step 2: Models Evaluated
Three open-source multilingual models were shortlisted and evaluated side by side:
| | OpenHathi (Sarvam AI) | mBART-50 (Meta AI) | BLOOM-7B (BigScience) |
| --- | --- | --- | --- |
| Architecture | Mistral-7B, Indic-optimised | Seq-to-Seq, 611M params | Autoregressive, 7.1B params |
| Primary strength | Indic language generation | Multilingual translation | General multilingual text |
| Licence | Apache 2.0 (unrestricted) | MIT (open) | RAIL (restricted commercial use) |
| Self-hostable on AWS | Yes | Yes | Yes, at very high memory cost |
| Fine-tunable | Yes | Partial | Yes |
Step 3: Performance Parameters
The models were compared across five parameters that directly mapped to GradeMaker’s requirements:
- Indic language coverage – native support for all four required languages
- Fit for generative exam content – ability to produce structured question-and-answer output, not just translate
- Fine-tuning capability – can the model be trained on domain-specific exam data
- Licence – fully open for commercial self-hosted use
- Infrastructure fit – deployable on SageMaker without excessive memory or compute overhead
| Parameter | OpenHathi ✔ Selected | mBART-50 (Meta AI) | BLOOM-7B (BigScience) |
| --- | --- | --- | --- |
| Hindi | ✔ Native | ✔ Native | Partial |
| Tamil | ✔ Native | Limited | ✘ Weak |
| Telugu | ✔ Native | Partial | ✘ Weak |
| Punjabi | ✔ Native | Partial | ✘ Very limited |
| Generative exam content | ✔ Strong | ✘ Translation-focused | Partial |
| Fine-tunable on custom data | ✔ Yes | Partial | ✔ Yes |
| Commercial licence | ✔ Apache 2.0 | ✔ MIT | ✘ RAIL (restricted) |
| SageMaker deployment cost | ✔ Efficient (7B) | ✔ Lightweight (611M) | ✘ High memory (7.1B) |
OpenHathi was the only model to score positively across all five parameters. mBART-50, while lightweight and open-licensed, is built for translation rather than content generation and lacks reliable Telugu and Punjabi support. BLOOM-7B has a restricted commercial licence and high infrastructure cost, and its Indic language coverage outside Hindi is weak.
Step 4: Why OpenHathi Was Selected
OpenHathi was the only model to natively support all four required Indic languages. It achieved the highest baseline accuracy (60%) on GradeMaker’s benchmark – ahead of both alternatives – and reached 98% accuracy post fine-tuning, meeting the quality threshold required for live exam content. Key decision factors:
- Language coverage – Only model with full native support for Hindi, Tamil, Telugu, and Punjabi
- Model accuracy – Highest baseline accuracy on exam content before any training, making it the strongest starting point for fine-tuning
- Cost efficiency – Apache 2.0 licence allows unrestricted commercial use — no per-question charges
- Data privacy – Fully self-hosted inside GradeMaker’s AWS account — no exam data leaves their environment
- Extensibility – Open architecture supports fine-tuning on domain-specific exam datasets with SageMaker Training Jobs
Custom Training Pipeline
To improve model performance for exam content generation, Teleglobal built a custom training pipeline on Amazon SageMaker Training Jobs.
Training Data
Training data included real exam content across all five languages:
- Historical exam questions
- Marking schemes and answer structures
- Educational language patterns specific to assessment content
Infrastructure
Training workloads ran on Amazon SageMaker g5.12xlarge instances with NVIDIA A10G GPUs, providing the GPU acceleration needed for fine-tuning large language models.
Pipeline Design
- Reusable pipelines — new subjects or languages can be added without rebuilding from scratch
- Tamil language output validated against GradeMaker’s quality standards before going live
- Accuracy improved from ~60% at baseline to ~98% after fine-tuning on domain-specific data
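A minimal sketch of the configuration such a pipeline might assemble for a SageMaker training job follows. The instance type matches the case study (g5.12xlarge with NVIDIA A10G GPUs); the job name, container image placeholder, hyperparameters, and S3 paths are illustrative assumptions:

```python
def build_training_job_config(language: str, train_data_s3: str, output_s3: str) -> dict:
    """Assemble core arguments for a SageMaker CreateTrainingJob call.

    Only the instance type comes from the case study; everything else
    (names, image, hyperparameters) is an illustrative placeholder.
    """
    return {
        "TrainingJobName": f"openhathi-finetune-{language}",
        "AlgorithmSpecification": {
            # Placeholder: a Hugging Face / PyTorch training container URI
            "TrainingImage": "<training-image-uri>",
            "TrainingInputMode": "File",
        },
        "ResourceConfig": {
            "InstanceType": "ml.g5.12xlarge",  # NVIDIA A10G GPUs
            "InstanceCount": 1,
            "VolumeSizeInGB": 200,
        },
        "HyperParameters": {
            "epochs": "3",
            "learning_rate": "2e-5",
            "language": language,
        },
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_data_s3,
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3},
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},
    }

# Example: a Tamil fine-tuning run (bucket names are placeholders)
config = build_training_job_config("ta", "s3://example-bucket/exam-data/ta/",
                                   "s3://example-bucket/models/")
```

Parameterising the job by language is what makes the pipeline reusable: onboarding a new subject or language is a new configuration, not a rebuild.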
Production Inference Architecture
The trained model runs as a real-time API on Amazon SageMaker, integrating directly with GradeMaker’s question authoring platform and content review workflows.
- Auto-scaling endpoints handle exam season traffic peaks and scale down during quiet periods
- No changes required to GradeMaker’s existing client applications
- 99.9% uptime maintained since production launch
- Full CloudWatch monitoring for endpoint health, latency, and billing visibility
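The auto-scaling behaviour described above is typically configured through Application Auto Scaling with a target-tracking policy on the standard `SageMakerVariantInvocationsPerInstance` metric. The sketch below builds the two settings dictionaries such a setup needs; the endpoint name, capacity bounds, target value, and cooldowns are illustrative assumptions:

```python
def build_autoscaling_config(endpoint_name: str, variant: str = "AllTraffic",
                             min_capacity: int = 1, max_capacity: int = 8) -> dict:
    """Build Application Auto Scaling settings for a SageMaker endpoint variant.

    The predefined metric type is the standard one for SageMaker variants;
    capacity bounds and targets here are illustrative, not production values.
    """
    resource_id = f"endpoint/{endpoint_name}/variant/{variant}"
    return {
        # Would be passed to application-autoscaling register_scalable_target
        "scalable_target": {
            "ServiceNamespace": "sagemaker",
            "ResourceId": resource_id,
            "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
            "MinCapacity": min_capacity,
            "MaxCapacity": max_capacity,
        },
        # Would be passed to application-autoscaling put_scaling_policy
        "scaling_policy": {
            "PolicyName": f"{endpoint_name}-invocations-target",
            "ServiceNamespace": "sagemaker",
            "ResourceId": resource_id,
            "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
            "PolicyType": "TargetTrackingScaling",
            "TargetTrackingScalingPolicyConfiguration": {
                "TargetValue": 100.0,  # invocations per instance (illustrative)
                "PredefinedMetricSpecification": {
                    "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
                },
                "ScaleInCooldown": 300,  # scale in slowly after exam-season peaks
                "ScaleOutCooldown": 60,  # scale out quickly as load rises
            },
        },
    }

cfg = build_autoscaling_config("grademaker-openhathi-prod")
```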
Security and Compliance
Given the sensitivity of exam content, security was a core design requirement from the start.
- All AI systems run inside a private Amazon VPC — model endpoints not publicly accessible
- AWS IAM with least-privilege policies — only authorised systems can interact with AI resources
- All data encrypted at rest (Amazon S3 + AWS KMS) and in transit (TLS)
- Full audit log of every AI interaction for exam governance and compliance
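A least-privilege policy of the kind described above might look like the sketch below, which grants only the ability to invoke a single named endpoint. The ARN is a placeholder; real policies would be scoped per calling system and per resource:

```python
import json

def build_invoke_policy(endpoint_arn: str) -> str:
    """Return a least-privilege IAM policy allowing invocation of one endpoint only.

    Illustrative sketch: the ARN is a placeholder, and production policies
    would add conditions and be attached to specific roles.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowInvokeSingleEndpoint",
            "Effect": "Allow",
            "Action": ["sagemaker:InvokeEndpoint"],
            "Resource": [endpoint_arn],
        }],
    }
    return json.dumps(policy, indent=2)

# Placeholder ARN for illustration only
policy_json = build_invoke_policy(
    "arn:aws:sagemaker:region:account-id:endpoint/grademaker-openhathi-prod"
)
```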
AWS Services Used
| Category | Service / Detail |
| --- | --- |
| Training | Amazon SageMaker Training Jobs — g5.12xlarge (NVIDIA A10G GPU) |
| Inference | Amazon SageMaker Real-Time Endpoints with auto-scaling |
| Model Source | OpenHathi multilingual LLM via Hugging Face |
| Storage | Amazon S3 for datasets, model artefacts, and logs |
| Networking | Amazon VPC with private subnets |
| Security | AWS IAM, AWS KMS |
| Monitoring | Amazon CloudWatch for metrics, logs, and billing alerts |
Results
| Metric | Result |
| --- | --- |
| Questions generated monthly | 10,000+ across 5 languages |
| Question writing effort | Reduced by over 60% |
| Model accuracy | Improved from ~60% to ~98% |
| Manual writing workload | Reduced by ~50% |
| Production uptime | 99.9% |
| Cost vs. third-party APIs | Significantly lower per question at volume |
| Languages supported | English, Hindi, Tamil, Telugu, Punjabi |
“We had no AI when this started. What we have now is live, writing real exam content at scale in multiple languages. The cost is predictable, the platform is secure, and our team is focused on quality instead of drafting. That is a real shift for us.”
— GradeMaker
What’s Next
The platform is live and built to grow. Planned next steps:
- Model benchmarking — running multiple LLMs in parallel to compare performance and improve output quality
- AI tutoring — expanding to personalised learning paths using the same SageMaker infrastructure
- Human-in-the-loop governance — integrating educator review workflows before content is published
- Cost optimisation — evaluating AWS Inferentia2 chips to reduce inference costs at higher volumes