
Executive Summary
The Client is a technology firm building a blockchain-based SaaS platform for tokenizing real-world assets. The platform requires a Generative AI infrastructure capable of processing enterprise documents, generating knowledge graphs, and performing intelligent content retrieval at scale.
Traditional external AI APIs introduced data residency risks, unpredictable costs, and infrastructure limitations. Teleglobal International designed and deployed a secure, GPU-powered GenAI platform entirely within The Client’s AWS environment, using Amazon SageMaker for model hosting and real-time inference.
| Metric | Result |
| --- | --- |
| AI requests / month | 12K+ |
| Knowledge graph entities | 500K+ |
| Avg. response latency | ~1.8s |
| Platform uptime | 99.9% |
| External API dependency | Zero |
- Entire GenAI platform runs within The Client’s own AWS environment
- GPU-accelerated inference on NVIDIA A10G via Amazon EC2 G5.2xlarge
- External AI API dependency fully eliminated
About The Client
The Client is a technology and development firm building a blockchain-based SaaS platform for tokenizing real-world assets including commodities, property, and physical goods.
The platform integrates four core capabilities:
- Blockchain infrastructure for decentralized asset tracking
- Semantic knowledge graphs for organizing multimodal knowledge
- GenAI-powered automation for complex document processing and reasoning
- Hybrid Retrieval Augmented Generation (RAG) pipelines for intelligent knowledge retrieval
Core platform modules include the Knowledge Graph Studio, Tokenized Asset Registry, and Agent Runtime Engine. All components run on GPU-powered infrastructure for real-time GenAI inference.
Business Use Case
The Client required a Generative AI platform capable of supporting:
- Multimodal document analysis across regulatory, compliance, and asset documentation
- Knowledge graph generation from structured and unstructured enterprise data
- Hybrid RAG-based knowledge retrieval for contextual insights
- Intelligent automation for asset tokenization processes
The platform had to support real-time inference and operate entirely within The Client’s own cloud environment to meet security and compliance requirements.
The Challenge
Traditional AI solutions based on external APIs introduced several problems that made them unsuitable for The Client’s requirements.
- Sensitive enterprise data would leave the organisation’s environment
- Unpredictable operational costs due to usage-based pricing at scale
- Lack of infrastructure control for enterprise-grade GenAI workloads
- Limitations in scaling GPU-intensive AI inference workloads
The Client required a secure, scalable GenAI infrastructure running entirely within its own AWS environment.
Model Selection
Step 1: Evaluation Criteria
Before selecting a model, Teleglobal defined five criteria the chosen model had to meet for The Client’s GenAI platform:
- Advanced reasoning capability: must interpret complex enterprise documents and generate accurate contextual insights
- Multimodal processing: must handle both structured and unstructured data sources including documents, tables, and metadata
- Real-time inference: must deliver low-latency responses suitable for application-level queries
- Scalability: must support GPU-accelerated distributed workloads on Amazon SageMaker infrastructure
- Security: must operate fully within The Client’s AWS account with no data leaving the environment
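The five criteria above can be read as a scoring rubric. A minimal sketch of how such a rubric might be weighted and applied (the weights and criterion names are illustrative assumptions, not Teleglobal's actual evaluation numbers):

```python
# Hypothetical weighting of the five evaluation criteria.
# Weights are illustrative assumptions, not the actual rubric.
CRITERIA_WEIGHTS = {
    "reasoning": 0.30,
    "multimodal": 0.20,
    "realtime": 0.20,
    "scalability": 0.15,
    "security": 0.15,
}

def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (each 0.0-1.0) into one weighted total."""
    return sum(CRITERIA_WEIGHTS[c] * scores.get(c, 0.0) for c in CRITERIA_WEIGHTS)
```

A model scoring 1.0 on every criterion would total 1.0; lower totals flag weaker overall fit.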
Step 2: Models Evaluated
Three model options were shortlisted and evaluated against The Client’s criteria:
- GPT-5-class Multimodal LLM on Amazon SageMaker: state-of-the-art multimodal reasoning, self-hosted within AWS
- Claude 3.5 Sonnet via Amazon Bedrock: managed foundation model, strong reasoning, private VPC endpoint
- Llama 3.1 70B (open-source, self-hosted on SageMaker): open-source generative model, fully self-managed
| Parameter | GPT-5-class LLM (SageMaker) ✔ Selected | Claude 3.5 Sonnet (Bedrock) | Llama 3.1 70B (Self-hosted) |
| --- | --- | --- | --- |
| Advanced reasoning | ✔ State-of-the-art multimodal | ✔ Strong reasoning | ⚠ Good, limited at 70B scale |
| Multimodal processing | ✔ Yes, text, docs, images, structured data | ✔ Yes, text and documents | ✘ Text only (70B variant) |
| Real-time inference on SageMaker | ✔ Native SageMaker endpoint | ⚠ Via Bedrock, not SageMaker | ✔ Native SageMaker endpoint |
| GPU-accelerated scaling | ✔ Full GPU control on EC2 G5 | ⚠ Managed, limited GPU control | ✔ Full GPU control on EC2 G5 |
| Data stays within AWS | ✔ Yes, fully self-hosted on SageMaker | ✔ Yes, via Bedrock VPC endpoint | ✔ Yes, self-managed |
| Fine-tunable on custom data | ✔ Yes, via SageMaker training jobs | ⚠ Limited fine-tuning options | ✔ Yes, open-source weights |
| Operational overhead | ✔ Managed via SageMaker endpoints | ✔ Fully managed by AWS Bedrock | ✘ Team must manage model + infra |
| CloudWatch integration | ✔ Native via SageMaker | ⚠ Partial via Bedrock metrics | ✔ Native via SageMaker |
Step 3: Why GPT-5-class LLM on SageMaker Was Selected
The GPT-5-class multimodal LLM was the strongest match across all five criteria. The key decision points were:
- Claude 3.5 Sonnet: strong on reasoning and data sovereignty, but it runs as a managed Bedrock service, limiting direct GPU control, SageMaker-native endpoint management, and deep customisation for The Client’s GPU-intensive workloads.
- Llama 3.1 70B: viable on data sovereignty and fine-tuning, but text-only at the 70B scale, and it requires the team to fully self-manage model serving, scaling, and infrastructure.
- GPT-5-class LLM: delivered the strongest reasoning and document understanding, native multimodal support, full GPU control on EC2 G5 instances, and seamless SageMaker endpoint management, all within The Client’s AWS environment from day one.
GenAI Tasks Performed
The deployed platform supports the following GenAI capabilities:
- Multimodal document analysis: regulatory and compliance documents, asset metadata
- Named entity recognition from enterprise documents
- Relationship mapping for knowledge graph generation
- Hybrid Retrieval Augmented Generation (RAG) for contextual knowledge retrieval
- GenAI-powered automation within asset tokenization workflows
- Model fine-tuning and continuous learning on proprietary data
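The hybrid RAG retrieval listed above typically fuses results from a keyword search and a vector search. One common fusion method is reciprocal rank fusion (RRF); a minimal sketch, with document IDs that are purely illustrative:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse multiple ranked lists (e.g. keyword + vector search) with RRF."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # standard RRF score
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative results from two retrievers over compliance documents:
keyword_hits = ["doc-regulatory-7", "doc-asset-2", "doc-kyc-9"]
vector_hits = ["doc-regulatory-7", "doc-meta-4", "doc-asset-2"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

A document ranked highly by both retrievers (here `doc-regulatory-7`) rises to the top of the fused list.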
GenAI Processing Pipeline
The GenAI platform processes enterprise queries and documents across four layers:
- Application Layer: user queries and documents submitted to application services
- AI Processing Layer: requests routed to GenAI inference endpoints hosted on SageMaker
- Model Inference Layer: the LLM processes the request and generates embeddings or contextual responses
- Data and Knowledge Layer: hybrid RAG pipelines retrieve relevant knowledge graph data
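The four layers above could be wired together as a simple request flow. The sketch below stubs the model call and uses hypothetical names throughout; in production the Model Inference Layer would call the real SageMaker endpoint:

```python
def application_layer(query: str) -> dict:
    """Application Layer: wrap the user query in a request envelope."""
    return {"query": query, "source": "app-service"}

def ai_processing_layer(request: dict) -> dict:
    """AI Processing Layer: route the request to the inference endpoint."""
    request["endpoint"] = "genai-inference-endpoint"  # hypothetical endpoint name
    return request

def model_inference_layer(request: dict) -> dict:
    """Model Inference Layer: stubbed LLM call returning a contextual response."""
    return {"answer": f"response to: {request['query']}"}

def knowledge_layer(response: dict, graph: dict) -> dict:
    """Data and Knowledge Layer: attach related knowledge-graph entities."""
    response["related_entities"] = graph.get("entities", [])
    return response

# Illustrative knowledge-graph fragment:
graph = {"entities": ["Asset:GoldBar-001", "Regulation:MiCA"]}
request = ai_processing_layer(application_layer("Which regulations apply?"))
result = knowledge_layer(model_inference_layer(request), graph)
```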
Data and Knowledge Sources
The GenAI system processes and analyses multiple enterprise data sources:
- Regulatory and compliance documents
- Asset metadata and tokenization records
- Enterprise datasets stored in Amazon S3
- Knowledge graph relationships generated from structured data
GPU Infrastructure
GenAI inference workloads run on Amazon EC2 G5.2xlarge instances powered by NVIDIA A10G GPUs.
| Specification | Detail |
| --- | --- |
| GPU | NVIDIA A10G Tensor Core GPU |
| GPU Memory | 24 GB VRAM |
| CUDA Cores | 6,144 |
| vCPUs | 8 |
| System Memory | 32 GB |
| Network Bandwidth | Up to 25 Gbps |
GPU workload allocation:
| Workload | GPU Utilization |
| --- | --- |
| Regulatory document analysis and Q&A | 60% |
| Knowledge graph generation | 25% |
| Model fine-tuning and learning | 10% |
| Validation and QA | 5% |
Platform Architecture
The platform follows a GPU-accelerated GenAI pattern with four layers: Application Layer, AI Processing Layer, Model Inference Layer, and Data and Knowledge Layer.
GenAI Inference Layer
- Amazon SageMaker: model hosting and real-time GenAI inference endpoints
- Amazon EC2 G5.2xlarge: GPU compute for high-performance LLM inference
- Hybrid RAG pipelines: contextual knowledge retrieval from enterprise data sources
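Application services reach the real-time endpoint through the SageMaker Runtime API. A hedged sketch of building and sending a request with `boto3` (endpoint name and JSON schema are assumptions, as the payload format depends on the deployed model container):

```python
import json

def build_inference_request(prompt: str, max_tokens: int = 512) -> bytes:
    """Serialize a prompt into a JSON body; the schema here is assumed."""
    return json.dumps(
        {"inputs": prompt, "parameters": {"max_new_tokens": max_tokens}}
    ).encode("utf-8")

# Invoking the endpoint requires AWS credentials and a deployed model,
# so the call itself is shown for illustration only:
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(
#     EndpointName="genai-inference-endpoint",  # hypothetical name
#     ContentType="application/json",
#     Body=build_inference_request("Summarise this compliance document."),
# )
# result = json.loads(response["Body"].read())
```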
Data and Storage Layer
- Amazon S3: storage for datasets, model artefacts, training data, and enterprise content
Security and Governance
- Amazon VPC: private subnets for secure network isolation of all GenAI workloads
- AWS IAM: role-based access control with least-privilege policies
- Encryption at rest and in transit across all data stores and inference endpoints
- Restricted access to AI inference endpoints and enterprise data
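A least-privilege policy for the inference path might grant the application role only the ability to invoke one specific endpoint. The sketch below uses a placeholder account ID and endpoint name, not The Client's real resources:

```python
# Least-privilege IAM policy sketch: the application role may invoke only
# one named SageMaker endpoint. The ARN below is a placeholder.
INVOKE_ONLY_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sagemaker:InvokeEndpoint",
            "Resource": "arn:aws:sagemaker:*:123456789012:endpoint/genai-inference-endpoint",
        }
    ],
}
```

Scoping `Resource` to a single endpoint ARN (rather than `*`) is what keeps the policy least-privilege.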
Observability and Monitoring
- Amazon CloudWatch: inference latency, endpoint health, GPU utilization, request throughput
- Infrastructure performance monitoring and automated alerting
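Latency alerting of the kind described above can be defined as a CloudWatch alarm on the endpoint's `ModelLatency` metric (reported in microseconds in the `AWS/SageMaker` namespace). A sketch with placeholder endpoint and variant names:

```python
# CloudWatch alarm definition for sustained high inference latency.
# ModelLatency in AWS/SageMaker is reported in microseconds; the endpoint
# and variant names below are placeholders.
LATENCY_ALARM = {
    "AlarmName": "genai-endpoint-high-latency",
    "Namespace": "AWS/SageMaker",
    "MetricName": "ModelLatency",
    "Dimensions": [
        {"Name": "EndpointName", "Value": "genai-inference-endpoint"},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    "Statistic": "Average",
    "Period": 300,
    "EvaluationPeriods": 3,
    "Threshold": 2_000_000,  # 2 seconds, expressed in microseconds
    "ComparisonOperator": "GreaterThanThreshold",
}
# boto3.client("cloudwatch").put_metric_alarm(**LATENCY_ALARM) would register it.
```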
AWS Services Used
| Category | Service / Detail |
| --- | --- |
| AI and Compute | Amazon SageMaker: model hosting and real-time inference endpoints |
| GPU Compute | Amazon EC2 G5.2xlarge: NVIDIA A10G GPU for GenAI workloads |
| Storage | Amazon S3: datasets, model artefacts, and training data |
| Networking | Amazon VPC: secure networking and private infrastructure |
| Security | AWS IAM: role-based access control and least-privilege policies |
| Monitoring | Amazon CloudWatch: infrastructure performance and inference metrics |
Scalability and Reliability
The GenAI platform is designed to support enterprise-scale workloads:
- GPU-based inference scaling for high-throughput document processing
- Load distribution across SageMaker inference endpoints
- High-availability infrastructure for continuous GenAI operations
- Scalable microservices architecture supporting modular expansion
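Load distribution across SageMaker endpoints is usually implemented with Application Auto Scaling on the endpoint variant's instance count. A sketch of the two configuration payloads involved (endpoint name, variant name, and capacity limits are assumptions):

```python
# Sketch of SageMaker endpoint auto scaling via Application Auto Scaling.
# Names and capacities are illustrative placeholders.
ENDPOINT, VARIANT = "genai-inference-endpoint", "AllTraffic"
RESOURCE_ID = f"endpoint/{ENDPOINT}/variant/{VARIANT}"

SCALABLE_TARGET = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": RESOURCE_ID,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 1,
    "MaxCapacity": 4,
}

SCALING_POLICY = {
    "PolicyName": "genai-invocations-per-instance",
    "ServiceNamespace": "sagemaker",
    "ResourceId": RESOURCE_ID,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 50.0,  # target invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
}
# client = boto3.client("application-autoscaling")
# client.register_scalable_target(**SCALABLE_TARGET)
# client.put_scaling_policy(**SCALING_POLICY)
```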
Cost Optimisation
The Client optimised infrastructure costs through:
- GPU instance right-sizing on EC2 G5.2xlarge for actual workload requirements
- Workload scheduling to maximise GPU utilisation efficiency
- Self-hosted model eliminating all external AI API usage fees
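A back-of-envelope view of the self-hosted economics. The hourly rate below is an illustrative on-demand figure for a g5.2xlarge in us-east-1, not a quoted price; check current AWS pricing before relying on it:

```python
# Rough monthly cost of one always-on g5.2xlarge instance.
# HOURLY_RATE is an illustrative assumption, not current AWS pricing.
HOURLY_RATE = 1.21          # USD per hour (assumed)
HOURS_PER_MONTH = 730
monthly_gpu_cost = HOURLY_RATE * HOURS_PER_MONTH

# At the reported 12,000 requests per month, the effective cost per request:
cost_per_request = monthly_gpu_cost / 12_000
```

Unlike usage-based API pricing, this cost is fixed, so the per-request cost falls as throughput grows.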
Results
| Metric | Result |
| --- | --- |
| AI inference requests processed | 12,000+ per month |
| Average response latency | ~1.8 seconds |
| Knowledge graph entities processed | 500,000+ |
| Platform availability | 99.9% uptime |
| External AI API dependency | Eliminated |
| Data residency | All GenAI workloads within The Client’s AWS environment |
What’s Next
The Client plans to continue enhancing its GenAI platform:
- Advanced multimodal AI capabilities for richer document and data processing
- Expanded knowledge graph analytics for deeper asset relationship mapping
- Continuous model optimisation through fine-tuning on proprietary enterprise data
- Enhanced monitoring and analytics for deeper GenAI inference insights
Deployment and Lifecycle Management
The GenAI platform follows modern DevOps and MLOps practices:
- Automated model deployment and versioning using Amazon SageMaker
- Infrastructure monitoring and alerting via Amazon CloudWatch
- Continuous improvement through model fine-tuning on new enterprise data
- Dataset and model artefact versioning for reproducibility