
How Teleglobal International Built a Secure GPU-Powered GenAI Platform on AWS for The Client

Executive Summary 

The Client is a technology firm building a blockchain-based SaaS platform for tokenizing real-world assets. The platform requires a Generative AI infrastructure capable of processing enterprise documents, generating knowledge graphs, and performing intelligent content retrieval at scale. 

Traditional external AI APIs introduced data residency risks, unpredictable costs, and infrastructure limitations. Teleglobal International designed and deployed a secure, GPU-powered GenAI platform entirely within The Client’s AWS environment, using Amazon SageMaker for model hosting and real-time inference. 

Metric | Result
AI requests / month | 12K+
Knowledge graph entities | 500K+
Avg. response latency | ~1.8 s
Platform uptime | 99.9%
External API dependency | Zero

  • Entire GenAI platform runs within The Client’s own AWS environment 
  • GPU-accelerated inference on NVIDIA A10G via Amazon EC2 g5.2xlarge 
  • External AI API dependency fully eliminated 

About The Client 

The Client is a technology and development firm building a blockchain-based SaaS platform for tokenizing real-world assets including commodities, property, and physical goods. 

The platform integrates four core capabilities: 

  • Blockchain infrastructure for decentralized asset tracking 
  • Semantic knowledge graphs for organizing multimodal knowledge 
  • GenAI-powered automation for complex document processing and reasoning 
  • Hybrid Retrieval Augmented Generation (RAG) pipelines for intelligent knowledge retrieval 

Core platform modules include the Knowledge Graph Studio, Tokenized Asset Registry, and Agent Runtime Engine. All components run on GPU-powered infrastructure for real-time GenAI inference. 

Business Use Case 

The Client required a Generative AI platform capable of supporting: 

  • Multimodal document analysis across regulatory, compliance, and asset documentation 
  • Knowledge graph generation from structured and unstructured enterprise data 
  • Hybrid RAG-based knowledge retrieval for contextual insights 
  • Intelligent automation for asset tokenization processes 

The platform had to support real-time inference and operate entirely within The Client’s own cloud environment to meet security and compliance requirements. 

The Challenge 

Traditional AI solutions based on external APIs introduced several problems that made them unsuitable for The Client’s requirements. 

  • Sensitive enterprise data would leave the organisation’s environment 
  • Unpredictable operational costs due to usage-based pricing at scale 
  • Lack of infrastructure control for enterprise-grade GenAI workloads 
  • Limitations in scaling GPU-intensive AI inference workloads 

The Client required a secure, scalable GenAI infrastructure running entirely within its own AWS environment. 

Model Selection 

Step 1: Evaluation Criteria 

Before selecting a model, Teleglobal defined five criteria the chosen model had to meet for The Client’s GenAI platform: 

  • Advanced reasoning capability: must interpret complex enterprise documents and generate accurate contextual insights 
  • Multimodal processing: must handle both structured and unstructured data sources including documents, tables, and metadata 
  • Real-time inference: must deliver low-latency responses suitable for application-level queries 
  • Scalability: must support GPU-accelerated distributed workloads on Amazon SageMaker infrastructure 
  • Security: must operate fully within The Client’s AWS account with no data leaving the environment 

Step 2: Models Evaluated 

Three model options were shortlisted and evaluated against The Client’s criteria: 

  • GPT-5-class Multimodal LLM on Amazon SageMaker: state-of-the-art multimodal reasoning, self-hosted within AWS 
  • Claude 3.5 Sonnet via Amazon Bedrock: managed foundation model, strong reasoning, private VPC endpoint 
  • Llama 3.1 70B (open-source, self-hosted on SageMaker): open-source generative model, fully self-managed 

Parameter | GPT-5-class LLM (SageMaker) ✔ Selected | Claude 3.5 Sonnet (Bedrock) | Llama 3.1 70B (Self-hosted)
Advanced reasoning | ✔ State-of-the-art multimodal | ✔ Strong reasoning | ⚠ Good, limited at 70B scale
Multimodal processing | ✔ Text, docs, images, structured data | ✔ Text and documents | ✘ Text only (70B variant)
Real-time inference on SageMaker | ✔ Native SageMaker endpoint | ⚠ Via Bedrock, not SageMaker | ✔ Native SageMaker endpoint
GPU-accelerated scaling | ✔ Full GPU control on EC2 G5 | ⚠ Managed, limited GPU control | ✔ Full GPU control on EC2 G5
Data stays within AWS | ✔ Fully self-hosted on SageMaker | ✔ Via Bedrock VPC endpoint | ✔ Self-managed
Fine-tunable on custom data | ✔ Via SageMaker training jobs | ⚠ Limited fine-tuning options | ✔ Open-source weights
Operational overhead | ✔ Managed via SageMaker endpoints | ✔ Fully managed by AWS Bedrock | ✘ Team must manage model + infra
CloudWatch integration | ✔ Native via SageMaker | ⚠ Partial via Bedrock metrics | ✔ Native via SageMaker

Step 3: Why GPT-5-class LLM on SageMaker Was Selected 

The GPT-5-class multimodal LLM was the strongest match across all five criteria. The key decision points: 

  • Claude 3.5 Sonnet: strong on reasoning and data sovereignty, but it runs as a managed Bedrock service, limiting direct GPU control, SageMaker-native endpoint management, and deep customisation for The Client’s GPU-intensive workloads. 
  • Llama 3.1 70B: viable on data sovereignty and fine-tuning, but text-only at the 70B scale, and it would require the team to fully self-manage model serving, scaling, and infrastructure. 
  • GPT-5-class LLM: delivered the strongest reasoning and document understanding, native multimodal support, full GPU control on EC2 G5 instances, and seamless SageMaker endpoint management, all within The Client’s AWS environment from day one. 

GenAI Tasks Performed 

The deployed platform supports the following GenAI capabilities: 

  • Multimodal document analysis: regulatory and compliance documents, asset metadata 
  • Named entity recognition from enterprise documents 
  • Relationship mapping for knowledge graph generation 
  • Hybrid Retrieval Augmented Generation (RAG) for contextual knowledge retrieval 
  • GenAI-powered automation within asset tokenization workflows 
  • Model fine-tuning and continuous learning on proprietary data 
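To illustrate how the named entity recognition and relationship-mapping steps might feed the knowledge graph, the sketch below parses a hypothetical pipe-delimited extraction output into graph triples. The output format and the example entities are assumptions for illustration, not The Client's actual schema.

```python
# Illustrative sketch: turning LLM extraction output into knowledge-graph
# triples. The 'subject | relation | object' line format is an assumption.

def parse_triples(model_output: str) -> list[tuple[str, str, str]]:
    """Parse lines of 'subject | relation | object' into triples,
    silently skipping malformed lines."""
    triples = []
    for line in model_output.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append(tuple(parts))
    return triples

# Example of what an extraction prompt might return for an asset document
raw = """\
Warehouse 7 | located_in | Rotterdam
Warehouse 7 | holds_asset | Copper Lot 2024-18
Copper Lot 2024-18 | tokenized_as | TOKEN-8841
not a triple
"""

triples = parse_triples(raw)
# the three well-formed lines become triples; the malformed line is dropped
```

In a real pipeline the triples would then be upserted into the graph store behind the Knowledge Graph Studio; the parsing step is shown because it is the boundary between free-form model output and structured graph data.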

GenAI Processing Pipeline 

The GenAI platform processes enterprise queries and documents across four layers: 

  • Application Layer: user queries and documents submitted to application services 
  • AI Processing Layer: requests routed to GenAI inference endpoints hosted on SageMaker 
  • Model Inference Layer: the LLM processes the request and generates embeddings or contextual responses 
  • Data and Knowledge Layer: hybrid RAG pipelines retrieve relevant knowledge graph data 
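As a sketch of the hand-off from the Application Layer to the AI Processing Layer, a request can be routed to a SageMaker real-time endpoint roughly as follows. The endpoint name and payload shape are illustrative assumptions, not The Client's actual API.

```python
import json

def build_request(query: str, context_docs: list[str]) -> bytes:
    """Assemble the JSON payload the application layer would send to the
    inference endpoint (payload shape is an illustrative assumption)."""
    return json.dumps({
        "inputs": query,
        "parameters": {"max_new_tokens": 512},
        "context": context_docs,
    }).encode("utf-8")

def invoke_genai_endpoint(query: str, context_docs: list[str]) -> dict:
    """Send the request to a SageMaker real-time endpoint.
    'genai-inference-endpoint' is a hypothetical endpoint name."""
    import boto3  # imported here so the sketch loads without the AWS SDK
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName="genai-inference-endpoint",
        ContentType="application/json",
        Body=build_request(query, context_docs),
    )
    return json.loads(response["Body"].read())
```

Because the endpoint is hosted inside the VPC, this call never leaves The Client's AWS environment, which is the property the architecture is built around.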

Data and Knowledge Sources 

The GenAI system processes and analyses multiple enterprise data sources: 

  • Regulatory and compliance documents 
  • Asset metadata and tokenization records 
  • Enterprise datasets stored in Amazon S3 
  • Knowledge graph relationships generated from structured data 

GPU Infrastructure 

GenAI inference workloads run on Amazon EC2 g5.2xlarge instances powered by NVIDIA A10G GPUs. 

Specification | Detail
GPU | NVIDIA A10G Tensor Core GPU
GPU Memory | 24 GB VRAM
CUDA Cores | 6,144
vCPUs | 8
System Memory | 32 GB
Network Bandwidth | Up to 25 Gbps

GPU workload allocation: 

Workload | GPU Utilization
Regulatory document analysis and Q&A | 60%
Knowledge graph generation | 25%
Model fine-tuning and learning | 10%
Validation and QA | 5%
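Assuming a single GPU instance running around the clock (roughly 720 hours per month, an assumption for illustration), the utilization shares above translate into monthly GPU-hours as follows:

```python
# Back-of-the-envelope conversion of utilization shares into GPU-hours
# for one instance running 24x7 (~720 h/month is an assumed figure).
MONTHLY_GPU_HOURS = 720

allocation = {
    "Regulatory document analysis and Q&A": 0.60,
    "Knowledge graph generation": 0.25,
    "Model fine-tuning and learning": 0.10,
    "Validation and QA": 0.05,
}

gpu_hours = {w: share * MONTHLY_GPU_HOURS for w, share in allocation.items()}
# document analysis gets the largest slice: 0.60 * 720 = 432 GPU-hours
```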

Platform Architecture 

The platform follows a GPU-accelerated GenAI pattern with four layers: Application Layer, AI Processing Layer, Model Inference Layer, and Data and Knowledge Layer. 

GenAI Inference Layer 

  • Amazon SageMaker: model hosting and real-time GenAI inference endpoints 
  • Amazon EC2 g5.2xlarge: GPU compute for high-performance LLM inference 
  • Hybrid RAG pipelines: contextual knowledge retrieval from enterprise data sources 
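The case study names hybrid RAG but not the fusion method; reciprocal rank fusion (RRF) is one common way to combine vector-similarity and keyword rankings, sketched here under that assumption with hypothetical document IDs:

```python
# Illustrative hybrid-retrieval fusion using reciprocal rank fusion (RRF).
# RRF is an assumed choice; the case study does not specify the method.

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs with RRF:
    score(d) = sum over lists of 1 / (k + rank_of_d)."""
    scores: dict[str, float] = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from a vector index and a keyword index
vector_hits = ["doc-compliance-7", "doc-asset-2", "doc-reg-9"]
keyword_hits = ["doc-asset-2", "doc-reg-9", "doc-lease-4"]

fused = rrf_fuse([vector_hits, keyword_hits])
# doc-asset-2 ranks near the top of both lists, so it comes out first
```

The fused list is what gets packed into the prompt context at the Model Inference Layer; documents found by both retrieval paths naturally rise to the top.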

Data and Storage Layer 

  • Amazon S3: storage for datasets, model artefacts, training data, and enterprise content 

Security and Governance 

  • Amazon VPC: private subnets for secure network isolation of all GenAI workloads 
  • AWS IAM: role-based access control with least-privilege policies 
  • Encryption at rest and in transit across all data stores and inference endpoints 
  • Restricted access to AI inference endpoints and enterprise data 
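Least-privilege access to the inference endpoints can be expressed as an IAM policy along these lines; the account ID and endpoint ARN pattern are placeholders, not The Client's actual values:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowInvokeOnly",
      "Effect": "Allow",
      "Action": "sagemaker:InvokeEndpoint",
      "Resource": "arn:aws:sagemaker:*:111122223333:endpoint/*"
    }
  ]
}
```

Attaching a policy like this to the application role lets services call the endpoints while withholding permissions to create, update, or delete them.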

Observability and Monitoring 

  • Amazon CloudWatch: inference latency, endpoint health, GPU utilization, request throughput 
  • Infrastructure performance monitoring and automated alerting 

AWS Services Used 

Category | Service / Detail
AI and Compute | Amazon SageMaker: model hosting and real-time inference endpoints
GPU Compute | Amazon EC2 g5.2xlarge: NVIDIA A10G GPU for GenAI workloads
Storage | Amazon S3: datasets, model artefacts, and training data
Networking | Amazon VPC: secure networking and private infrastructure
Security | AWS IAM: role-based access control and least-privilege policies
Monitoring | Amazon CloudWatch: infrastructure performance and inference metrics

Scalability and Reliability 

The GenAI platform is designed to support enterprise-scale workloads: 

  • GPU-based inference scaling for high-throughput document processing 
  • Load distribution across SageMaker inference endpoints 
  • High-availability infrastructure for continuous GenAI operations 
  • Scalable microservices architecture supporting modular expansion 

Cost Optimisation 

The Client optimised infrastructure costs through: 

  • GPU instance right-sizing on EC2 g5.2xlarge for actual workload requirements 
  • Workload scheduling to maximise GPU utilisation efficiency 
  • A self-hosted model that eliminates all external AI API usage fees 
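A rough sketch of why self-hosting yields predictable costs: a dedicated GPU instance is a flat monthly amount, while per-request API pricing scales with volume. Both rates below are illustrative placeholders, not actual AWS or API-provider pricing.

```python
# Hypothetical cost comparison; both rates are placeholders for
# illustration only, not actual pricing.
GPU_HOURLY_RATE = 1.20        # assumed g5.2xlarge on-demand $/hour
API_COST_PER_REQUEST = 0.05   # assumed external-API $/request
HOURS_PER_MONTH = 720
REQUESTS_PER_MONTH = 12_000   # monthly volume from the Results table

self_hosted = GPU_HOURLY_RATE * HOURS_PER_MONTH           # flat, usage-independent
external_api = API_COST_PER_REQUEST * REQUESTS_PER_MONTH  # grows with usage
# the self-hosted figure stays constant as request volume grows,
# while the per-request figure scales linearly with traffic
```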

Results 

Metric | Result
AI inference requests processed | 12,000+ per month
Average response latency | ~1.8 seconds
Knowledge graph entities processed | 500,000+
Platform availability | 99.9% uptime
External AI API dependency | Eliminated
Data residency | All GenAI workloads within The Client’s AWS environment

What’s Next 

The Client plans to continue enhancing its GenAI platform: 

  • Advanced multimodal AI capabilities for richer document and data processing 
  • Expanded knowledge graph analytics for deeper asset relationship mapping 
  • Continuous model optimisation through fine-tuning on proprietary enterprise data 
  • Enhanced monitoring and analytics for deeper GenAI inference insights 

Deployment and Lifecycle Management 

The GenAI platform follows modern DevOps and MLOps practices: 

  • Automated model deployment and versioning using Amazon SageMaker 
  • Infrastructure monitoring and alerting via Amazon CloudWatch 
  • Continuous improvement through model fine-tuning on new enterprise data 
  • Dataset and model artefact versioning for reproducibility